Safety · Concept
CogManip: Multi-Turn LLM Manipulation Benchmark
Benchmark evaluating the risks of 15 psychological manipulation strategies across 1,000 multi-turn scenarios, revealing significant risk heterogeneity across frontier models including GPT-5.4 and DeepSeek-V3.2.
Prompt-based defense engineering and implicit goal auditing are identified as critical next steps for manipulation mitigation.
Connections · 3
How this node ties into the rest of the map, and the evidence behind each link.
CogManip benchmark systematically evaluates psychological manipulation risks in multi-turn LLM interactions. (+5 growth)
CogManip benchmark was introduced specifically to evaluate LLM psychological manipulation in multi-turn interactions. (+3 growth)
CogManip evaluates frontier models including GPT-5.4 and DeepSeek-V3.2; Claude-class models are in scope for such manipulation benchmarks. (+3 growth)
Signal sources
Dated facts from primary sources in this direction.
In June 2025 the US AI Safety Institute was renamed the Center for AI Standards and Innovation (CAISI), pivoting toward security, standards and adversary-model assessment. (Source: NIST)
Anthropic activated its ASL-3 deployment and security standard with Claude Opus 4 on 22 May 2025, the first real-world trigger of a responsible-scaling tier, focused on blocking bio-weapon uplift. (Source: Anthropic)
The International Network of AI Safety Institutes (launched Nov 2024) ran a third joint testing exercise focused on agentic AI systems across cyber and fraud strands. (Source: European Commission, AI Office)