Safety · Trend
Strategic Attack Selection in Agentic AI Control Evaluations
Attackers that strategically select when to attack in agentic AI settings reduce measured safety by up to 28 percentage points at a 1% audit budget, exposing gaps in current AI control evaluation protocols.
AI control evaluation frameworks must incorporate strategic attacker models to produce credible safety guarantees for deployment.
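A toy model shows why attack selection lowers measured safety. The sketch assumes a trusted-monitoring style protocol: a monitor gives every action a suspicion score, and only actions scoring above the 99th percentile of honest scores are audited (a 1% audit budget). This is not the study's method. Every distribution and parameter in it (`ATTACK_MEAN`, `PREDICT_NOISE`, `ATTACK_CUTOFF`) is hypothetical. It compares two attackers. The first attacks at every opportunity. The second predicts, with some noise, how suspicious an attack would look and attacks only when it expects to stay under the audit threshold.

```python
import random

# Toy model of a monitored control evaluation. All distributions and
# parameters below are hypothetical illustrations, not values from the study.
ATTACK_MEAN = 2.5      # hypothetical mean monitor score of an attack action
PREDICT_NOISE = 0.5    # hypothetical noise in the attacker's self-prediction
ATTACK_CUTOFF = 1.0    # selective attacker attacks only if predicted score <= this


def audit_threshold(honest_scores, budget):
    """Score at or above which roughly `budget` of honest actions would be audited."""
    ranked = sorted(honest_scores)
    idx = min(int((1 - budget) * len(ranked)), len(ranked) - 1)
    return ranked[idx]


def catch_rate(rng, threshold, n_attacks, selective):
    """Fraction of attempted attacks whose monitor score reaches the audit threshold."""
    caught = attempted = 0
    while attempted < n_attacks:
        true_score = rng.gauss(ATTACK_MEAN, 1.0)
        if selective:
            # The attacker sees a noisy estimate of how suspicious this attack
            # would look and skips opportunities it expects to be flagged.
            predicted = true_score + rng.gauss(0.0, PREDICT_NOISE)
            if predicted > ATTACK_CUTOFF:
                continue
        attempted += 1
        if true_score >= threshold:
            caught += 1
    return caught / n_attacks


def main():
    rng = random.Random(0)
    honest = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
    threshold = audit_threshold(honest, budget=0.01)  # 1% audit budget
    # Simplifying assumption: the game ends at the first attack, and every
    # unaudited attack succeeds, so safety equals the per-attack catch rate.
    naive = catch_rate(rng, threshold, 5_000, selective=False)
    strategic = catch_rate(rng, threshold, 5_000, selective=True)
    print(f"safety, attack on every opportunity: {naive:.3f}")
    print(f"safety, strategic attack selection:  {strategic:.3f}")
    return naive, strategic


if __name__ == "__main__":
    main()
```

The script prints both safety estimates. With these made-up parameters, the selective attacker is audited far less often, which is the mechanism the summary describes. The size of the gap in this toy setting says nothing about the study's 28-point figure.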
Connections
How this node ties into the rest of the map, and the evidence behind each link.
Both studies examine adversarial exploitation of AI agents; coding sabotage focuses on human oversight failures while attack selection focuses on strategic timing. (+4 growth)
Formal verification of agent workflows could constrain the attack surface exploited by strategic attack selection in agentic AI systems. (+3 growth)
Signal sources
Dated facts from primary sources in this direction.
In June 2025 the US AI Safety Institute was renamed the Center for AI Standards and Innovation (CAISI), pivoting toward security, standards and adversary-model assessment.
NIST →
Anthropic activated its ASL-3 deployment and security standard with Claude Opus 4 on 22 May 2025. This was the first real-world trigger of a responsible-scaling tier, and it focused on blocking bio-weapon uplift.
Anthropic →
The International Network of AI Safety Institutes (launched Nov 2024) ran a third joint testing exercise focused on agentic AI systems across cyber and fraud strands.
European Commission AI Office →