Safety · Trend

Strategic Attack Selection in Agentic AI Control Evaluations

Attackers that strategically select when to attack in agentic AI settings reduce measured safety by up to 28 percentage points at a 1% audit budget, exposing gaps in current AI control evaluation protocols.

Trend strength 3/10
Momentum +3/q
Confidence low
Status new
Forecast horizon

AI control evaluation frameworks must incorporate strategic attacker models to produce credible safety guarantees for deployment.

Connections · 2

How this node ties into the rest of the map, and the evidence behind each link.

Signal sources

Dated facts from primary sources in this direction.