Capabilities · Trend
Autonomous coding
Coding agents close issues and open pull requests end-to-end; software is the first mass agentic domain.
Engineering reorganizes around review of machine-written code; provenance and tests gate the loop.
Connections
Connections · 1
How this node ties into the rest of the map, and the evidence behind each link.
Signal sources
Signal sources
Dated facts from primary sources in this direction.
The length of software tasks AI agents can do autonomously at 50% reliability has doubled about every 7 months — and since 2024 closer to every ~3 months.
METR →In one year scores rose by 18.8, 48.9 and 67.3 points on MMMU, GPQA and SWE-bench; real-world software solve rate jumped from 4.4% to 71.7%.
Stanford HAI — AI Index 2025 →On SWE-bench Verified (500 real GitHub issues), autonomous coding agents reached ~80–86% by late 2025, up from under 50% in early 2025.
Epoch AI →