Capabilities · Trend
AI agents
Goal-directed systems that plan, use tools and act across software are the defining frontier of 2025–2026.
Agents move from demos to delegated work inside real workflows; reliability and oversight become the bottleneck.
Connections · 8
How this node ties into the rest of the map, and the evidence behind each link.
Agents lean on deliberate reasoning to plan multi-step actions. (+3 growth)
Software is the first domain where agents act end-to-end. (+3 growth)
Acting over long horizons needs persistent memory. (+2 growth)
Agents spend inference compute to search and self-correct. (+2 growth)
Cheap inference makes long agentic runs affordable. (+3 growth)
Autonomy raises the stakes for pre-deployment evaluation. (+3 growth)
Agentic autonomy is the hardest thing to evaluate. (+3 growth)
Agentic automation reshapes task-level labor. (+2 growth)
Signal sources
Dated facts from primary sources in this direction.
The length of software tasks AI agents can complete autonomously at 50% reliability has doubled about every 7 months, and since 2024 closer to every 3 months. (METR)
In one year, scores rose by 18.8, 48.9 and 67.3 percentage points on MMMU, GPQA and SWE-bench; on SWE-bench, the share of real-world software issues solved jumped from 4.4% to 71.7%. (Stanford HAI, AI Index 2025)
On SWE-bench Verified (500 real GitHub issues), autonomous coding agents reached ~80–86% by late 2025, up from under 50% in early 2025. (Epoch AI)
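To make the doubling times in the METR figure concrete, here is a minimal sketch of the underlying arithmetic, assuming plain exponential growth. The function names and the 1-hour and 8-hour task lengths are illustrative assumptions, not METR data or METR's methodology.

```python
import math


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth over `months` for a quantity that doubles
    every `doubling_months` (assumes steady exponential growth)."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(start: float, target: float, doubling_months: float) -> float:
    """Months needed to grow from `start` to `target` at the given doubling time."""
    return doubling_months * math.log2(target / start)


if __name__ == "__main__":
    # Yearly growth implied by each doubling time quoted above.
    for doubling in (7, 3):
        factor = growth_factor(12, doubling)
        print(f"doubling every {doubling} months -> x{factor:.1f} per year")

    # Hypothetical example: going from 1-hour tasks to 8-hour tasks
    # takes three doublings, whatever the doubling time is.
    for doubling in (7, 3):
        months = months_to_reach(1.0, 8.0, doubling)
        print(f"1h -> 8h at {doubling}-month doubling: {months:.0f} months")
```

Under these assumptions, a 7-month doubling time works out to roughly 3.3x growth per year, while a 3-month doubling time works out to 16x per year, which is why the post-2024 acceleration matters so much for how quickly longer tasks come into reach.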