Capabilities · Trend

Reasoning models

Models trained to deliberate before answering turned test-time thinking into a new scaling axis.

Trend strength 9/10

Momentum +2/q

Confidence high

Status rising

Forecast horizon

Deliberate reasoning becomes the default; the open questions shift to cost, faithfulness and verifiable reasoning chains.

Connections · 5

How this node ties into the rest of the map, and the evidence behind each link.

from · requires 8/10

Agents lean on deliberate reasoning to plan multi-step actions.

to · uses 8/10

Test-time compute

Reasoning models convert inference compute into accuracy.

from · enables 7/10

Compute scaling

More compute underwrites the reasoning paradigm.

to · enables 6/10

AI for science

Stronger reasoning unlocks math and scientific problem-solving.

from · enables 6/10

Synthetic data

Verifier-generated data trains better reasoning.
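The "convert inference compute into accuracy" link can be made concrete with two textbook sampling strategies. This is a simplified model, not a description of how any particular reasoning model works. Assume each independently sampled answer is correct with probability p (a hypothetical value). With a perfect verifier, the same kind of checker used to filter synthetic training data, best-of-n succeeds if any sample is correct, giving 1 - (1 - p)^n. Without a verifier, majority voting over n samples (self-consistency) succeeds only if more than half of the samples are correct. Independence and a single shared wrong answer are both simplifying assumptions.

```python
from math import comb


def best_of_n_with_verifier(p: float, n: int) -> float:
    """P(at least one of n independent samples is correct).

    Assumes a perfect verifier that always picks a correct sample if one exists.
    """
    return 1.0 - (1.0 - p) ** n


def majority_vote(p: float, n: int) -> float:
    """P(a strict majority of n independent samples is correct).

    Pessimistic simplification: all wrong answers are treated as one shared
    answer, so the correct answer must win more than half of the votes.
    """
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.6  # hypothetical single-sample accuracy, not a measured figure
    for n in (1, 5, 15, 45):
        print(
            f"n={n:>2}  verifier={best_of_n_with_verifier(p, n):.3f}  "
            f"majority={majority_vote(p, n):.3f}"
        )
```

In this toy model both curves rise with n, which is the sense in which extra samples buy accuracy, and the verifier curve rises much faster. If p drops below 0.5, majority voting gets worse as n grows, so more samples only help when the model is usually right or a checker is available.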

Signal sources

Dated facts from primary sources that bear on this trend.

Task horizon doubling Mar 2025

The length of software tasks that AI agents can complete autonomously with 50% reliability has doubled about every 7 months, and since 2024 roughly every 3 months.
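A doubling time describes exponential growth: if the 50%-reliability task horizon H doubles every T months, then after t months it is H · 2^(t/T). A minimal sketch comparing the two doubling times quoted above over one year; the 60-minute starting horizon is a hypothetical value chosen for illustration, not a figure from the source.

```python
def projected_horizon(start_minutes: float, months: float, doubling_months: float) -> float:
    """Task horizon after `months`, assuming clean exponential growth."""
    return start_minutes * 2 ** (months / doubling_months)


# Hypothetical starting horizon in minutes; not a measured value.
START = 60.0

for doubling in (7.0, 3.0):
    horizon = projected_horizon(START, 12.0, doubling)
    print(f"doubling every {doubling:g} months -> {horizon:.0f} min after 12 months")
```

Over a single year the difference compounds: a 7-month doubling multiplies the horizon by about 3.3, while a 3-month doubling multiplies it by 16.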

Benchmarks saturating Apr 2025

Within one year, scores rose by 18.8, 48.9 and 67.3 percentage points on MMMU, GPQA and SWE-bench respectively; on SWE-bench, the share of real-world software problems solved jumped from 4.4% to 71.7%.

Stanford HAI, AI Index 2025

Autonomous coding 2025–2026

On SWE-bench Verified (500 real GitHub issues), autonomous coding agents reached ~80–86% by late 2025, up from under 50% in early 2025.