Where This Is Going: An Honest Look at AI Forecasts

Chapter 15 of 18 · Practitioner · 9 min

The previous chapter ended with a deliberate omission: it told you what AI costs, and said nothing about where the capability curve goes next. This chapter fills that gap the only honest way I know how. Not with my predictions. With the work of people who publish theirs in advance, attach probabilities, and then grade themselves in public when reality arrives.

// the crux

You cannot predict AI's timeline – but you can read forecasts without being played. Trust the people who publish predictions in advance, attach probabilities, and grade themselves in public. Track one metric (the doubling time of METR's time horizon), and make the moves that hold whether the median lands in 2030 or never.

// in one breath

How to read an AI forecast without being played – a median is not a promise, a modal year is not a median, and grading yourself in public is a strength, not an embarrassment.
What the three most credible signals actually say – METR's doubling time horizon, a forecasting project honest enough to admit reality ran at ~65% of its predicted pace, and the labs' own operational numbers.
The practical posture: five no-regret moves that hold whether the median lands in 2030, 2035, or never.

the week it became personal

// June 2026 – a data point from my own desk

Fable 5 was released on June 9. I have used it for two days. For the first time in my career, I feel that something is more intelligent than I am in my own domain – software development and cloud architecture. Not faster at typing. Not better at recall. More intelligent, in the part of the work I considered mine: the architecture call, the trade-off, the failure mode I would have caught.

I am not alone in this reading. Stripe described it compressing "months of engineering into days" on a codebase migration. GitHub called the agentic coding results "the strongest of any Claude model we've had the opportunity to test." I cite those not as marketing, but because they match what I watched happen on my own screen, in my own codebase, this week.

That moment changes how you read forecasts. They stop being entertainment. So the question becomes urgent and practical: whose forecasts deserve your attention, and what do the credible ones actually say?

forecast literacy

How to Read a Forecast Without Being Misled

Most AI timeline discourse fails before it starts, because most readers – and most journalists – read forecasts wrong. Three distinctions do most of the work:

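A median is not a promise. When a forecaster says "median 2030," they are saying: half my probability mass is before this date, half after. It is a statement about a distribution, not an appointment. The AI Futures Project – the team behind the AI-2027 scenario – wrote an entire clarification post correcting coverage that read their scenario as a confident claim that AGI arrives in 2027. Their words: they "certainly cannot confidently predict a specific year."

A modal year is not a median. The mode is the single most likely year; the median is the point with half the probability mass on each side. When a distribution has a long right tail, the mode sits earlier than the median, so a scenario written around a forecaster's most likely year can carry a date well ahead of that same forecaster's median.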
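To make the difference between a median and a modal year concrete, here is a minimal Python sketch that reads both off the same distribution. The probabilities are invented for illustration and are not any forecaster's published numbers; `median_year` and `modal_year` are hypothetical helper names.

```python
# Hypothetical forecast: probability that the milestone first arrives in each year.
# The final bucket stands for "2035 or later, including never".
forecast = {
    2027: 0.20, 2028: 0.15, 2029: 0.12, 2030: 0.10, 2031: 0.08,
    2032: 0.07, 2033: 0.06, 2034: 0.05, 2035: 0.17,
}

def modal_year(dist):
    """The single most likely year."""
    return max(dist, key=dist.get)

def median_year(dist):
    """The first year by which at least half the probability mass has accumulated."""
    cumulative = 0.0
    for year in sorted(dist):
        cumulative += dist[year]
        if cumulative >= 0.5:
            return year
    raise ValueError("probabilities sum to less than 0.5")

print(modal_year(forecast))   # 2027
print(median_year(forecast))  # 2030
```

In this made-up distribution the most likely single year is 2027, yet only 20% of the probability sits on it, and half the mass does not accumulate until 2030. Reading the mode as "the forecast" would put the date three years ahead of the median.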

Updating in public is the credential. The strongest signal that a forecaster deserves attention is not that they were right – it is that they show you their corrections. Watch how the medians moved as evidence arrived:

// Published timeline updates – AI Futures Project (median for transformative AI)

Daniel Kokotajlo: 2027 (held Dec 2022 – Jan 2025) → 2028 (Feb 2025) → ~2030 (Nov 2025) → Dec 2030 (Jan 2026)

Eli Lifland: 2060 (2021) → Jan 2035 (Jan 2026)

Survey median: 2030 (413 forecasters · "AIs better than humans at every cognitive task")

Read that strip carefully, because it contains the whole epistemic lesson. One forecaster moved his median later by three years as evidence came in slower than his scenario. Another moved his earlier by twenty-five years. They publish the moves, with reasons. Compare that to the people in your feed who have been alternately announcing AGI-next-year and permanent-AI-winter without ever scoring a past claim.

the self-graders

The Forecasters Who Grade Their Own Work

In late 2025, the AI Futures Project did the thing almost nobody in this field does: they went back through the AI-2027 scenario's checkable 2025 predictions and graded themselves in public. The headline result: reality ran at roughly 65% of their predicted pace on quantitative metrics. Directionally right, consistently too fast.

Prediction (for 2025)	What happened	Verdict
SWE-bench ~85% by mid-2025	Best actual score: 74.5%	Slower than predicted
OpenAI revenue ~$18B annualised	~$20B annualised	Slightly ahead
Frontier lead of 3–9 months	Top US labs separated by 0–2 months	Closer race than modelled
Agents adopted into workflows	Happened – adoption real but narrower than scenario	Directionally right
AI R&D self-acceleration uplift	Materialised slower than modelled	Slower than predicted

Their method, in their own words: "Make a detailed, concrete trajectory… wait a while… check if things are roughly on track… adjust your guess about how the future will go, to be correspondingly faster or slower." That is not prophecy. That is engineering discipline applied to the future – and it is why their separate forecaster survey is worth knowing too: the 2025 aggregate of 413 forecasters was "about right on benchmarks," underestimated revenue, and overestimated public salience. Capability and money moved faster than attention. Keep that asymmetry in mind; it describes 2026 as well.
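When forecasts carry explicit probabilities, one standard way to grade them after the fact is the Brier score: the mean squared gap between the stated probability and what happened (1 for yes, 0 for no). The sketch below is a generic illustration of that scoring rule, not the AI Futures Project's grading method, and the four forecasts in it are invented.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    0.0 is a perfect score; always answering 50% scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical resolved forecasts: (stated probability, outcome).
resolved = [(0.9, 1), (0.7, 0), (0.2, 0), (0.6, 1)]

score = brier_score(resolved)
baseline = brier_score([(0.5, outcome) for _, outcome in resolved])
print(round(score, 3), round(baseline, 3))  # 0.175 0.25
```

A forecaster who never resolves a past claim has no score at all, which is the practical difference between the self-graders and everyone else.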

⧗ Forecast scenario – not a measurement

The AI-2027 scenario itself – the team's detailed, quarter-by-quarter narrative – sketches where the curve points if it does not bend. Its early-2027 frame:

100,000 – reliable agent copies running in parallel

17× – human thinking speed, per copy

~65% – of predicted pace: how fast reality actually ran in 2025

Hold both numbers at once: the scenario's scale, and the measured 65% pace correction. A vivid future, arriving slower than written – that is the honest summary of the best scenario work in the field.

// source · ai-2027.com + blog.aifutures.org – AI Futures Project

the measurement

METR: The One Metric Worth Tracking Quarterly

If you track a single capability number, make it METR's time horizon: the length of task – measured in expert-human completion time – that an AI agent can finish autonomously at a given reliability. A 50% horizon of one hour means the model succeeds half the time at tasks that take a skilled human an hour. It is the closest thing the field has to a speedometer, because it measures the thing that actually matters for work: how long can you leave it alone.
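A small sketch can show why the reliability level matters as much as the horizon itself. It assumes success probability follows a logistic curve in the logarithm of task length, which is an assumption of this sketch; the `slope` value and the function names are hypothetical, not METR's published parameters.

```python
import math

def success_probability(task_minutes, horizon_50_minutes, slope=1.0):
    """Assumed logistic in log2(task length); equals 0.5 when the task matches the 50% horizon."""
    x = slope * (math.log2(horizon_50_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-x))

def horizon_at(reliability, horizon_50_minutes, slope=1.0):
    """Task length at which the assumed curve gives the requested success rate."""
    logit = math.log(reliability / (1.0 - reliability))
    return horizon_50_minutes * 2 ** (-logit / slope)

# A model with a one-hour 50% horizon, under the assumed curve:
print(success_probability(60, 60))   # 0.5
print(round(horizon_at(0.8, 60)))    # about 23 minutes at 80% reliability
```

Under these assumed parameters, the same model that handles hour-long tasks half the time is only dependable at the 80% level for tasks of roughly 23 minutes. The exact ratio depends on the slope, but the direction does not: demanding more reliability always shortens the horizon.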

Three facts about it. First, METR's 2025 research found the 50% horizon doubling roughly every seven months across six years of models – with the recent subset trending faster. Second, exponential curves fit this data better than linear or flattening ones; that is a measured result, not an assumption. Third – and this is the detail I find most telling – METR notes that measurements above 16 hours are currently unreliable, because their task suite was not built for tasks that long. The frontier models of mid-2026, Fable 5 among them, are outgrowing the ruler. One binding constraint on measuring AI capability is now the cost of hiring humans who can do week-long tasks for comparison.

If the doubling holds – an "if" that the 65% grading result tells you to treat with respect – tasks measured in days fall within the planning horizon of anyone reading this, and tasks measured in weeks follow on a schedule you can roughly compute yourself. That is the entire forecast, stated plainly. No date. A slope, and an honest error bar.
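The "compute it yourself" step is one line of arithmetic: count the doublings between today's horizon and the target, then multiply by the doubling time. The sketch below assumes a starting horizon of 16 hours (borrowed from the measurement ceiling mentioned above, not a reported score) and a 40-hour work week as the target. Applying the 65% pace figure to METR's trend is also my assumption, used here only as an error bar.

```python
import math

def months_until(target_hours, current_hours, doubling_months=7.0, pace=1.0):
    """Months until the horizon reaches target_hours if the doubling trend holds.

    pace scales the trend: 1.0 is the measured rate, 0.65 is 65% of it.
    """
    if target_hours <= current_hours:
        return 0.0
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months / pace

# Hypothetical starting point: 16 hours today, 40 hours as the target.
print(round(months_until(40, 16), 1))             # 9.3
print(round(months_until(40, 16, pace=0.65), 1))  # 14.2
```

Under these assumptions a work-week horizon is about nine months out on trend and about fourteen at 65% pace. The gap between those two numbers is the honest error bar, and it widens with every further doubling.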

the labs' own numbers

What the Builders Report from Inside

The third signal source is the labs themselves. They have an obvious incentive to talk the curve upward – so I weight their operational numbers, the ones describing their own engineering reality, above their predictions. Anthropic's published material on recursive self-improvement reports, as of May 2026:

80%+ – of Anthropic's production code authored by Claude

76% – success on open-ended engineering tasks, up from 26% six months earlier

52× – code-optimisation speedup, vs ~3× in May 2025

8× – more code shipped per engineer per quarter, vs 2021–2025

The same page sketches three futures: trends stall (which they rate least likely), efficiency keeps compounding under human direction (their most probable near-term outcome), or AI systems begin building their successors. On the third, their language is deliberately uncomfortable: it "is not inevitable" but "could come sooner than most institutions are prepared for." Dario Amodei's January 2026 essay "The Adolescence of Technology" puts his own range for powerful AI – Nobel-level capability across fields, millions of parallel instances – at one to two years, while acknowledging the uncertainty. He pairs it with the prediction practitioners should sit with longest: serious displacement pressure on entry-level white-collar work within one to five years.

Discount all of this for incentive if you like – I do. But note what happens when you put the three sources side by side. The independent measurement (METR), the self-graded scenario team (AI Futures), and the most safety-vocal lab all describe the same shape: steep, compounding, and slower than the most vivid scenarios – but not by much, and not slowing to a stop.

honest uncertainty

Where the Credible Views Agree – and Where I'm Not Sure

The agreement zone is narrower than the headlines and wider than the scepticism. Every credible source I track expects continued rapid capability growth through at least 2027–2028, expects software engineering to stay the leading edge, and expects autonomous task length to keep extending. The disagreement is about timing and ceiling: medians for "better than humans at every cognitive task" run from 2030 (survey median, Kokotajlo) to 2035 (Lifland) – and a serious tail of probability extends well past both.

// What I am genuinely unsure about

Whether the doubling trend bends at long horizons. Week-long tasks involve context, judgment, and recovery-from-ambiguity that 16-hour tasks do not. The data cannot yet say; neither can I.
Whether my Fable 5 moment generalises. "More intelligent than me in my domain" is one practitioner's reading after two days. It is a data point, not a measurement – I labelled it accordingly.
Whether the chips-and-power constraints from Chapter 14 act as a brake the forecasts underweight. Capability curves assume the compute arrives. The HBM supply data suggests that assumption deserves at least a question mark.
Whether economic diffusion keeps lagging capability. The 2025 forecasters overestimated public salience – society noticing less than expected is itself a forecast-relevant fact.

go look yourself

The Two Sources I Would Send Anyone To

Do not take the summary above on faith – that would defeat the entire point of this article. The two most credible starting points in AI forecasting are free, public, and written to be checked:

ai-2027.com

AI 2027 – the scenario

A research-grade, quarter-by-quarter narrative of the AI race, written by former OpenAI and AI-policy researchers. Read it as a detailed hypothesis with published error bars – the team grades it against reality as the dates arrive.

blog.aifutures.org

AI Futures Project – the working notes

The self-gradings, the timeline clarifications, the annual forecaster surveys (413 ranked participants), and the updates as medians move. The most honest forecasting paper trail in the field.

the posture

What to Actually Do With a Forecast

// Five no-regret moves – sensible at any median

Track the slope, not the headlines. Check METR's time horizons quarterly. One number, one trend line – it will tell you more than a year of launch-day coverage.
Plan in ranges, not years. The credible medians span 2030–2035 with wide error bars. Any career or architecture decision that only works under one of those dates is a bet, not a plan.
Keep filling the compounding column. Chapter 1's distinction holds under every scenario: fundamentals, judgment, and domain depth appreciate precisely because execution is getting cheap.
Design processes for all three lab scenarios. The practices in Chapters 6–13 – bounded contexts, review gates, evals, cost ceilings – are exactly the infrastructure that pays off whether trends stall, compound, or close the loop.
Take the displacement forecast personally – and early. If entry-level cognitive work comes under pressure within one to five years, the move up the judgment ladder starts now, not at the median date.
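The "bet, not a plan" test from the list above can be written down as a simple check: a move is no-regret only if it does not hurt under any scenario you take seriously. The scenario names, the two example moves, and the payoff scores below are all hypothetical placeholders; the value is in filling in the table for your own decisions.

```python
# Hypothetical payoffs per scenario: +1 helps, 0 is neutral, -1 hurts.
SCENARIOS = ("median_2030", "median_2035", "never")

moves = {
    "deepen domain judgment": {"median_2030": 1, "median_2035": 1, "never": 1},
    "stake everything on a 2030 arrival": {"median_2030": 1, "median_2035": -1, "never": -1},
}

def is_no_regret(payoffs):
    """True when the move is at least neutral in every scenario."""
    return all(payoffs[scenario] >= 0 for scenario in SCENARIOS)

no_regret = [name for name, payoffs in moves.items() if is_no_regret(payoffs)]
print(no_regret)  # ['deepen domain judgment']
```

A move that pays off under only one column is not wrong, but it should be sized and labelled as a bet.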

// I believe this

A forecast is not a promise about the future. It is a discipline for being less wrong about it. The people worth reading publish their predictions, show their corrections, and put error bars on their own conviction – everyone else is doing marketing with dates.

This chapter covered where the capability curve points – the slope, the error bars, the honest uncertainty. That closes the bill-and-horizon stretch of the book.

// carry forward

From here the series widens its lens. The final three chapters step back from building to the bigger picture: the civilisational risks that pass through the systems you ship, the security discipline they sharpen into, and the signal map for staying oriented while it all keeps moving. It begins with the risks – Chapter 16.

// forecast references last reviewed · 2026