Where This Is Going:
An Honest Look at AI Forecasts

The previous article ended with a deliberate omission: it told you what AI costs, and said nothing about where the capability curve goes next. This article answers that – the only honest way I know how. Not with my predictions. With the work of people who publish theirs in advance, attach probabilities, and then grade themselves in public when reality arrives.

// TL;DR – what you'll take away
  • How to read an AI forecast without being misled: medians are not promises, modal years are not medians, and public self-correction is a feature – not an embarrassment.
  • What the three most credible signals say: METR's doubling time-horizon metric, the AI Futures Project's self-graded scenario work (reality ran at ~65% of their predicted pace), and the labs' own operational numbers.
  • The practical posture: five no-regret moves that make sense whether the median lands in 2030, 2035, or never.

the week it became personal
// June 2026 – a data point from my own desk

Fable 5 was released on June 9. I have used it for two days. For the first time in my career, I feel that something is more intelligent than I am in my own domain – software development and cloud architecture. Not faster at typing. Not better at recall. More intelligent, in the part of the work I considered mine: the architecture call, the trade-off, the failure mode I would have caught.

I am not alone in this reading. Stripe described it compressing "months of engineering into days" on a codebase migration. GitHub called the agentic coding results "the strongest of any Claude model we've had the opportunity to test." I cite those not as marketing – but because they match what I watched happen on my own screen, in my own codebase, this week.

That moment changes how you read forecasts. They stop being entertainment. So the question becomes urgent and practical: whose forecasts deserve your attention, and what do the credible ones actually say?

forecast literacy

How to Read a Forecast Without Being Misled

Most AI timeline discourse fails before it starts, because most readers – and most journalists – read forecasts wrong. Three distinctions do most of the work:

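A median is not a promise. When a forecaster says "median 2030," they are saying: half my probability mass is before this date, half after. It is a statement about a distribution, not an appointment. The AI Futures Project – the team behind the AI-2027 scenario – wrote an entire clarification post correcting coverage that read their scenario as a confident claim that AGI arrives in 2027. Their words: they "certainly cannot confidently predict a specific year."

A modal year is not a median. The mode is the single most likely year; the median is the year by which half the probability has accumulated. When a distribution has a long right tail, as timeline forecasts often do, the median sits later than the mode, so a scenario set in a forecaster's most likely year can precede that same forecaster's median by years.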
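
To make the median-versus-mode distinction concrete, here is a minimal sketch that reads both off a discrete distribution over years. The probabilities are invented for illustration and are not any forecaster's actual numbers; the function names are hypothetical too.

```python
from itertools import accumulate

# Hypothetical forecast: probability that the milestone arrives in each year.
# These numbers are made up. They sum to 0.84; the remaining 0.16 is
# "after 2035, or never" and is deliberately left out of the table.
forecast = {
    2027: 0.18,
    2028: 0.14,
    2029: 0.12,
    2030: 0.10,
    2031: 0.08,
    2032: 0.07,
    2033: 0.06,
    2034: 0.05,
    2035: 0.04,
}

def modal_year(dist):
    """Single most likely year: the entry with the highest probability."""
    return max(dist, key=dist.get)

def median_year(dist):
    """First year by which at least half of the probability has accumulated."""
    years = sorted(dist)
    running = accumulate(dist[y] for y in years)
    for year, cumulative in zip(years, running):
        if cumulative >= 0.5:
            return year
    raise ValueError("less than half of the probability falls in the listed years")

print("modal year: ", modal_year(forecast))
print("median year:", median_year(forecast))
```

With these made-up numbers the modal year is 2027 and the median is 2030: the most likely single year carries only 18% of the probability, which is why quoting it as "the prediction" misleads.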

Updating in public is the credential. The strongest signal that a forecaster deserves attention is not that they were right – it is that they show you their corrections. Watch the medians move as evidence arrived:

// Published timeline updates – AI Futures Project (median for transformative AI)
  • Daniel Kokotajlo: 2027 (held Dec 2022 – Jan 2025) → 2028 (Feb 2025) → ~2030 (Nov 2025) → Dec 2030 (Jan 2026)
  • Eli Lifland: 2060 (2021) → Jan 2035 (Jan 2026)
  • Survey median: 2030 (413 forecasters · "AIs better than humans at every cognitive task")

Read that strip carefully, because it contains the whole epistemic lesson. One forecaster moved his median later by three years as evidence came in slower than his scenario. Another moved his earlier by twenty-five years. They publish the moves, with reasons. Compare that to the people in your feed who have been alternately announcing AGI-next-year and permanent-AI-winter without ever scoring a past claim.

the self-graders

The Forecasters Who Grade Their Own Work

In late 2025, the AI Futures Project did the thing almost nobody in this field does: they went back through the AI-2027 scenario's checkable 2025 predictions and graded themselves in public. The headline result: reality ran at roughly 65% of their predicted pace on quantitative metrics. Directionally right, consistently too fast.

| Prediction (for 2025) | What happened | Verdict |
|---|---|---|
| SWE-bench ~85% by mid-2025 | Best actual score: 74.5% | Slower than predicted |
| OpenAI revenue ~$18B annualised | ~$20B annualised | Slightly ahead |
| Frontier lead of 3–9 months | Top US labs separated by 0–2 months | Closer race than modelled |
| Agents adopted into workflows | Happened – adoption real but narrower than scenario | Directionally right |
| AI R&D self-acceleration uplift | Materialised slower than modelled | Slower than predicted |

Their method, in their own words: "Make a detailed, concrete trajectory… wait a while… check if things are roughly on track… adjust your guess about how the future will go, to be correspondingly faster or slower." That is not prophecy. That is engineering discipline applied to the future – and it is why their separate forecaster survey is worth knowing too: the 2025 aggregate of 413 forecasters was "about right on benchmarks," underestimated revenue, and overestimated public salience. Capability and money moved faster than attention. Keep that asymmetry in mind; it describes 2026 as well.

⧗ Forecast scenario – not a measurement

The AI-2027 scenario itself – the team's detailed, quarter-by-quarter narrative – sketches where the curve points if it does not bend. Its early-2027 frame:

  • 100,000 – reliable agent copies running in parallel
  • 17× – human thinking speed, per copy
  • ~65% – of predicted pace: how fast reality actually ran in 2025

Hold both numbers at once: the scenario's scale, and the measured 65% pace correction. A vivid future, arriving slower than written – that is the honest summary of the best scenario work in the field.

// source · ai-2027.com + blog.aifutures.org – AI Futures Project

the measurement

METR: The One Metric Worth Tracking Quarterly

If you track a single capability number, make it METR's time horizon: the length of task – measured in expert-human completion time – that an AI agent can finish autonomously at a given reliability. A 50% horizon of one hour means the model succeeds half the time at tasks that take a skilled human an hour. It is the closest thing the field has to a speedometer, because it measures the thing that actually matters for work: how long can you leave it alone.
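
One way to see what a "50% horizon" means operationally is to fit a success curve against task length and read off where it crosses 50%. The sketch below assumes a logistic curve in log2 of task minutes and uses invented success counts; it is a simplification for intuition, not METR's published pipeline, and all names are hypothetical.

```python
import math

def fit_logistic(xs, successes, trials, iterations=25):
    """Fit p = sigmoid(a + b * x) to grouped success counts by Newton's method."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, n in zip(xs, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)   # curvature weight for this task length
            r = s - n * p           # observed minus expected successes
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def horizon_minutes(a, b, reliability=0.5):
    """Task length (minutes) where the fitted success rate equals `reliability`."""
    logit = math.log(reliability / (1.0 - reliability))
    return 2.0 ** ((logit - a) / b)

# Invented data: expert-human task length in minutes, agent successes out of 20.
task_minutes = [2, 8, 30, 60, 120, 480]
trials = [20] * len(task_minutes)
successes = [19, 18, 13, 10, 7, 2]

xs = [math.log2(m) for m in task_minutes]
a, b = fit_logistic(xs, successes, trials)
print(f"50% horizon: {horizon_minutes(a, b):.0f} minutes")
print(f"80% horizon: {horizon_minutes(a, b, reliability=0.8):.0f} minutes")
```

The second print line shows why the reliability level always has to be quoted alongside the number: the same model has a much shorter horizon at 80% than at 50%.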

Three facts about it. First, METR's 2025 research found the 50% horizon doubling roughly every seven months across six years of models – with the recent subset trending faster. Second, exponential curves fit this data better than linear or flattening ones; that is a measured result, not an assumption. Third – and this is the detail I find most telling – METR notes that measurements above 16 hours are currently unreliable, because their task suite was not built for tasks that long. The frontier models of mid-2026, Fable 5 among them, are outgrowing the ruler. The constraint on measuring AI capability is now the cost of hiring humans who can do week-long tasks for comparison.

If the doubling holds – an "if" that the 65% grading result tells you to treat with respect – tasks measured in days fall within the planning horizon of anyone reading this, and tasks measured in weeks follow on a schedule you can roughly compute yourself. That is the entire forecast, stated plainly. No date. A slope, and an honest error bar.
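
The "schedule you can roughly compute yourself" is one line of arithmetic: under steady doubling, the months needed to reach a target horizon are the doubling time multiplied by log2(target / current). A minimal sketch follows. The 16-hour starting point is an assumption borrowed from the measurement ceiling mentioned above, not a measured frontier value, and applying the 65% pace factor to METR's doubling time mixes two different sources, so treat it as a sensitivity check only.

```python
import math

def months_until(target_hours, current_hours, doubling_months, pace=1.0):
    """Months until the horizon reaches target_hours under steady doubling.

    pace < 1 stretches the doubling time: 0.65 turns 7 months into
    roughly 10.8 months.
    """
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months / pace

CURRENT_HOURS = 16.0    # assumption: start at the 16-hour measurement ceiling
DOUBLING_MONTHS = 7.0   # the roughly-seven-month doubling reported by METR

targets = {
    "one working week (40 h)": 40.0,
    "one working month (160 h)": 160.0,
}

for label, hours in targets.items():
    on_trend = months_until(hours, CURRENT_HOURS, DOUBLING_MONTHS)
    slowed = months_until(hours, CURRENT_HOURS, DOUBLING_MONTHS, pace=0.65)
    print(f"{label}: {on_trend:.1f} months on trend, {slowed:.1f} at 65% pace")
```

Under these assumptions a working week of autonomous task length is roughly 9 months out on trend and roughly 14 months out at 65% pace; a working month is roughly 23 and 36 months out. Change the starting point or the doubling time and the answers move, which is the error bar in practice.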

the labs' own numbers

What the Builders Report from Inside

The third signal source is the labs themselves. They have an obvious incentive to talk the curve upward – so I weight their operational numbers, the ones describing their own engineering reality, above their predictions. Anthropic's published material on recursive self-improvement reports, as of May 2026:

  • 80%+ – of Anthropic's production code authored by Claude
  • 76% – success on open-ended engineering tasks, up from 26% six months earlier
  • 52× – code-optimisation speedup, vs ~3× in May 2025
More code shipped per engineer per quarter vs 2021–2025

The same page sketches three futures: trends stall (which they rate least likely), efficiency keeps compounding under human direction (their most probable near-term), or AI systems begin building their successors. On the third, their language is deliberately uncomfortable: it "is not inevitable" but "could come sooner than most institutions are prepared for." Dario Amodei's January 2026 essay The Adolescence of Technology puts his own range on powerful AI – Nobel-level capability across fields, millions of parallel instances – at one to two years, while acknowledging the uncertainty, and pairs it with the prediction practitioners should sit with longest: serious displacement pressure on entry-level white-collar work within one to five years.

Discount all of this for incentive if you like – I do. But note what happens when you put the three sources side by side. The independent measurement (METR), the self-graded scenario team (AI Futures), and the most safety-vocal lab all describe the same shape: steep, compounding, and slower than the most vivid scenarios – but not by much, and not slowing to a stop.

honest uncertainty

Where the Credible Views Agree – and Where I'm Not Sure

The agreement zone is narrower than the headlines and wider than the scepticism. Every credible source I track expects continued rapid capability growth through at least 2027–2028, expects software engineering to stay the leading edge, and expects autonomous task length to keep extending. The disagreement is about timing and ceiling: medians for "better than humans at every cognitive task" run from 2030 (survey median, Kokotajlo) to 2035 (Lifland) – and a serious tail of probability extends well past both.

// What I am genuinely unsure about
  • Whether the doubling trend bends at long horizons. Week-long tasks involve context, judgment, and recovery-from-ambiguity that 16-hour tasks do not. The data cannot yet say; neither can I.
  • Whether my Fable 5 moment generalises. "More intelligent than me in my domain" is one practitioner's reading after two days. It is a data point, not a measurement – I labelled it accordingly.
  • Whether the chips-and-power constraints from Article 03 act as a brake the forecasts underweight. Capability curves assume the compute arrives. The HBM supply data says that assumption is at least worth a question mark.
  • Whether economic diffusion keeps lagging capability. The 2025 forecasters overestimated public salience – society noticing less than expected is itself a forecast-relevant fact.

go look yourself

The Two Sources I Would Send Anyone To

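Do not take the summary above on faith – that would defeat the entire point of this article. The two most credible starting points in AI forecasting are free, public, and written to be checked:

  • METR's time-horizon measurements – the independent capability metric described above.
  • The AI Futures Project – the AI-2027 scenario at ai-2027.com, with the team's self-grading and timeline updates at blog.aifutures.org.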

the posture

What to Actually Do With a Forecast

// Five no-regret moves – sensible at any median
  1. Track the slope, not the headlines. Check METR's time horizons quarterly. One number, one trend line – it will tell you more than a year of launch-day coverage.
  2. Plan in ranges, not years. The credible medians span 2030–2035 with wide error bars. Any career or architecture decision that only works under one of those dates is a bet, not a plan.
  3. Keep filling the compounding column. Article 01's distinction holds under every scenario: fundamentals, judgment, and domain depth appreciate precisely because execution is getting cheap.
  4. Design processes for all three lab scenarios. The practices in Articles 06–10 – bounded contexts, review gates, evals, cost ceilings – are exactly the infrastructure that pays off whether trends stall, compound, or close the loop.
  5. Take the displacement forecast personally – and early. If entry-level cognitive work comes under pressure within one to five years, the move up the judgment ladder starts now, not at the median date.

// I believe this

A forecast is not a promise about the future. It is a discipline for being less wrong about it. The people worth reading publish their predictions, show their corrections, and put error bars on their own conviction – everyone else is doing marketing with dates.

This article covered where the capability curve points. The next one walks the same curve outward across the world it touches: coding, hacking, robotics, politics, biology, and forecasting itself – six fields, each sorted honestly into what currently exists, what is emerging, and what is still science fiction. That is Article 05, and it completes The Futures arc before the series returns to the engine room.
// forecast references last reviewed · 2026