
Claude now writes more than 80% of Anthropic's code — and the company warns recursive self-improvement may be closer than anyone expected

Anthropic reveals that Claude authored more than 80% of the code merged into its codebase as of May 2026, and calls for international coordination before AI can fully design its own successors.


Anthropic has published a detailed account of how much of its own software Claude now writes, and the numbers are striking enough that the company felt compelled to attach a warning to them.

As of May 2026, more than 80% of the code merged into Anthropic’s production codebase was authored by Claude. In early 2024, before Claude Code existed in any form, that figure was in the low single digits. The ratio has essentially inverted in a little over two years.

The post, published by the newly formed Anthropic Institute and co-authored by co-founder Jack Clark and Institute Lead Marina Favaro, does not frame this purely as a productivity win. It frames it as a signal that something more consequential may be approaching: a point where AI systems can meaningfully participate in designing their own successors, with or without substantial human involvement.

What the numbers actually show

The productivity figures are worth sitting with for a moment, because they move fast.

In Q2 2026, the typical Anthropic engineer was merging eight times as much code per day as they were in 2024. That is not eight times as many keystrokes. It is eight times as much shipped, reviewed, and merged work, because Claude is doing the drafting and the engineer is directing, reviewing, and approving.

On a specific code optimisation benchmark, Claude Opus 4 achieved roughly a 3x speedup over baseline code in May 2025. By April 2026, Claude Mythos Preview was hitting 52x on the same task. For context, a skilled human researcher doing this work would typically reach around 4x after four to eight hours of effort.

In April 2026, Claude shipped over 800 fixes that reduced a class of API errors by a factor of one thousand. The engineer overseeing the work estimated a human would have needed four years to complete it. The human bottleneck is not just speed; it is the difficulty of holding enormous amounts of unfamiliar context in mind at once.

A multi-agent experiment ran nine parallel Claude instances on a research problem for over 800 cumulative hours at a compute cost of around $18,000. They recovered 97% of the performance gap on the task. Two human researchers working for a week recovered 23%.
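
To put those figures on a common footing, here is a rough back-of-the-envelope sketch in Python. The inputs are the rounded numbers quoted above ("over 800" hours, "around $18,000"), so the results are approximate. The even split of hours across the nine instances is an assumption made here for illustration, not something the post states.

```python
# Figures as quoted in the article; both totals are approximate.
AGENT_INSTANCES = 9
AGENT_HOURS = 800            # "over 800 cumulative hours", treated as a lower bound
COMPUTE_COST_USD = 18_000    # "around $18,000"
AGENT_GAP_RECOVERED = 0.97
HUMAN_GAP_RECOVERED = 0.23

# Approximate compute cost for each cumulative agent-hour.
cost_per_agent_hour = COMPUTE_COST_USD / AGENT_HOURS

# ASSUMPTION: hours were split evenly across the nine instances.
hours_per_instance = AGENT_HOURS / AGENT_INSTANCES

# How much more of the performance gap the agents closed than the two humans.
recovery_ratio = AGENT_GAP_RECOVERED / HUMAN_GAP_RECOVERED

if __name__ == "__main__":
    print(f"Cost per agent-hour: ${cost_per_agent_hour:.2f}")
    print(f"Hours per instance (even split assumed): {hours_per_instance:.0f}")
    print(f"Gap recovered, agents vs humans: {recovery_ratio:.1f}x")
```

On these rounded inputs, the compute works out to roughly $22.50 per agent-hour, and the agents closed a little over four times as much of the gap as the human pair. The post gives no cost figure for the human researchers, so no cost comparison is attempted.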

The two inflection points

Anthropic identifies two clear moments where the trajectory changed.

The first came in 2025, when Claude moved from suggesting code for an engineer to copy and paste to actually running code itself. Lines merged per engineer per day, which had been flat across Anthropic’s first four years, began climbing.

The second came in 2026, when models started completing longer autonomous tasks without needing to check in at every step. The slope steepened again.

The task length that Claude can handle reliably on its own was previously doubling every seven months. Anthropic now says that rate has accelerated to roughly every four months. If that holds, the company projects that by 2027, AI systems could reliably handle work that currently takes a person several weeks.
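
The gap between a seven-month and a four-month doubling period is easy to underestimate, so a minimal sketch of the arithmetic may help. It assumes clean exponential growth, which is a simplification, and the function name `growth_factor` is illustrative rather than anything from Anthropic's post.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` if a quantity doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


# Doubling periods quoted in the article.
OLD_DOUBLING_MONTHS = 7.0
NEW_DOUBLING_MONTHS = 4.0

if __name__ == "__main__":
    for horizon in (12, 24):
        old = growth_factor(horizon, OLD_DOUBLING_MONTHS)
        new = growth_factor(horizon, NEW_DOUBLING_MONTHS)
        print(f"{horizon} months: {old:.1f}x at 7-month doubling, "
              f"{new:.1f}x at 4-month doubling")
```

Over twelve months, a seven-month doubling period yields roughly a 3.3x increase in reliable task length, while a four-month period yields 8x. Over two years the gap widens to roughly 11x versus 64x, which is why a small change in the doubling period matters so much to the projection.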

What “recursive self-improvement” actually means, and why it matters

The term sounds abstract, but the concern is concrete. If an AI system is capable enough to contribute meaningfully to AI research and development, and if AI is already doing most of the coding at a frontier AI lab, then the gap between “AI assists development” and “AI drives development” starts to close.

Anthropic lays out three possible futures. In the first, progress plateaus and today’s capabilities reshape the economy without accelerating further. In the second, AI development becomes substantially automated while humans retain strategic direction. In the third, AI systems begin autonomously designing their own successors with little meaningful human involvement.

Anthropic says it does not have good intuitions for what the third scenario looks like. That admission, from the company building the systems in question, is the part of this post worth paying attention to.

The specific safety concern is compounding misalignment. Rare alignment failures in today’s models are manageable. But if a model with subtle misalignment helps build the next model, and that model helps build the one after it, errors could accumulate and amplify faster than human oversight can track them.

The coordination problem

Anthropic is not calling for a unilateral pause. It is explicit about why that would not work: if one lab slows down, competitors keep building, and the cautious lab loses market position without making the world any safer. The problem is structural, not a matter of individual willpower.

What the company is calling for is an internationally verifiable mechanism that would allow multiple frontier labs to halt development simultaneously, under conditions that can be independently confirmed. The comparison it draws is to nuclear verification regimes, with the uncomfortable caveat that those took decades to establish, and Anthropic does not expect the AI industry to have that kind of time.

The Institute plans to convene policymakers, researchers, civil society organisations, and competing AI firms in the coming months to work through what such a system would require.

One piece of context that has drawn scrutiny: this post arrived less than a week after Anthropic confidentially filed for an IPO and closed a funding round valuing the company at close to $965 billion. Some commentators have questioned whether a company heading for a near-trillion-dollar listing is well-positioned to urge restraint. Anthropic has not addressed that tension directly.

What this means for you

If you work in software, the practical reality is already visible: Claude-written code is at rough parity with human-written code at Anthropic today, and the company expects it to be strictly better within the year. The engineer’s role has shifted from author to reviewer and decision-maker. That shift is not hypothetical or future-tense at frontier AI labs; it is the current state.

If you are in a leadership role thinking about AI adoption, the Amdahl’s Law point in Anthropic’s post is worth noting. As AI generates more output, the bottleneck moves to human review capacity. Shipping faster only helps if the review and decision-making infrastructure can keep up. Anthropic has already hit this friction internally, and it is not a small lab.
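
For readers unfamiliar with Amdahl's Law, the standard formula gives the overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of the work that gets faster and s is how much faster it gets. The sketch below applies it to a hypothetical team. The 80% drafting share and the speedup factors are illustrative assumptions, not figures from Anthropic's post.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster.

    The remaining (1 - accelerated_fraction) of the work, here human review
    and decision-making, is assumed to stay at its original speed.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


if __name__ == "__main__":
    # HYPOTHETICAL: drafting is 80% of an engineer's time, review is the other 20%.
    drafting_share = 0.8
    for drafting_speedup in (10, 100, 1000):
        overall = amdahl_speedup(drafting_share, drafting_speedup)
        print(f"Drafting {drafting_speedup}x faster -> overall {overall:.2f}x")
```

Under these assumed numbers, making drafting ten times faster gives about a 3.6x overall gain, and even a thousandfold drafting speedup cannot push the total past 5x, because the unaccelerated 20% sets the ceiling. Further gains then have to come from speeding up review itself.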

If you are in policy or government, the four-month doubling rate on autonomous task length is the number to watch. That is the metric that determines how quickly the governance window narrows.

And if you are simply trying to understand where this is heading: a company that is commercially motivated to accelerate AI development is publicly saying it is concerned about where that acceleration leads, and asking for external constraints on itself and its competitors. That is an unusual thing for any company to do, and it is probably worth taking seriously on its own terms.

The full post is available at anthropic.com/institute/recursive-self-improvement.