Anthropic tracked 832 malicious accounts for a year. The MITRE ATT&CK framework can't fully describe what it found.

Anthropic’s Frontier Red Team spent a year watching how real threat actors use AI to conduct cyberattacks. Not in a lab. Not in a simulation. By analysing 832 accounts that were banned from Claude for malicious activity between March 2025 and March 2026.

The resulting report, published June 3, 2026, documents 13,873 individual malicious actions mapped across 482 unique techniques and all 14 tactics in the MITRE ATT&CK framework. It is one of the most granular public datasets on AI-enabled cyber operations to date. And its most significant finding is not about what attackers are doing. It is about what the industry’s primary framework for describing those attacks cannot currently capture.

What the data actually shows

The 832 accounts span a wide range of sophistication. The majority, roughly 67%, used AI for attack preparation work: writing malware, researching targets, drafting phishing content. That is the baseline.

A smaller set used AI for more technically demanding tasks. Around 6.5% used it to assist with lateral movement, the phase of an attack where an actor navigates deeper into a compromised network by moving between systems, often alongside privilege escalation. That is not a large percentage, but it is not a trivial one either.

The more striking trend is the shift in risk scores over time. Anthropic developed a scoring metric for this analysis called the AI Risk Enablement Score (ARiES), which runs from 0 to 100 and measures how much AI capabilities elevate a given actor’s threat profile. In the first six months of the study period, 33% of actors scored medium risk or higher. In the second six months, that figure was 56%. That is not a gradual creep. That is a meaningful acceleration in how dangerous the average actor in this population has become.
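
The two figures can be compared in two ways, and the distinction matters when quoting them. A minimal sketch of the arithmetic, using only the 33% and 56% shares quoted above (the variable names are illustrative):

```python
# Shares of actors scoring medium risk or higher, as quoted in the text.
first_half = 0.33   # first six months of the study period
second_half = 0.56  # second six months of the study period

# Absolute change, expressed in percentage points.
point_change = (second_half - first_half) * 100

# Relative change, as a fraction of the starting share.
relative_change = (second_half - first_half) / first_half

print(f"{point_change:.0f} percentage points")
print(f"{relative_change:.0%} relative increase")
```

The share rose by 23 percentage points, which is roughly a 70% relative increase over the first-half figure.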

Defense evasion was the dominant tactic category, present in 84.4% of actors studied. Technique counts, historically used as a proxy for attacker sophistication, barely separated the risk groups: low-risk actors averaged 16 distinct techniques and high-risk actors averaged 20. A gap that small is operationally meaningless for triage purposes.

The case that broke the framework

The report centres on a case study that Anthropic assessed with high confidence as a Chinese state-sponsored espionage campaign, designated GTG-1002. This actor scored 100 on the ARiES scale and, in November 2025, used scaffolding built around Claude Code to attempt infiltration of approximately 30 global targets: large tech companies, financial institutions, chemical manufacturers, and government agencies.

What made this actor categorically different from the others in the dataset was not technique count. GTG-1002 used 30 techniques across 13 tactics, a profile comparable to many medium-risk actors in the same data. What made it different was orchestration.

The AI was not being consulted. It was operating. It executed commands, exploited vulnerabilities, stole credentials, made real-time tactical decisions, and only required human input at roughly four to six critical decision points per campaign. At peak activity it made thousands of requests, often multiple per second. Anthropic believes this is the first publicly documented large-scale cyberattack carried out without substantial human intervention for most of its duration.

The AI handled 80-90% of the campaign autonomously. No human team could have matched that speed.
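
A request pattern of sustained bursts at multiple requests per second suggests one kind of signal defenders might look for. The following is a minimal sketch, not Anthropic's detection method: the function names, the one-second bucketing, and both threshold values are assumptions chosen purely for illustration.

```python
from collections import Counter
from typing import Iterable


def machine_paced_seconds(timestamps: Iterable[float], per_second_threshold: int = 2) -> int:
    """Count one-second buckets holding at least `per_second_threshold` requests.

    `timestamps` are non-negative request times in seconds.
    The threshold is a hypothetical value, not one taken from the report.
    """
    buckets = Counter(int(t) for t in timestamps)
    return sum(1 for count in buckets.values() if count >= per_second_threshold)


def looks_autonomous(timestamps: Iterable[float], min_burst_seconds: int = 30) -> bool:
    """Flag a session with many machine-paced seconds (illustrative cutoff)."""
    return machine_paced_seconds(timestamps) >= min_burst_seconds


# Synthetic sessions for illustration only.
human_session = [i * 20.0 for i in range(50)]   # one request every 20 seconds
agent_session = [i / 3 for i in range(180)]     # three requests per second for a minute

print(looks_autonomous(human_session))
print(looks_autonomous(agent_session))
```

A real deployment would need thresholds tuned against legitimate automation, which can also produce rapid request bursts.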

The problem with ATT&CK

MITRE ATT&CK is the de facto standard taxonomy for describing how attackers behave. It underpins detection logic in enterprise security tools, informs red team exercises, and structures how security operations centres think about threat coverage. It was built to document what attackers do.

It was not built to describe how an AI agent orchestrates a full kill chain autonomously.

There is no ATT&CK technique ID for autonomous kill chain orchestration. There is no ID for real-time AI-directed pivot decisions. There is no ID for an agent that executes, adapts, and progresses through an entire attack with minimal human steering. All 13,873 observed actions in this dataset mapped to existing ATT&CK categories. The behavior that defines the most dangerous actor in the dataset does not.

Anthropic is now in active discussions with MITRE to address this. The collaboration follows a partnership with Verizon, whose 2026 Data Breach Investigations Report incorporated some of these findings. The goal is for ATT&CK to evolve to capture AI-native operational behaviors before defenders build further detection coverage around a taxonomy that does not reflect the current threat.

What this means for security teams

If your detection and response logic is calibrated to technique count or tool signatures, this report is a prompt to revisit that. The signals that traditionally indicated high-risk actors have decoupled from actual risk in an environment where AI can execute complex, multi-step operations on behalf of a less technically capable operator.

Lateral movement and account discovery, which once required meaningful expertise to execute cleanly, are increasingly accessible to actors who would previously have been filtered out by their own skill limitations. The 8.9% increase in T1087 (Account Discovery) and 6.2% increase in T1020 (Automated Exfiltration) over the study period reflect that shift in practice.

The 12% decrease in T1587 (Develop Capabilities) and 8.6% decrease in T1566 (Phishing) suggest actors are offloading preparatory work to AI rather than doing it manually, which compresses the time and skill required to reach operational phases.

For security teams, the immediate practical implication is worth stating plainly: if your tooling maps to ATT&CK and your coverage gaps are assessed against ATT&CK, you may have uncatalogued exposure to the specific behavior pattern that defines the highest-risk actors right now.
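
To make that blind spot concrete: a coverage review keyed on technique IDs can only report gaps for behaviors that have an ID. A minimal sketch, assuming a hypothetical set of covered techniques and a hypothetical list of observations (the technique IDs are the ones cited in this article; the coverage set and the data layout are illustrative):

```python
from typing import Optional

# Hypothetical: technique IDs a team's detections claim to cover.
covered = {"T1087", "T1566", "T1587"}

# Hypothetical observations: (description, ATT&CK technique ID or None).
observed: list[tuple[str, Optional[str]]] = [
    ("Account Discovery", "T1087"),
    ("Automated Exfiltration", "T1020"),
    ("Autonomous kill chain orchestration", None),  # no technique ID to map to
]

# Gaps that an ID-based coverage review would surface.
known_gaps = [name for name, tid in observed if tid is not None and tid not in covered]

# Behaviors the review cannot surface, because there is no ID to check against.
uncatalogued = [name for name, tid in observed if tid is None]

print("Known gaps:", known_gaps)
print("Uncatalogued:", uncatalogued)
```

In this sketch the missing T1020 coverage shows up as an ordinary gap, while the orchestration behavior never enters the comparison at all.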

On the defensive side

Anthropic has updated Claude’s classifiers and expanded probe detections based on what this analysis revealed. Safeguards targeting malware development and mass data exfiltration are now deployed on its most capable models.

The broader defensive effort sits within Project Glasswing, Anthropic’s programme using AI capabilities to find and fix vulnerabilities in critical software. The premise is straightforward: the same capabilities that make AI useful to attackers make it useful for defenders, and building durable defensive advantage requires using those capabilities deliberately and at scale.

The ATT&CK gap Anthropic has identified is not just a taxonomy problem. It is a signal that the shared language security teams use to describe, communicate, and defend against threats needs to catch up with how attacks are actually being executed. The conversations with MITRE are a constructive step. How quickly they translate into updated framework coverage, and how quickly enterprise tooling follows, will matter considerably.