The Next Cybersecurity Challenge Isn’t the Model. It’s the Operation Around It.

For years, security teams have assessed threat actors using a relatively stable set of signals.

How technically sophisticated are they? How many ATT&CK techniques do they employ? How broadly do they operate across the kill chain? The assumption has been straightforward: more techniques signal greater capability, and greater capability signals greater risk.

Anthropic’s red team just published a year-long analysis, conducted while developing the LLM ATT&CK Navigator, that challenges this assumption.

After studying 832 accounts banned from Claude for malicious cyber activity, researchers mapped nearly 14,000 observed techniques across the MITRE ATT&CK framework and evaluated actors using a new risk methodology called ARiES. What emerged was not simply another study about AI in cybersecurity. It was evidence that operational autonomy is becoming one of the most important factors separating dangerous actors from ordinary ones.

Attackers Are Moving From Experimentation to AI-Driven Operations

One of the most striking findings in Anthropic’s analysis is how quickly AI-enabled risk is increasing.

Over the course of a year, the proportion of actors scoring medium-risk or higher grew from roughly one-third to more than half. This was not driven by the arrival of more sophisticated threat actors. Instead, researchers observed existing actors progressing from preparation and experimentation into active operations inside victim environments.

Techniques such as account discovery and automated exfiltration became increasingly common over the latter half of the study period. These are not planning activities. They are indicators of actors conducting real operations after gaining access.

The implication is significant.

AI is not simply helping attackers learn and research. It is increasingly helping them operate.

The skill floor for conducting meaningful activity inside a compromised environment appears to be dropping, and it is dropping quickly.

Traditional Measures of Risk Are Becoming Less Reliable

This may be the most important finding in the report.

When Anthropic’s researchers examined which characteristics best predicted elevated AI-enabled risk, many traditional indicators performed poorly.

Technical sophistication alone was a weak predictor. Breadth of ATT&CK coverage was similarly limited. Even the interface used to access the model had little impact on final risk outcomes.

What consistently distinguished high-risk actors was not sophistication or coverage, but whether AI was embedded into live operational execution.

Activities such as lateral movement, credential access, remote services, internal discovery, and web shell deployment showed a far stronger relationship to elevated risk than traditional measures of sophistication.
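The contrast between breadth and operational execution can be made concrete with a minimal Python sketch. It is not ARiES, whose scoring rules are not described here. The technique IDs and tactic names follow MITRE ATT&CK, but the `Actor` class, the two sample actors, and the choice of which tactics count as "operational" are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Assumption: which ATT&CK tactics count as "live operational execution".
# This grouping is illustrative and is not taken from ARiES.
OPERATIONAL_TACTICS = {
    "credential-access", "discovery", "lateral-movement",
    "persistence", "exfiltration",
}

@dataclass
class Actor:
    name: str
    # ATT&CK technique ID -> tactic under which it was observed.
    observed: dict = field(default_factory=dict)

def breadth(actor: Actor) -> int:
    """Traditional signal: how many distinct techniques were observed."""
    return len(actor.observed)

def operational_techniques(actor: Actor) -> list:
    """Techniques observed under tactics that imply activity after access."""
    return sorted(
        technique
        for technique, tactic in actor.observed.items()
        if tactic in OPERATIONAL_TACTICS
    )

# Hypothetical actors, for illustration only.
broad_planner = Actor("broad_planner", {
    "T1595": "reconnaissance",        # Active Scanning
    "T1592": "reconnaissance",        # Gather Victim Host Information
    "T1589": "reconnaissance",        # Gather Victim Identity Information
    "T1583": "resource-development",  # Acquire Infrastructure
    "T1587": "resource-development",  # Develop Capabilities
    "T1588": "resource-development",  # Obtain Capabilities
})
focused_operator = Actor("focused_operator", {
    "T1003": "credential-access",     # OS Credential Dumping
    "T1087": "discovery",             # Account Discovery
    "T1021": "lateral-movement",      # Remote Services
    "T1505.003": "persistence",       # Web Shell
})

for actor in (broad_planner, focused_operator):
    print(actor.name, breadth(actor), operational_techniques(actor))
```

In this toy data the planner shows more techniques, yet none of them implies activity inside a victim environment. The operator shows fewer techniques, and all of them do.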

For years, threat intelligence programs have focused on understanding what an actor knows and which techniques they use.

Anthropic’s research suggests a different question may now matter more:

What can an actor execute once they are inside a network, and how much of that activity can happen without direct human involvement?

The Most Important Finding Has Nothing To Do With The Model

Anthropic’s case study of threat actor GTG-1002 illustrates this clearly.

The actor achieved a perfect ARiES risk score while successfully targeting high-profile organizations across multiple countries. Yet their ATT&CK profile was not dramatically different from many other actors in the dataset. Their observed activity spanned 30 techniques across 13 tactics, a profile that did not immediately stand out from numerous lower-risk groups.

What distinguished GTG-1002 was not the model.

It was the operational architecture built around it.

The actor integrated an AI coding environment into a broader offensive platform and connected it to open-source penetration testing tools. The result was not an assistant helping a human operator work faster. It behaved like an operational system capable of executing commands, evaluating results, determining next steps, adapting to changing conditions, and chaining together multiple stages of an attack with limited human involvement.

Reconnaissance led to exploitation. Exploitation led to credential access. Credential access led to lateral movement. The operation progressed as a coordinated system rather than a sequence of manually directed tasks.

The lesson is not that a more capable model creates a more dangerous actor.

The lesson is that operational architecture matters more than model capability.

The difference between assistance and autonomy was not the LLM itself. It was the system built around it.

ATT&CK Captures The Techniques, But Not The Coordination

None of this means MITRE ATT&CK is broken.

In fact, ATT&CK worked exactly as intended.

Every observed activity in Anthropic’s dataset mapped cleanly to existing ATT&CK techniques. The framework remains one of the most valuable tools defenders have for understanding adversary behavior.

But the report highlights a limitation that becomes increasingly important as autonomous systems become more common.

ATT&CK describes what happened.

It does not describe how operations were coordinated.

The behaviors that made GTG-1002 particularly effective were not individual techniques. They were the mechanisms that coordinated those techniques into a continuous operation. The decision logic, task chaining, adaptation, and autonomous progression across attack stages are difficult to express within a framework built around discrete adversary actions.
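A small Python sketch shows why a technique list alone can hide coordination. The two timelines are invented, not drawn from Anthropic's dataset, and the median gap between steps is only one hypothetical proxy for how tightly an operation is coordinated.

```python
from statistics import median

# Hypothetical timelines: (seconds since first activity, ATT&CK technique ID).
# T1595 Active Scanning, T1190 Exploit Public-Facing Application,
# T1003 OS Credential Dumping, T1021 Remote Services.
manual_operator = [(0, "T1595"), (5400, "T1190"), (12600, "T1003"), (21600, "T1021")]
autonomous_system = [(0, "T1595"), (40, "T1190"), (95, "T1003"), (130, "T1021")]

def technique_set(timeline):
    """What an ATT&CK mapping records: the distinct techniques observed."""
    return {technique for _, technique in timeline}

def median_step_gap(timeline):
    """Assumed coordination proxy: median seconds between consecutive steps."""
    times = sorted(t for t, _ in timeline)
    return median(later - earlier for earlier, later in zip(times, times[1:]))

print(technique_set(manual_operator) == technique_set(autonomous_system))
print(median_step_gap(manual_operator), median_step_gap(autonomous_system))
```

Both timelines map to the same four techniques. Only the timing between steps separates them, and that timing is not part of a technique mapping.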

Anthropic is already working with MITRE to explore how these behaviors might be represented in future versions of the framework. That work is important because defenders need a shared vocabulary for describing autonomous cyber operations.

But even when the vocabulary exists, a larger challenge remains.

Adding new ATT&CK categories does not automatically create detections, investigations, workflows, or operational capacity.

The taxonomy may evolve. The operational problem will still be there.

Defenders Have A Constraint Attackers Do Not

One aspect of this transition receives far less attention than it deserves.

Attackers and defenders are not operating under the same constraints.

An attacker only needs an operation to succeed.

A defender needs an operation to be effective, accountable, auditable, and aligned with organizational policy.

An attacker can allow an autonomous system to make decisions without oversight.

A Fortune 500 security team cannot.

As security operations become more autonomous, governance becomes critical.

The challenge is no longer simply building systems that can investigate alerts, analyze evidence, make decisions, and execute actions. It is ensuring those systems do so with full control and visibility into how decisions are made and why. That means being able to trace the evidence behind each conclusion, understand which actions were taken, and verify that every step complies with organizational policy.
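One way to picture that requirement is a decision record that carries its own evidence and can be checked against policy. The Python sketch below is illustrative only: the `Decision` record, the `AUTO_APPROVED_ACTIONS` policy, and the action names are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: a policy listing actions an automated system may take on its own.
AUTO_APPROVED_ACTIONS = {"enrich_alert", "quarantine_file"}

@dataclass(frozen=True)
class Decision:
    conclusion: str   # what the system decided
    evidence: tuple   # IDs of the alerts or log lines it relied on
    action: str       # what it did, or proposes to do

def review(decision: Decision) -> list:
    """Return the reasons a decision fails audit; an empty list means it passes."""
    problems = []
    if not decision.evidence:
        problems.append("no evidence recorded")
    if decision.action not in AUTO_APPROVED_ACTIONS:
        problems.append(f"action '{decision.action}' needs human approval")
    return problems

traceable = Decision("benign login burst", ("alert-1", "log-17"), "enrich_alert")
opaque = Decision("compromised host", (), "isolate_host")

print(review(traceable))
print(review(opaque))
```

The record is immutable so that what was concluded, on what evidence, and with what action cannot be quietly changed after the fact.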

Defenders need autonomy.

They also need trust.

The future SOC will require both.

A Copilot Was Never Going To Be Enough

Most organizations have adopted AI through copilots and point solutions.

These tools enrich alerts, summarize reports, answer questions, and improve individual analyst productivity. They provide real value and will continue to have a place in security operations.

But they are fundamentally designed to make humans more productive inside an existing operating model.

The threat described in Anthropic’s research operates differently.

It does not pause between stages of an attack waiting for human review. It does not stop at organizational handoffs. It does not depend on an analyst manually coordinating every decision.

The advantage comes from continuous execution and coordination.

A copilot embedded in a sequential workflow improves productivity within the workflow. It does not change how the workflow operates.

The Operating Model Has To Change

The most important takeaway from Anthropic’s research is not that AI-enabled attacks are increasing.

Most practitioners already suspected that.

The more important finding is that operational autonomy is emerging as a defining characteristic of dangerous actors.

The attackers gaining the greatest advantage are those building systems that coordinate decisions, chain actions together, and execute continuously across an operation.

That should sound familiar.

Defenders face the same coordination challenge.

The cyber capacity gap facing security teams will not be solved by adding more analysts, deploying more point solutions, or accelerating individual tasks. It is fundamentally a coordination problem.

Attackers are increasingly reducing coordination latency through autonomous systems.

Defenders must do the same.

Not by removing humans from security operations, but by removing the requirement for humans to serve as the coordination mechanism between every step of the process.

Detection, intelligence, investigation, containment, and response must operate as a connected system, with human judgment applied where it creates the most value rather than at every handoff.
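As a rough illustration of that shape, the Python sketch below chains detection, investigation, and response into one flow and asks a human only at the single high-impact step. The stage functions, thresholds, field names, and response actions are hypothetical, chosen only to keep the example small.

```python
def detect(event):
    # Assumed rule: a burst of failed logins marks the event as suspicious.
    return {**event, "suspicious": event["failed_logins"] > 10}

def investigate(case):
    # Assumed rule: suspicious activity on an admin account is high severity.
    severity = "high" if case["suspicious"] and case["is_admin"] else "low"
    return {**case, "severity": severity}

def respond(case, approve):
    if not case["suspicious"]:
        return "no_action"
    if case["severity"] == "high":
        # Human judgment is requested only here, at the high-impact step.
        return "account_disabled" if approve(case) else "escalated"
    return "password_reset_forced"

def run_pipeline(event, approve):
    """Run every stage in sequence with no manual handoff between stages."""
    return respond(investigate(detect(event)), approve)

def analyst_approves(case):
    return True

print(run_pipeline({"failed_logins": 2, "is_admin": False}, analyst_approves))
print(run_pipeline({"failed_logins": 50, "is_admin": False}, analyst_approves))
print(run_pipeline({"failed_logins": 50, "is_admin": True}, analyst_approves))
```

The low-severity path completes without waiting on anyone. The `approve` callback stands in for wherever an organization decides a person must sign off.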

The playbook was not written for this.

That is a reason to start building toward the operating model that comes next.