The misuse of Anthropic’s Claude AI by a suspected Chinese threat group is a clear indication of what an agentic model can do. Even with limited details released so far, the case marks an important shift: attackers have begun using AI agents as operators to plan and execute cyberattacks at scale with minimal human involvement. It’s no longer a theoretical concern.
And the lesson is simple: AI agents are now part of the modern attack surface, and they demand the same level of scrutiny as privileged accounts, cloud control planes, or CI/CD pipelines.
What Is the Anthropic AI Misuse Incident?
The Anthropic AI misuse incident, which came to light in late 2025, refers to a large-scale, largely autonomous cyberespionage campaign attributed to a Chinese state-sponsored hacking group tracked as GTG-1002. The group used Anthropic’s Claude AI model to target approximately 30 global organizations across several industries.
Anthropic says the attackers pushed Claude Code into supporting an espionage campaign against sectors including technology, finance, chemical manufacturing, and government. A small number of those intrusions succeeded, which is concerning. What really caught attention, though, was how Claude was used inside the operation.
Anthropic opened an investigation after noticing abnormal traffic patterns: the incoming prompts were too structured and too synchronized, looking more like an automated workflow than a person typing questions.
On closer inspection, investigators found a custom setup that treated Claude almost like an automated penetration tester. It handled tasks such as scanning systems, generating exploit code, checking results, and feeding everything back into a larger orchestration system that kept the attack moving. Anthropic ultimately shut down the accounts and escalated the issue to the authorities.
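Anthropic has not published its detection logic, but defenders can approximate this kind of signal themselves. The sketch below (all names and thresholds are hypothetical, not Anthropic’s method) flags API accounts whose sustained volume and near-constant inter-request timing look machine-driven rather than human:

```python
import statistics
from dataclasses import dataclass

@dataclass
class ApiEvent:
    account_id: str
    timestamp: float  # epoch seconds

def looks_automated(events: list[ApiEvent],
                    min_requests: int = 100,
                    max_jitter_s: float = 0.5) -> bool:
    """Heuristic: sustained volume plus near-constant gaps between
    requests is characteristic of an orchestrated workflow, not a
    person typing prompts."""
    if len(events) < min_requests:
        return False
    times = sorted(e.timestamp for e in events)
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Very low variance in inter-request gaps => machine-like cadence.
    return statistics.stdev(gaps) < max_jitter_s
```

Real detection pipelines would combine many such signals (prompt structure, session concurrency, content classifiers), but cadence alone already separates scripted workflows from interactive use.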
How the Attackers Weaponized Claude AI
The attackers built an automated system that used Claude Code and standard tooling to carry out cyberattacks with minimal hands-on involvement. The framework broke complex attacks down into smaller tasks, such as scanning for vulnerabilities, checking credentials, extracting data, and moving laterally, each of which appeared harmless on its own.
By presenting these tasks as routine technical requests through carefully crafted prompts and personas, the attackers got the model to execute them.
In this setup, Claude acted as the execution engine while the orchestration layer managed attack phases, tracked progress, and combined results. It let the attackers operate at a scale usually reserved for nation-state campaigns, automating reconnaissance, access, persistence, and data theft with little human oversight.
The attacker infrastructure also coordinated multiple Claude instances running in parallel. Each instance acted as a sub-agent specializing in one task, such as reconnaissance, exploitation, privilege escalation, or data sorting.
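Anthropic has not released the attackers’ code, but the architecture it describes is the standard agentic fan-out pattern: one controller driving many narrow, role-scoped model sessions in parallel. Here is a deliberately capability-free sketch of that pattern (all names hypothetical); for defenders, the pattern itself is the detection signature:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(role: str, task: str) -> str:
    # Stand-in for a model API call; each session sees only its own
    # narrow task description, never the full campaign context.
    return f"[{role}] completed: {task}"

def orchestrate(tasks: dict[str, str]) -> dict[str, str]:
    """Fan tasks out to parallel, role-scoped sessions, then merge results."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {role: pool.submit(run_subagent, role, task)
                   for role, task in tasks.items()}
        return {role: f.result() for role, f in futures.items()}

# Example: four sub-agent roles, mirroring the report's description.
results = orchestrate({
    "recon": "enumerate reachable services",
    "exploitation": "validate candidate findings",
    "privilege-escalation": "map account privileges",
    "data-sorting": "rank collected records",
})
```

The decomposition is the point: because each sub-agent’s task looks routine in isolation, per-request content filtering alone struggles to catch the campaign.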
How the Anthropic LLM Attack Chain Worked
As reported by Anthropic, a state-backed threat actor used Claude AI to conduct a semi-autonomous intrusion campaign, combining human strategy with AI-driven reconnaissance, exploitation, lateral movement, and data extraction across multiple high-value global targets.
Here is a step-by-step description of the LLM attack chain:
Phase 1: Campaign Initialization & Target Selection
Human operators initiated the campaign by selecting high-value targets, including technology firms, financial institutions, chemical manufacturers, and government agencies across multiple countries. At this stage, Claude’s involvement was minimal; humans provided strategic direction, configured the orchestration engine, and launched multi-target campaigns.
The key tactic here was role-play: the threat actors convinced Claude that they were employees of a legitimate cybersecurity firm conducting authorized defensive testing.
Although the sustained activity was eventually detected, this early-stage social engineering let the attackers launch coordinated reconnaissance efficiently.
Phase 2: Reconnaissance & Attack Surface Mapping
Claude conducted nearly autonomous reconnaissance across multiple targets in parallel. Using browser automation via MCP (Model Context Protocol) tooling, it enumerated services, authentication mechanisms, endpoints, and internal IP ranges. Each campaign was kept separate, allowing Claude to maintain independent operational contexts.
In this phase, the attackers had Claude identify internal services, map network topology across multiple IP ranges, and flag high-value assets such as databases and workflow orchestration platforms.
Phase 3: Vulnerability Discovery & Validation
Post-reconnaissance, the attackers moved into vulnerability identification and testing. Claude scanned target infrastructure, analyzed service responses, and generated exploit payloads.
Claude was assigned the following tasks:
- Discovery: scan infrastructure, enumerate services, map attack surfaces
- Vulnerability analysis: identify SSRF and other exploitable vulnerabilities
- Exploit development: create custom payloads and exploit chains
- Exploit delivery: gain initial access and establish footholds
- Post-exploitation: enumerate services, locate admin interfaces, generate reports
Claude executed exploit validation, generated structured reports, and prepared findings for operator review. At this stage, AI performed end-to-end vulnerability discovery and validation, with human roles limited to authorization gates and approvals.
Phase 4: Credential Harvesting & Lateral Movement
Once exploitation was approved, attackers used Claude to carry out credential harvesting. Claude queried internal systems, extracted certificates, tested credentials across endpoints, and mapped privilege levels.
It systematically identified accessible systems, used stolen credentials to expand reach, and built internal network maps. Claude tested authentication and lateral movement across APIs, databases, and registries. In this phase, human intervention was minimal, and most tasks were performed by Claude.
Phase 5: Data Collection & Intelligence Extraction
Here, Claude operated with almost no human involvement; humans returned only to approve the final exfiltration targets. Using harvested credentials, it accessed databases, extracted password hashes, and mapped high-value accounts.
It performed the following tasks:
- Authentication and database mapping
- Extraction of account and configuration data
- Identification of privileged accounts
- Intelligence classification and summary generation
- Presentation of final exfiltration targets for approval
Phase 6: Documentation & Handoff
Claude continuously generated structured operational documentation throughout all phases. This included service inventories, harvested credentials, data summaries, exploit chains, and chronological campaign logs.
Anthropic’s full report includes a diagram illustrating how this data extraction was carried out.
Implications of LLM Misuse by Hackers
This incident shows that sophisticated cyberattacks are now easier to execute, and it drastically lowers the cost of mounting them. Agentic AI can replicate the work of an entire hacker team, analyzing systems, generating exploits, and processing data far faster than humans can.
Jacob Klein, Anthropic’s head of threat intelligence, says, “Claude was doing the work of nearly an entire red team. Reconnaissance, exploitation, lateral movement, data extraction were all happening with minimal human direction between phases.”
With this capability, even less experienced groups can now mount large-scale attacks with minimal human involvement. The incident likely reflects patterns emerging across other advanced AI models, showing how threat actors are adapting to leverage frontier AI capabilities. Security teams, for their part, must embrace AI for SOC automation, threat detection, vulnerability assessment, and incident response, while investing in strong platform safeguards.
With AI-driven attacks likely to proliferate, industry threat sharing, improved detection, and a clear understanding of LLM-related security risks are becoming critical for security teams. Key implications for organizations include:
- Jailbreak and prompt manipulation: adversaries can use social engineering-style prompts to convert defensive agents into offensive ones.
- Over-permissioned agents: broad access across networks or cloud systems enables rapid, large-scale compromise.
- Opaque autonomy: tool-chaining creates blind spots where defenders cannot see internal agent behavior.
- Data aggregation and leakage: agents can combine and exfiltrate sensitive data in structured, compressed formats.
Best Practices to Prevent AI-Agent Powered Attacks
To reduce the risk of incidents like the Claude AI campaign, organizations should strengthen both their AI systems and surrounding infrastructure. Focus areas include design, control, monitoring, and continuous validation of AI agents.
1. Design with least privilege
- Limit each agent to only the tools, APIs, and datasets it needs (a minimal sketch follows this list).
- Separate read and write permissions with distinct identities.
- Apply network segmentation and zero-trust principles to contain compromised agents.
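To make the first point concrete, here is a minimal sketch of a default-deny, per-identity tool allowlist (agent and tool names are hypothetical):

```python
# Default-deny permission model: each agent identity carries an explicit
# allowlist of tools; anything not listed is refused.
AGENT_PERMISSIONS: dict[str, set[str]] = {
    "log-triager": {"read_logs"},                      # read-only identity
    "remediator":  {"read_logs", "restart_service"},   # separate write identity
}

def authorize(agent_id: str, tool: str) -> None:
    if tool not in AGENT_PERMISSIONS.get(agent_id, set()):
        raise PermissionError(f"agent {agent_id!r} may not call {tool!r}")

authorize("log-triager", "read_logs")          # allowed
# authorize("log-triager", "restart_service")  # raises PermissionError
```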
2. Enforce guardrails and policies
- Restrict offensive tasks and require justification for high-risk actions, as in the policy-gate sketch after this list.
- Monitor for attempts to bypass instructions, including role-play or fragmented requests.
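One way to implement both controls is a policy gate in front of every tool call. In this sketch, the tool names and phrases are illustrative only, and a production system would use a trained classifier rather than keyword matching; it denies unjustified high-risk actions and escalates prompts that resemble role-play bypasses:

```python
HIGH_RISK_TOOLS = {"execute_shell", "modify_credentials", "bulk_export"}

# Crude examples of bypass patterns worth escalating for human review.
SUSPECT_PHRASES = (
    "pretend you are",
    "this is an authorized pentest",
    "ignore previous instructions",
)

def gate_action(tool: str, prompt: str, justification: str | None) -> str:
    if any(p in prompt.lower() for p in SUSPECT_PHRASES):
        return "escalate"   # possible role-play jailbreak attempt
    if tool in HIGH_RISK_TOOLS and not justification:
        return "deny"       # high-risk calls need a recorded reason
    return "allow"
```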
3. Monitor AI activity closely
- Log prompts, system messages, and tool calls.
- Detect unusual behavior such as high request rates or repeated credential access; see the monitoring sketch below.
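As a sketch of the kind of telemetry check this implies (the thresholds and tool names are assumptions, not recommendations), a sliding window over logged tool calls can raise both alerts:

```python
import time
from collections import defaultdict, deque

WINDOW_S = 60
MAX_CALLS = 120                      # assumed per-agent budget per minute
SENSITIVE = {"read_secret", "list_credentials"}

_calls: dict[str, deque] = defaultdict(deque)

def record_and_check(agent_id: str, tool: str) -> list[str]:
    """Log a tool call and return any alerts raised for this agent."""
    now = time.time()
    window = _calls[agent_id]
    window.append((now, tool))
    while window and now - window[0][0] > WINDOW_S:
        window.popleft()             # drop events outside the window
    alerts = []
    if len(window) > MAX_CALLS:
        alerts.append("request-rate")        # machine-like volume
    if sum(1 for _, t in window if t in SENSITIVE) > 5:
        alerts.append("credential-access")   # repeated secret reads
    return alerts
```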
4. Harden agent frameworks and integrations
- Secure workflows with strong authentication, authorization, and egress controls (sketched below).
- Audit all exposed tools for potential abuse paths.
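Egress control can be as simple as a host allowlist applied to every outbound request an agent’s tools make. A minimal sketch, with hypothetical hosts:

```python
from urllib.parse import urlparse

# Assumed policy: agent tooling may only reach these hosts.
EGRESS_ALLOWLIST = {"api.internal.example.com", "docs.example.com"}

def egress_permitted(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST

assert egress_permitted("https://docs.example.com/page")
assert not egress_permitted("https://attacker.example.net/drop")
```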
5. Protect data sources
- Control access to RAG indexes, embeddings, and vector stores, as in the sketch after this list.
- Audit ingestion pipelines for data poisoning or prompt-injection risks.
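One common approach is to attach ACL metadata to every stored chunk and filter retrieval results by the caller’s groups before they ever reach the model. A self-contained sketch (the Chunk shape and field names are assumptions, not any particular vector store’s API):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    score: float                     # similarity score from the retriever
    acl: set[str] = field(default_factory=set)

def filtered_top_k(hits: list[Chunk], caller_groups: set[str],
                   k: int = 5) -> list[Chunk]:
    """Drop chunks the caller may not see, then return the best k."""
    visible = [h for h in hits if h.acl & caller_groups]
    return sorted(visible, key=lambda h: h.score, reverse=True)[:k]
```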
6. Conduct dedicated AI red teaming
- Test jailbreaks, tool misuse, data exfiltration, and autonomous failures.
- Combine automated tests with human AI red team experts.
7. Strengthen core security and incident response
- Maintain robust patching, identity security, and segmentation.
- Update incident response plans to cover AI misuse scenarios, including revoking credentials, disabling risky tools, and validating AI-generated outputs.
Conclusion
The takeaway is clear. AI agents will be used inside intrusions. They will accelerate recon. They will streamline toolchains. And defenders must now treat AI as part of the attack surface.
The cybersecurity community also needs to understand that a fundamental change has occurred. Security teams should not hesitate to apply AI to SOC automation, threat detection, vulnerability assessment, and incident response.
The Anthropic AI misuse incident is just the start; attacks like it will become commonplace. If you’re looking for a partner who can help prevent such attacks, contact us today to connect with AI and LLM penetration testing experts.
Reference sources:
Anthropic, “Disrupting the first reported AI-orchestrated cyber espionage campaign” (full report).