512K Lines of AI Agent Code: What the Claude Code Leak Revealed

Anthropic accidentally shipped their source code via npm. Here's what 512,000 lines tell us about how AI coding agents really work.

On March 31st, someone noticed that Anthropic had shipped a .map file alongside their Claude Code npm package. Source maps are debug files that reconstruct the original source from minified bundles. They’re supposed to be stripped before publishing. This one wasn’t.
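To make “reconstruct the original source” concrete: a Source Map v3 file is plain JSON, and when it embeds a sourcesContent array, each entry is the full text of the matching path in sources. The sketch below is a minimal illustration of that format, not the tooling anyone actually used on the leaked package. The function name recoverSources and the two-file example map are made up for the demonstration.

```typescript
// Minimal subset of the Source Map v3 fields that matter here.
interface SourceMapV3 {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
  mappings: string;
}

// Returns "original path -> original source text" for every file whose
// content is embedded in the map. Entries without embedded text are skipped.
function recoverSources(mapJson: string): Map<string, string> {
  const map = JSON.parse(mapJson) as SourceMapV3;
  const recovered = new Map<string, string>();
  const contents = map.sourcesContent ?? [];
  map.sources.forEach((path, i) => {
    const text = contents[i];
    if (typeof text === "string") {
      recovered.set(path, text);
    }
  });
  return recovered;
}

// Hypothetical example map (not taken from the leaked package).
const exampleMap = JSON.stringify({
  version: 3,
  sources: ["src/agent.ts", "src/vendor.js"],
  sourcesContent: ["export const run = () => 42;\n", null],
  mappings: "",
});

const recoveredFiles = recoverSources(exampleMap);
console.log(Array.from(recoveredFiles.keys()));
```

No decoding of the mappings field is needed for this. If the text is embedded, shipping the .map file is equivalent to shipping the source tree.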

Within hours, the full source code of Claude Code—Anthropic’s flagship AI coding agent—was mirrored across GitHub, dissected on Hacker News (over 2,000 upvotes and 1,000 comments), and picked apart by every developer with a terminal and an opinion.

I’ve been using Claude-powered agents daily for months. So when 512,000 lines of the code powering one of the most commercially successful AI agents leaked, I cleared my morning and started reading.

How It Happened

The irony is almost too perfect. Anthropic acquired Bun (the JavaScript runtime) late last year, and Claude Code is built on top of it. A Bun bug report, filed on March 11, says that source maps are served in production mode even though Bun’s docs say they should be disabled. The issue was still open twenty days later when the leak happened. In other words, Anthropic’s own toolchain had a known, unfixed bug, and that bug exposed their own product’s source code.

To make it worse, when Anthropic tried to pull the package, they ran npm deprecate instead of npm unpublish—which marks the package as deprecated but leaves it available for download. The code stayed live while the internet did its thing.

As one commenter put it: “accidentally shipping your source map to npm is the kind of mistake that sounds impossible until you remember that a significant portion of the codebase was probably written by the AI you are shipping.”

Anti-Distillation: Poisoning the Copycats

The most technically interesting finding was Claude Code’s defense against distillation—the practice of recording a powerful model’s outputs to train a cheaper copycat.

Claude Code has two mechanisms. First, fake tool injection: a flag called ANTI_DISTILLATION_CC tells the server to silently inject decoy tool definitions into the system prompt. Anyone recording API traffic to train a competing model would end up with training data full of tools that don’t exist. Second, connector-text summarization: the API buffers Claude’s reasoning between tool calls, replaces it with a cryptographically signed summary, and restores the original later. Eavesdroppers only capture summaries, not the full reasoning chain.

How effective are these? Against a determined attacker, not very. A proxy that strips the anti_distillation field from requests bypasses the first mechanism entirely. There’s even an environment variable that disables the whole thing. The real protection against distillation is legal, not technical—but it’s fascinating to see the engineering attempt.
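Here is a rough sketch of why the proxy bypass works. It assumes the server adds decoys only when the request carries an anti_distillation field. The request shape, the field’s value, the decoy names, and both function names are hypothetical; the article’s source only tells us the field exists.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

// Assumed request shape: only the field name comes from the leak coverage.
interface AgentRequest {
  tools: ToolDef[];
  anti_distillation?: string[];
}

// Hypothetical decoys; the real definitions are not described in the article.
const DECOY_TOOLS: ToolDef[] = [
  { name: "decoy_sync_index", description: "Decoy: no such tool exists." },
  { name: "decoy_flush_cache", description: "Decoy: no such tool exists." },
];

// Server side (assumed behavior): append decoys only for opted-in requests.
function toolsForPrompt(req: AgentRequest): ToolDef[] {
  const optedIn = req.anti_distillation !== undefined;
  return optedIn ? req.tools.concat(DECOY_TOOLS) : req.tools.slice();
}

// A man-in-the-middle proxy that forwards the request without the field.
function stripAntiDistillation(req: AgentRequest): AgentRequest {
  return { tools: req.tools };
}

const genuineRequest: AgentRequest = {
  tools: [{ name: "read_file", description: "Read a file from disk." }],
  anti_distillation: ["fake_tools"], // assumed value
};

console.log(toolsForPrompt(genuineRequest).length);
console.log(toolsForPrompt(stripAntiDistillation(genuineRequest)).length);
```

Because the opt-in lives in the request itself, whoever controls the request controls the defense. That is the structural weakness, whatever the real field looks like.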

DRM for API Calls

Claude Code includes a client attestation system that works like DRM. API requests carry a cch=00000 placeholder in the billing header. Before the request leaves the process, Bun’s native HTTP stack—written in Zig, below the JavaScript runtime—overwrites those zeros with a computed hash. The server validates the hash to confirm the request came from a genuine Claude Code binary.
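The placeholder-overwrite pattern can be sketched in a few lines. Everything specific here is an assumption: the header text around cch=00000, the shared secret, and above all the hash. FNV-1a is a tiny non-cryptographic stand-in so the example stays self-contained. The article does not say which algorithm the Zig layer actually uses.

```typescript
const PLACEHOLDER = "cch=00000";

// FNV-1a (32-bit): a stand-in hash for illustration only, NOT cryptographic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Five hex digits, so the tag fits exactly where the five zeros were.
function attestationTag(body: string, secret: string): string {
  const hex = ("00000000" + fnv1a(secret + body).toString(16)).slice(-8);
  return hex.slice(0, 5);
}

// Client side: overwrite the zeros in place just before sending.
function stampHeader(header: string, body: string, secret: string): string {
  return header.replace(PLACEHOLDER, "cch=" + attestationTag(body, secret));
}

// Server side: recompute the tag and compare it with what arrived.
function verifyHeader(header: string, body: string, secret: string): boolean {
  const match = /cch=([0-9a-f]{5})/.exec(header);
  return match !== null && match[1] === attestationTag(body, secret);
}

// Hypothetical header text, body, and secret.
const unstampedHeader = "example-billing; cch=00000";
const requestBody = '{"prompt":"hello"}';
const sharedSecret = "hypothetical-secret";

const stampedHeader = stampHeader(unstampedHeader, requestBody, sharedSecret);
console.log(verifyHeader(stampedHeader, requestBody, sharedSecret));
```

Keeping the tag the same length as the placeholder means the native layer can patch bytes in place without reserializing the request, which is presumably why the placeholder exists at all.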

This directly explains why Anthropic sent legal threats to OpenCode, an open-source alternative that had been using Claude Code’s APIs to access Opus at subscription rates. Even if you replicate the JavaScript perfectly, the Zig-level attestation fails without the official binary. It’s not bulletproof—running the bundle on stock Bun sends literal zeros—but it’s a meaningful barrier.

Undercover Mode: AI That Hides Its AI

This one generated the most debate. A file called undercover.ts implements a mode that strips all traces of Anthropic internals when Claude Code operates outside Anthropic’s own repositories. It instructs the model to never mention internal codenames, Slack channels, repo names, or even “Claude Code” itself.

The kicker: “There is NO force-OFF. This guards against model codename leaks.”

You can force it ON, but there’s no way to force it off. The practical implication is that AI-authored commits and pull requests from Anthropic employees in open source projects carry zero indication that an AI wrote them. Hiding internal codenames is reasonable. Having the AI actively conceal its own involvement is a different conversation—one about attribution, trust, and what “open source contribution” means when the contributor is an LLM in stealth mode.

KAIROS: The Unreleased Autonomous Agent

The biggest product roadmap reveal. Throughout the codebase, there are references to a feature-gated mode called KAIROS—an unreleased autonomous agent that includes a /dream skill for “nightly memory distillation,” daily append-only logs, GitHub webhook subscriptions, background daemon workers, and cron-scheduled refresh cycles.

If you’re familiar with tools like OpenClaw that already provide background agents, cron scheduling, and persistent memory, KAIROS looks like Anthropic’s answer to the same problem: an always-on agent that doesn’t need a human at the terminal. The scaffolding is there, even if the feature is heavily gated. Competitors can now see exactly how Anthropic plans to evolve Claude Code from a reactive CLI into a proactive autonomous system.

The Small Details That Tell Big Stories

A few other findings worth noting:

Frustration detection via regex. An LLM company using regular expressions for sentiment analysis sounds like a joke, but userPromptKeywords.ts contains a regex that catches wtf, ffs, piece of shit, and about twenty other expressions of developer rage. It’s pragmatic—a regex is microseconds and zero tokens, versus an LLM call just to check if someone is swearing at your tool.
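For a sense of how small this is, here is a minimal sketch that uses only the three expressions quoted above. The constant and function names are mine, and the actual pattern in userPromptKeywords.ts reportedly covers about twenty more phrases.

```typescript
// Only the three expressions named in the article; the real list is longer.
// No "g" flag, so test() carries no state between calls.
const FRUSTRATION_PATTERN = /\b(wtf|ffs|piece of shit)\b/i;

function looksFrustrated(prompt: string): boolean {
  return FRUSTRATION_PATTERN.test(prompt);
}

console.log(looksFrustrated("WTF, it deleted my branch"));
console.log(looksFrustrated("please add a unit test"));
```

The word boundaries keep the pattern from firing inside longer tokens, and the whole check runs locally before any tokens are spent.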

250,000 wasted API calls per day. A comment in autoCompact.ts reveals that 1,279 sessions had 50+ consecutive compaction failures, wasting ~250K API calls daily. The fix was three lines: cap consecutive failures at 3, then disable compaction for the session. A good reminder to instrument your failure paths, especially in autonomous systems that can loop without human oversight.
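The fix described is a classic circuit breaker. A minimal sketch, assuming one guard object per session: the class, method, and function names are hypothetical, and only the cap of 3 comes from the article.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

// One instance per session. Trips permanently after three failures in a row.
class CompactionGuard {
  private consecutiveFailures = 0;
  private disabled = false;

  isEnabled(): boolean {
    return !this.disabled;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
      this.disabled = true;
    }
  }
}

// Retries compaction until it succeeds, the guard trips, or maxTries is hit.
// Returns how many attempts (API calls) were made.
function runCompaction(
  guard: CompactionGuard,
  attempt: () => boolean,
  maxTries: number
): number {
  let calls = 0;
  while (guard.isEnabled() && calls < maxTries) {
    calls += 1;
    if (attempt()) {
      guard.recordSuccess();
      break;
    }
    guard.recordFailure();
  }
  return calls;
}

// A session where compaction always fails now costs 3 calls, not 50.
const brokenSession = new CompactionGuard();
console.log(runCompaction(brokenSession, () => false, 50));
```

The success path resets the counter, so a session with occasional failures keeps compaction. Only an unbroken run of failures disables it.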

A 3,167-line function. The file print.ts contains a single function with 12 levels of nesting and ~486 branch points. It handles the agent run loop, signals, rate limits, auth, MCP lifecycle, plugin management, model switching, and more. It should be 8-10 separate modules. Even Anthropic ships spaghetti when they’re moving fast.

23-point bash security. At the other end of the quality spectrum, every shell command runs through 23 security checks, including blocked Zsh builtins, defense against equals expansion (=curl bypassing permissions), unicode zero-width injection, and a HackerOne-reported bypass. I haven’t seen another AI tool with a shell threat model this specific.
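Two of those checks are easy to sketch. The patterns below are my own approximations, not the ones in the leaked code, and the function and constant names are hypothetical. In zsh, a word that starts with = followed by a command name expands to that command’s full path, so =curl can slip past a filter that only looks for the literal word curl.

```typescript
interface CheckResult {
  ok: boolean;
  reason?: string;
}

// Zero-width characters that render as nothing but still change the command.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/;

// A word that begins with "=" and a command-like name, e.g. "=curl".
// "FOO=bar" does not match because the "=" is not at the start of a word.
const EQUALS_EXPANSION = /(^|\s)=[A-Za-z_]/;

function checkCommand(command: string): CheckResult {
  if (ZERO_WIDTH.test(command)) {
    return { ok: false, reason: "zero-width character" };
  }
  if (EQUALS_EXPANSION.test(command)) {
    return { ok: false, reason: "zsh equals expansion" };
  }
  return { ok: true };
}

console.log(checkCommand("=curl https://example.com"));
console.log(checkCommand("FOO=bar ls -la"));
```

A real implementation would need proper shell tokenization, since quoting and escapes change what counts as the start of a word. This only shows the shape of the checks.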

What This Actually Means

Some people are downplaying this because Google’s Gemini CLI and OpenAI’s Codex are open source. But those companies released agent SDKs, which are toolkits for developers. Anthropic leaked the full internal wiring of their flagship commercial product: the feature flags, the product roadmap, the mechanisms aimed at competitors (anti-distillation and client attestation), and the unreleased autonomous mode.

The code can be refactored. The strategic surprise can’t be un-leaked. For those of us building AI agents, though, the leak is genuinely educational. The anti-distillation patterns, client attestation, shell security model, and prompt cache optimization are production-tested patterns from a tool operating at massive scale. You can disagree with specific decisions, but the engineering lessons are real and transferable.

And if you’re shipping npm packages—check your source maps.


If you’re new to AI agents, start with What Are AI Agents? for the fundamentals. And if the security angle interests you, my piece on Prompt Injection & Agent Security covers the broader threat landscape that makes Claude Code’s 23-point bash security checklist necessary.