
What the Claude Code Source Leak Reveals

April 02, 2026 12 min read

ai claude-code agents security architecture engineering source-code

From the guide: Claude Code Comprehensive Guide

In March 2026, a Bun build bug shipped source maps in the Claude Code npm package. The .map files contained the full readable TypeScript source: every module, every comment, every internal codename.¹ Anthropic pulled the package quickly, but the community had already extracted and analyzed the internals.

The Claude Code source leak revealed that auto mode runs a separate Sonnet 4.6 classifier per tool call, bash security uses 23 numbered checks suggesting real exploitation incidents, and prompt caching tracks 14 break vectors with sticky latches. The source also exposed anti-distillation defenses using fake tool injection, an undercover module that strips internal codenames with no force-off switch, and multi-agent coordination implemented entirely as system prompt instructions rather than dedicated protocol code.

I am not writing a “look what leaked” post. I maintain the most thorough Claude Code guide on the internet and run 84 hooks, 43 skills, and 19 agents on top of it daily.² The source leak answered questions I had been reverse-engineering through behavior observation for months. What follows is a practitioner’s analysis of what the source reveals about how Claude Code actually works, and what the findings mean for people who build on top of it.

TL;DR: The source confirms that auto mode runs a separate Sonnet 4.6 classifier per tool call (yoloClassifier.ts), bash security has 23 numbered checks suggesting real exploitation incidents (bashSecurity.ts), prompt caching tracks 14 break vectors with sticky latches, multi-agent coordination lives entirely in system prompt instructions, and frustration detection uses regex, not LLM inference. The guide’s Under the Hood section covers the builder implications. The post below covers the full anatomy.

Key Takeaways

For builders: Auto mode costs one classifier inference per tool call. Factor the overhead into cost models for autonomous workflows. Your PreToolUse hooks complement but do not replace the built-in 23-check bash validation.
For power users: The prompt cache can break along any of 14 tracked vectors. Keep your CLAUDE.md stable within a session. If you hit compaction loops, the system halts autocompact after 3 consecutive failures (the circuit breaker exists because runaway compaction retries once wasted ~250K API calls/day).
For security researchers: The bash security module’s depth (2,592 lines, Zsh-specific defenses) suggests a history of real exploitation attempts. Every numbered check has a story behind it.

1. The Auto Mode Classifier

The file internally named yoloClassifier.ts is 1,495 lines long.³ It implements the “auto mode” permission system, the classifier that decides whether to allow, block, or ask about each tool call.

The key finding: auto mode is not a prompt instruction. It is a separate model call. Each tool invocation gets evaluated by a Sonnet 4.6 classifier that checks whether the action matches the user’s stated intent, not just whether the command is “safe” in isolation. Auto mode therefore adds one classifier inference per tool call, introducing real latency and real cost.
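Because the overhead scales linearly with tool calls, the cost model is simple multiplication. A minimal sketch, assuming you measure per-call latency and price yourself; the field names are mine and none of the numbers come from the source:

```typescript
// Hypothetical inputs: measure these for your own workload.
interface ClassifierOverhead {
  toolCalls: number;        // tool calls in the workflow
  latencyMsPerCall: number; // measured classifier latency per call
  costUsdPerCall: number;   // measured classifier cost per call
}

// One classifier inference per tool call means both totals grow linearly.
function estimateAutoModeOverhead(o: ClassifierOverhead): {
  addedLatencyMs: number;
  addedCostUsd: number;
} {
  return {
    addedLatencyMs: o.toolCalls * o.latencyMsPerCall,
    addedCostUsd: o.toolCalls * o.costUsdPerCall,
  };
}
```

A 200-call autonomous run pays the classifier price 200 times, which is why long workflows feel the overhead far more than short interactive sessions.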

Claude Code exposes five permission modes internally:¹

Mode	Behavior
`default`	Ask before writes, bash, MCP
`acceptEdits`	Auto-approve file edits, ask for bash
`dontAsk`	Approve everything without asking
`bypassPermissions`	Skip all checks (`--dangerously-skip-permissions`)
`auto`	Classifier-based per-action decisions

Auto mode’s circuit breaker mirrors the one Anthropic documented publicly: after 3 consecutive blocks or 20 total blocks, auto mode pauses and falls back to manual approval.⁴ The source confirms this is a hard limit, not a soft suggestion.
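The two thresholds reduce to a pair of counters. A minimal sketch of that logic, using the 3 and 20 figures quoted above; the class and method names are mine rather than identifiers from yoloClassifier.ts, and I am assuming that an allowed call resets only the consecutive streak:

```typescript
// Thresholds as quoted in the text; everything else is hypothetical.
const MAX_CONSECUTIVE_BLOCKS = 3;
const MAX_TOTAL_BLOCKS = 20;

type ClassifierDecision = "allow" | "block";

class AutoModeBreaker {
  private consecutive = 0;
  private total = 0;
  private tripped = false;

  // Record one classifier decision. Returns true once auto mode should pause.
  record(decision: ClassifierDecision): boolean {
    if (decision === "block") {
      this.consecutive += 1;
      this.total += 1;
    } else {
      this.consecutive = 0; // assumption: an allow resets the streak, not the total
    }
    if (
      this.consecutive >= MAX_CONSECUTIVE_BLOCKS ||
      this.total >= MAX_TOTAL_BLOCKS
    ) {
      this.tripped = true; // hard limit: stays tripped once reached
    }
    return this.tripped;
  }
}
```

The total counter matters for sessions that alternate between blocked and allowed calls: they never build a streak of three, but they still reach the limit of twenty.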

2. Bash Security: 23 Checks, Real Incidents

The bash validation module (bashSecurity.ts) spans 2,592 lines with 23 numbered security checks.¹ The depth is remarkable, and every check suggests a real incident behind it.

#	Attack Vector	Defense
1-3	Zsh `=cmd` expansion	Block `=curl`, `=wget`, `=bash` patterns
4-6	`zmodload` gateway	Block 18 Zsh builtins provided by loadable Zsh modules
7-9	Heredoc injection	Line-by-line content matching against injected payloads
10-12	ANSI-C quoting (`$'\x41'`)	Pattern detection for obfuscated commands
13-15	Process substitution (`<()`, `>()`)	Block in untrusted contexts
16-18	Unicode zero-width spaces	Injection detection for invisible characters
19-21	`ztcp` exfiltration	Block Zsh network primitives
22-23	Compound attacks	Cross-check validation across multiple vectors
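The zero-width row is the easiest of these to illustrate. A minimal sketch of invisible-character detection, assuming a short list of code points that I picked myself; it is not the set used in bashSecurity.ts:

```typescript
// Zero-width and invisible code points: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
// This list is an assumption for illustration only.
const INVISIBLE_CHARS = /[\u200B\u200C\u200D\u2060\uFEFF]/;

function hasInvisibleChars(command: string): boolean {
  return INVISIBLE_CHARS.test(command);
}

// Replace each hidden character with a visible <U+XXXX> tag for logging.
function revealInvisibleChars(command: string): string {
  return command.replace(
    /[\u200B\u200C\u200D\u2060\uFEFF]/g,
    (ch) => `<U+${ch.charCodeAt(0).toString(16).toUpperCase()}>`,
  );
}
```

A command that looks like `rm -rf` on screen can carry a hidden character that changes how a filter tokenizes it, so revealing the code point in logs is as useful as blocking it.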

The Zsh-specific defenses are notable. Most security tooling targets Bash. Claude Code runs in Zsh on macOS (the default shell since Catalina), and the source shows Anthropic discovered attack vectors unique to Zsh’s expansion semantics. The =cmd expansion, for example, is a Zsh feature that replaces =curl with the full path to curl, a substitution that can bypass naive command blocklists.
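To see why a naive blocklist misses this, compare a first-word check with a check for the leading `=` form. This is a minimal sketch with a blocklist and function names of my own, not the logic in bashSecurity.ts:

```typescript
// Naive check: compare the first word of the command against a blocklist.
const BLOCKED_COMMANDS = new Set(["curl", "wget", "bash"]);

function naiveIsBlocked(command: string): boolean {
  const firstWord = command.trim().split(/\s+/)[0] ?? "";
  return BLOCKED_COMMANDS.has(firstWord);
}

// Stricter check: flag any word that begins with "=" followed by a name,
// which Zsh would expand to the full path of that command.
function hasEqualsExpansion(command: string): boolean {
  return /(^|\s)=[A-Za-z_][\w.-]*/.test(command);
}
```

The naive check sees `=curl` as an unknown word and lets it through, while the shell would resolve it to the real binary. An assignment such as `FOO=bar` is not flagged, because its `=` does not start a word.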

For hook builders: your PreToolUse hooks run after this 23-check validation. You are adding a second layer, not the only layer. The built-in checks handle shell-level attacks that your application-level hooks would miss.
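To make the layering concrete, here is a minimal sketch of the decision logic an application-level PreToolUse hook might run. It assumes the hook receives a JSON payload with `tool_name` and `tool_input` fields and blocks by exiting with code 2, which is my reading of the public hooks documentation. The protected-path policy is purely illustrative.

```typescript
// Subset of the hook payload this sketch relies on (assumed shape).
interface PreToolUseInput {
  tool_name: string;
  tool_input: { command?: string };
}

// Hypothetical project policy: paths a command must never reference.
const PROTECTED_PATHS = ["/etc", ".env", ".git/"];

// Exit code 0 lets the call proceed; 2 blocks it and reports the reason.
function decide(input: PreToolUseInput): { exitCode: 0 | 2; reason?: string } {
  if (input.tool_name !== "Bash") return { exitCode: 0 };
  const command = input.tool_input.command ?? "";
  const hit = PROTECTED_PATHS.find((p) => command.includes(p));
  return hit
    ? { exitCode: 2, reason: `Blocked: command references protected path ${hit}` }
    : { exitCode: 0 };
}
```

A real hook script would parse the payload from stdin, call `decide`, write the reason to stderr, and exit with the returned code. The substring match is deliberately simple; it encodes project knowledge and makes no attempt at shell parsing.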

3. Anti-Distillation Defenses

The source reveals active defenses against training data extraction:¹

Fake tool injection. When the ANTI_DISTILLATION_CC flag is enabled, Claude Code injects decoy tool definitions into the conversation. The fake tools poison any training data harvested from the session. A model trained on poisoned data would learn incorrect tool schemas.

Connector-text summarization. A second defense buffers assistant text between tool calls and returns cryptographic summaries instead of raw text. The summaries render conversation transcripts useless for distillation while preserving the user experience.

Both defenses have known workarounds. A MITM proxy, environment variable overrides, or third-party API providers can bypass them.¹ The defenses function as speed bumps, not walls: they raise the cost of distillation without preventing it entirely.

4. Undercover Mode

undercover.ts is roughly 90 lines with one notable property: there is no force-off switch.¹

The module conceals internal codenames (“Capybara,” “Tengu”), Slack channels, internal repositories, and other Anthropic infrastructure references. When the model generates text containing a codename, the undercover module rewrites it before the user sees the output.
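The rewrite step can be pictured as a word-boundary substitution. A minimal sketch, using only the two codenames named above; the function name and replacement string are hypothetical, and the real module's matching rules may differ:

```typescript
// The two codenames mentioned in the text; the real list is longer.
const CODENAMES = ["Capybara", "Tengu"];

// Replace whole-word, case-insensitive matches with a placeholder.
function redactCodenames(text: string, replacement = "[redacted]"): string {
  const pattern = new RegExp(`\\b(${CODENAMES.join("|")})\\b`, "gi");
  return text.replace(pattern, replacement);
}
```

The word boundaries keep the filter from mangling longer words that happen to contain a codename as a substring.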

The “no force-off” design means the module runs in all contexts, including when Anthropic employees use Claude Code. The source also reveals that AI-authored commits from Anthropic employees carry no AI attribution in their open source contributions because the undercover module strips those markers.

5. Frustration Detection

userPromptKeywords.ts detects user frustration via regex pattern matching on profanity.¹ Regex runs in microseconds; a model call takes seconds. The speed and cost difference explains the design choice.

When triggered, Claude adjusts its behavior: more careful, more explicit, more deferential. If you have noticed Claude becoming suddenly more cautious after you express frustration, the keyword module is the mechanism. The behavioral shift is not emergent from the model. It is engineered into the orchestration layer.
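A keyword detector of this kind fits in a few lines. A minimal sketch with a mild, made-up pattern list; the actual patterns in userPromptKeywords.ts are not reproduced here:

```typescript
// Hypothetical keyword list for illustration only.
const FRUSTRATION_PATTERN = /\b(wtf|damn|ugh|this is (?:broken|useless))\b/i;

// No "g" flag, so test() keeps no state between calls.
function looksFrustrated(prompt: string): boolean {
  return FRUSTRATION_PATTERN.test(prompt);
}
```

The trade-off is the usual one for keyword matching: it is cheap and predictable, but it cannot tell a frustrated user from one quoting an error message.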

6. Prompt Cache Architecture

promptCacheBreakDetection.ts tracks 14 distinct cache-break vectors with “sticky latches.”³ A sticky latch means that once a cache-breaking action occurs, the system does not attempt to restore the cache. The break persists for the rest of the session.

Practical implications for daily users:

Reordering sections in your CLAUDE.md breaks the cache
Toggling extended thinking mid-session breaks the cache
Changing MCP server configurations breaks the cache
Adding or removing rules files breaks the cache

The 14 vectors explain a pattern many power users have noticed: sessions that start fast gradually slow down. Each configuration change accumulates cache breaks. The “sticky latch” design means you cannot recover by reverting the change. Once broken, the cache stays broken for the session.
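The latch behavior is easier to see in code than in prose. A minimal sketch, assuming the cache key can be modeled as a fingerprint of the configuration; the class, the function, and the three inputs are hypothetical stand-ins for the 14 tracked vectors:

```typescript
class StickyCacheLatch {
  private baseline: string | null = null;
  private broken = false;

  // Pass a fingerprint of everything that feeds the cached prompt prefix.
  // Returns true once the cache is considered broken for this session.
  observe(fingerprint: string): boolean {
    if (this.baseline === null) {
      this.baseline = fingerprint; // first observation sets the baseline
    } else if (fingerprint !== this.baseline) {
      this.broken = true; // sticky: nothing ever resets this flag
    }
    return this.broken;
  }
}

// Order-sensitive fingerprint, so reordering sections counts as a change.
function configFingerprint(
  claudeMd: string,
  mcpServers: string[],
  extendedThinking: boolean,
): string {
  return JSON.stringify([claudeMd, mcpServers, extendedThinking]);
}
```

Observing the original fingerprint again after a change still returns true, which is the "reverting does not help" behavior described above.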

Best practice: Set your CLAUDE.md, rules files, and MCP config before starting a session. Do not modify them mid-session.

7. Autocompact Circuit Breaker

A source comment documents the scale of a previous problem:¹

“1,279 sessions had 50+ consecutive autocompact failures (up to 3,272 in a single session), wasting ~250K API calls/day.”

The fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. After 3 consecutive compaction failures, the system halts autocompact and surfaces an error instead of silently burning tokens.
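The pattern behind that constant is a bounded retry loop. A minimal sketch: the constant name and value come from the source comment above, while `runAutocompact` and its `compact` callback are hypothetical stand-ins for the real compaction call:

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

// `compact` performs one compaction attempt and reports success or failure.
function runAutocompact(compact: () => boolean): { ok: boolean; attempts: number } {
  let failures = 0;
  let attempts = 0;
  while (failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
    attempts += 1;
    if (compact()) return { ok: true, attempts };
    failures += 1;
  }
  // Limit reached: stop retrying so the caller can surface an error.
  return { ok: false, attempts };
}
```

Without the bound, a compaction that can never succeed turns the loop into the unbounded retry described in the source comment.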

Before the circuit breaker, a session stuck in a compaction loop retried indefinitely, with each retry consuming tokens for the compaction prompt and response. At scale, 250K wasted API calls per day represents significant infrastructure cost. The fix is a three-line change that saves millions of tokens daily.

If you hit repeated “compaction failed” errors, the circuit breaker is protecting you from an infinite loop, not malfunctioning.

8. Coordinator Mode: Prompts as Architecture

Multi-agent coordination (coordinatorMode.ts) lives entirely in system prompt instructions, not in code-level orchestration.³ The orchestrator model receives a prompt describing how to delegate, aggregate, and synthesize. The subordinate agents are not special processes. They are Claude instances with different system prompts.

The design validates the “prompts as architecture” pattern that practitioners have built independently. The hook system I described in Anatomy of a Claw uses the same approach: dispatchers, skills, and agents operate through prompt instructions, not through code-level control flow.

One directive from the coordinator prompt stands out:

“Never write ‘based on your findings’, these phrases delegate understanding to workers instead of doing it yourself.”

The directive functions as a quality gate encoded in the orchestration prompt. The coordinator must synthesize, not relay. The same principle applies to any multi-agent system: if the orchestrator only passes messages between specialists, it adds no value.
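In data terms, "prompts as architecture" means a worker is nothing more than a name and a system prompt. A minimal sketch with shapes and wording of my own, not the contents of coordinatorMode.ts:

```typescript
// Hypothetical shape: a worker is defined entirely by its prompt.
interface AgentSpec {
  name: string;
  systemPrompt: string;
}

// Build the coordinator's system prompt from the worker roster.
function buildCoordinatorPrompt(workers: AgentSpec[]): string {
  const roster = workers
    .map((w) => `- ${w.name}: ${w.systemPrompt}`)
    .join("\n");
  return [
    "You are the coordinator. Delegate tasks to the workers below,",
    "then synthesize their results yourself; do not relay them verbatim.",
    "Workers:",
    roster,
  ].join("\n");
}
```

Adding a specialist is a one-line change to the roster, with no new process type or protocol, which is the property that makes the pattern attractive.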

9. KAIROS: The Unreleased Autonomous Agent

The source contains references to an unreleased feature called KAIROS, an autonomous agent with persistent memory.¹

Key components:

- A /dream skill for nightly memory distillation
- Daily append-only logs
- GitHub webhooks for repository-aware context
- A background daemon with 5-minute cron refresh
- Feature gates preventing activation

KAIROS appears to be Anthropic’s answer to persistent, always-on agent assistants. The /dream skill is particularly interesting because it implies a model that processes and consolidates memory while idle, similar to how human memory consolidation works during sleep.

Feature gates prevent activation, and Anthropic has not released KAIROS. But its presence in the source signals the direction: Claude Code is evolving from a session-based tool toward a persistent, background-aware agent.

10. The Companion Pet System

One of the more surprising discoveries: Claude Code includes a companion pet system.¹

Each pet is deterministic: a hash of the user ID seeds Mulberry32, a small pseudorandom number generator that the source describes as “good enough for picking ducks.” Each pet has 5 stats (DEBUGGING, PATIENCE, CHAOS, WISDOM, SNARK) and a rarity tier:

Rarity	Probability
Common	60%
Uncommon	25%
Rare	10%
Epic	4%
Legendary	1%
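Deterministic assignment of this kind takes very little code. The sketch below uses the standard public Mulberry32 routine and the probabilities from the table. The FNV-1a string hash and the function names are my assumptions, since the exact hashing step is not reproduced here:

```typescript
// Mulberry32: a small seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Assumption: FNV-1a over UTF-16 code units as the user ID hash.
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

const RARITY_TABLE: Array<[string, number]> = [
  ["Common", 0.6],
  ["Uncommon", 0.25],
  ["Rare", 0.1],
  ["Epic", 0.04],
  ["Legendary", 0.01],
];

// Same user ID, same seed, same first roll, same rarity.
function rarityFor(userId: string): string {
  const roll = mulberry32(fnv1a(userId))();
  let cumulative = 0;
  for (const [name, probability] of RARITY_TABLE) {
    cumulative += probability;
    if (roll < cumulative) return name;
  }
  return "Legendary"; // guards against floating-point shortfall in the sum
}
```

Because the seed depends only on the user ID, nothing needs to be stored: the pet can be recomputed on every launch and will always come out the same.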

The system renders pets as 5x12 ASCII sprites with 3-frame animations. The source hex-encodes species codenames because one collides with an unreleased model name.

The companion system is not a joke feature. It is a retention mechanic. The deterministic assignment means your pet is always the same, creating attachment. The rarity system creates social currency. The ASCII rendering means zero performance overhead. Anthropic built a well-designed engagement system and hid it inside a developer tool.

11. The Fork Bomb

A community incident illustrates the risks of the hook system.⁵ A developer created a SessionStart hook that spawned 2 Claude Code instances. Each spawned instance triggered the hook again, creating exponential growth: 1 → 2 → 4 → 8 → 16 → 2^N.

By morning, hundreds of Claude Code instances were running simultaneously. The system avoided a massive API bill through an ironic mechanism: the memory consumption of each instance (Bun, React, TUI) caused the machine to lock up before the billing could spiral.

The lesson for hook builders: SessionStart hooks must be idempotent. If your hook spawns processes, those processes must not trigger the same hook. A guard variable, a PID file, or an environment flag prevents the recursion.
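The environment-flag variant is the simplest of the three guards. A minimal sketch, assuming a marker variable name I made up; any variable that child processes inherit would work:

```typescript
// Hypothetical marker name, not a variable Claude Code defines.
const GUARD_VAR = "CLAUDE_HOOK_SPAWNED";

type Env = Record<string, string | undefined>;

// Returns the environment to hand to child processes, or null when this
// process was itself spawned by the hook and must not spawn again.
function childEnvOrNull(env: Env): Env | null {
  if (env[GUARD_VAR] === "1") return null;
  return { ...env, [GUARD_VAR]: "1" };
}
```

The hook calls `childEnvOrNull` with its own environment, exits early on null, and otherwise passes the returned environment to whatever it spawns. That caps the tree at one generation, where the unguarded version doubles at every level.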

What This Means

The source leak confirmed what practitioners had inferred from behavior: Claude Code is not a thin wrapper around an API call. It is a substantial engineering system with security layers, performance optimizations, behavioral adjustments, and unreleased features that signal the product roadmap.

For builders, the key implications appear in the guide’s Under the Hood section. For everyone else, the source leak provides rare visibility into how a production AI tool actually works, not how the marketing describes it, but how the code implements it.

The most important finding is also the simplest: the system is more complex than it appears, and that complexity exists for reasons. The 23 bash security checks exist because 23 attack vectors were discovered. The autocompact circuit breaker exists because 250K API calls were wasted daily. The undercover module exists because codenames leak. Every line of defensive code has a story behind it.

Sources

Frequently Asked Questions

Is the Claude Code source still available?

No. Anthropic pulled the affected npm package version shortly after the source maps were discovered. The analysis in this post is based on community documentation of the source before it was removed.

Does the source leak affect Claude Code security?

The security-relevant findings (bash validation, permission system) describe defensive mechanisms, not vulnerabilities. Knowing how the bash security checks work does not make them easier to bypass because the checks are deterministic, not obscurity-dependent.

Should I change how I use Claude Code based on these findings?

The most actionable finding is prompt cache fragility. If you modify CLAUDE.md, rules files, or MCP configs mid-session, you break the prompt cache. Set your configuration before starting a session.

What is KAIROS?

An unreleased autonomous agent feature found in the source. It includes persistent memory, nightly distillation, and background processing. It is feature-gated and not available to users.

¹ Claude Code Source Analysis: Bun Source Map Leak. March 2026. Full readable source exposed via .map files in the npm package due to a known Bun build bug.
² Anatomy of a Claw: 84 Hooks as an Orchestration Layer. Blake Crosley, February 2026.
³ Claude Code Source Deep Dive: Architecture Internals. March 2026. Technical analysis of coordinator mode, prompt cache detection, and anti-distillation defenses.
⁴ Claude Code Auto Mode Documentation. Auto Mode architecture: classifier-based permission system, circuit breaker thresholds.
⁵ Claude Code Fork Bomb Incident. March 2026. SessionStart hook exponential spawning, saved by memory exhaustion.