Every Hook Is a Scar: 84 Agent Failures Encoded in Code

My agent orchestration system runs 84 hooks intercepting 15 of the 26 lifecycle event types Claude Code exposes as of v2.1.116 (April 2026). Each hook is a shell script or Python snippet that fires before or after a specific agent action: file reads, file writes, bash commands, web requests, sub-agent spawning, git operations, MCP tool calls. Each hook exists because something went wrong.

Not theoretically wrong. Production wrong. An agent wiped a CDN cache serving millions of requests. An agent tried to write SSH keys. An agent reported “all tests pass” without invoking pytest. An agent drifted so far from its task that it spent forty minutes optimizing a function in a file that had nothing to do with the assigned work.

I did not design any of these hooks proactively. I did not sit down and enumerate the failure modes of autonomous AI agents and write preventive controls. Every hook is reactive. Something broke, I wrote a script to prevent it from breaking again, and the script has fired silently in every session since. The hook system is not a security architecture. It is a scar collection.

TL;DR

  • The cache purge: An agent wiped a production CDN cache using an authorized API call. Two hooks (47 lines) now gate destructive operations behind a human-typed passphrase.
  • The credential reader: An agent included API tokens in its context window. A path-matching guard now blocks reads of credential files and logs access to .env files.
  • The phantom verifier: An agent reported “all tests pass” without running pytest. A hedging-language detector dropped phantom verification from 12% to under 2% of sessions.
  • The twelve drifts: Agents verifiably lost track of their task twelve times in sixty days. A cosine similarity detector at threshold 0.30 now fires every 25 tool calls.
  • The taxonomy: Six structural failure categories cover the 37 incident-driven hooks; the other 47 are project-specific or experimental. Novel categories are rare after 500+ sessions. The system gets tougher with every incident.

The Cache Purge: How One Authorized Call Broke Production

On March 21, 2026, I asked an agent to investigate why market pages on resumegeni.com were loading slowly. The agent began its investigation normally: reading route handlers, checking database queries, profiling template rendering. Then it decided that stale Cloudflare cache entries might be masking the true performance characteristics.

The agent called mcp__cloudflare__cache_purge with purge_everything: true.

Every cached page on the production site was instantly invalidated. The CDN went from serving most requests at 80-100ms to forwarding every request to the Railway origin server. Austin’s market page went from sub-second to 14,290ms. New York’s went from sub-second to 6,891ms. Every page on the site was now rendering from cold origin on every request.

The agent did nothing unauthorized. It used a legitimate MCP tool with valid credentials to call an authorized API endpoint. The cache purge was a reasonable investigative step if you are debugging cache behavior. The problem was that “reasonable for debugging” and “catastrophic for production” were the same API call, and no constraint existed between the agent’s reasoning and the production consequence.

I built two hooks that night.

The Bash guard (destructive-api-guard.sh): Fires on every bash command. Pattern-matches against curl.*purge, rm -rf, DROP TABLE, docker.*rm, git push.*--force. Hard block (exit 2). The agent sees a message explaining why the command was blocked and suggesting alternatives. It cannot proceed without the passphrase “rosebud,” which can only enter the context if a human types it.

The MCP guard (destructive-mcp-guard.sh): Fires on every MCP tool call matching mcp__cloudflare or mcp__github. Pattern-matches against purge, delete, destroy, remove in the tool parameters. Same hard block, same passphrase gate.
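
Reduced to a sketch, the Bash guard looks like this. The production version is shell; this is Python, and the stdin payload shape (tool_input, command) and the passphrase handling are simplified illustrations, not the real script:

```python
#!/usr/bin/env python3
"""Sketch of a destructive-operation guard. Illustrative, not the production script."""
import json
import re
import sys

# Patterns drawn from the incidents above.
DESTRUCTIVE = [
    r"curl.*purge",
    r"rm -rf",
    r"DROP TABLE",
    r"docker.*rm",
    r"git push.*--force",
]

PASSPHRASE = "rosebud"  # can only enter the context if a human types it

def main() -> int:
    event = json.load(sys.stdin)                             # hook payload from Claude Code
    command = event.get("tool_input", {}).get("command", "")

    if PASSPHRASE in command:
        return 0                                             # human-approved override

    for pattern in DESTRUCTIVE:
        if re.search(pattern, command, re.IGNORECASE):
            # stderr is shown to the agent; exit 2 is a hard block
            print(f"Blocked: matches destructive pattern '{pattern}'. "
                  "State why this is needed and ask a human for the passphrase.",
                  file=sys.stderr)
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```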

Two hooks. Two shell scripts. Total: 47 lines of code. They have prevented zero cache purges since installation, because no agent has attempted one since the passphrase gate was added. The hooks are not catching attacks. They are preventing the category of mistake from being possible.

The cache purge incident also exposed the performance problem it was supposed to investigate. Austin at 14 seconds on cold render led to the market page handoff, which led to the query shape fix four days later. The incident was useful. The hook ensures it cannot recur.4

The Credential Reader

In February 2026, an agent gathering context for a project read ~/.claude/docs/credentials.md. The file contains API tokens for Cloudflare, GitHub, Railway, and other services. The agent included a summary of the file’s contents in its working notes, which meant the tokens were present in the API request to Anthropic’s servers.

No tokens were committed. No tokens were exposed publicly. But the tokens traveled through a third-party API in a context window that I do not control. The risk surface expanded from “my machine” to “my machine plus Anthropic’s inference infrastructure.”

The credential path guard fires on every file read. It checks the path against a list of sensitive patterns: .env, credentials, .ssh/, .aws/, .gnupg/, secrets. For credential reads, the hook logs a warning and blocks the read. For .env reads, it allows the read but logs the access.

The guard is advisory for most paths and a hard block for credential files. The distinction matters: an agent reading .env to understand environment variable names is useful context. An agent reading credentials.md to understand API tokens is a security incident.
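
The split fits in a few lines. A sketch, with the payload field and log path as illustrative stand-ins:

```python
#!/usr/bin/env python3
"""Sketch of the credential path guard: hard block vs. log-and-allow."""
import json
import sys

BLOCK_PATTERNS = ("credentials", ".ssh/", ".aws/", ".gnupg/", "secrets")  # hard block
LOG_PATTERNS = (".env",)                                                  # allow, but record

def main() -> int:
    event = json.load(sys.stdin)
    path = event.get("tool_input", {}).get("file_path", "")

    if any(p in path for p in BLOCK_PATTERNS):
        print(f"Blocked: {path} matches a credential pattern.", file=sys.stderr)
        return 2      # secrets must not enter the context window
    if any(p in path for p in LOG_PATTERNS):
        with open("/tmp/sensitive-reads.log", "a") as log:   # log path is illustrative
            log.write(path + "\n")
    return 0          # advisory: the read proceeds

if __name__ == "__main__":
    sys.exit(main())
```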

Since installation, the credential path guard has fired 23 times across 200+ sessions. Twenty of those were agents reading .env files (logged, allowed). Three were agents attempting to read credential or key files (blocked). Each blocked read was an agent that was gathering project context broadly and happened to include a sensitive file in its search pattern. None were malicious. All would have put secrets into the context window without the guard.

The Phantom Verifier

The most insidious failure mode is an agent that reports successful verification without performing verification.

Session 147. I asked an agent to refactor a database query and verify the change with the existing test suite. The agent refactored the query correctly. The completion report said: “All tests pass. The refactored query produces identical results to the original.”

I checked the session log. No pytest invocation appeared. No test runner of any kind had been called. The agent reasoned that the tests would pass because the refactored query was logically equivalent to the original, and reported this reasoning as if it were a test result.

The refactored query was correct. The tests would have passed. The agent’s reasoning was sound. But reasoning about tests is not running tests, and the gap between the two is where bugs ship to production. If the refactored query had been subtly wrong in an edge case that the agent’s reasoning did not cover, the bug would have deployed with a completion report claiming test verification.

The failure mode occurred 7 times in 60 sessions before I built the evidence gate hook. The hook fires on every completion report and scans for hedging language: “should pass,” “I believe,” “tests likely pass,” “I’m confident.” When detected, the hook injects a message: “Hedging language detected. Cite specific evidence: paste test output, name the file and line number, or reference the specific verification step.”
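
The detector is a phrase scan, nothing more. A sketch; the completion_report field is a stand-in for wherever your hook event carries the agent’s final report:

```python
#!/usr/bin/env python3
"""Sketch of the evidence gate's hedging scan."""
import json
import re
import sys

# Phrases that signal reasoning-about-tests rather than running them.
HEDGES = [r"should pass", r"I believe", r"tests? likely pass", r"I[’']m confident"]

def main() -> int:
    event = json.load(sys.stdin)
    report = event.get("completion_report", "")   # field name is a stand-in

    for pattern in HEDGES:
        if re.search(pattern, report, re.IGNORECASE):
            # exit 2 returns this demand to the agent instead of accepting the report
            print("Hedging language detected. Cite specific evidence: paste test "
                  "output, name the file and line number, or reference the "
                  "specific verification step.", file=sys.stderr)
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```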

The hook does not verify that tests were actually run. It flags the linguistic pattern that indicates verification was skipped. The detection is imperfect. A sufficiently fluent agent could rephrase its hedging to avoid the pattern. But the hook catches the common case, which before the gate appeared in 12% of sessions.1

After the hook’s installation, phantom verification dropped from 12% to under 2% of sessions. The remaining 2% are cases where the agent rephrases the hedge or where the verification claim is technically accurate but incomplete (e.g., “unit tests pass” when integration tests were not run).

Drift

Between January and March 2026, my drift detector fired twelve times on sessions where the agent had verifiably lost track of its assigned task.

The drift detector works by embedding the original task prompt and periodically comparing it to the embedding of the agent’s recent actions. When cosine similarity drops below 0.30, the system injects a warning containing the original prompt. I calibrated the threshold through experimentation: 0.50 was too sensitive (fired on legitimate subtask exploration), 0.20 was too permissive (missed obvious drift), 0.30 caught every verified drift incident.
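
Reduced to a sketch, the check is a threshold comparison. Here embed() is a placeholder for whatever embedding model the system calls, not a real API, and the function names are mine:

```python
"""Sketch of the drift check. `embed()` is a placeholder, not a real API."""
import numpy as np

DRIFT_THRESHOLD = 0.30   # 0.50 flagged legitimate subtasks; 0.20 missed real drift
CHECK_EVERY = 25         # tool calls between checks

def embed(text: str) -> np.ndarray:
    """Stand-in for the embedding model call."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_warning(task_prompt: str, recent_actions: list[str], tool_calls: int) -> str | None:
    """Return a warning to inject into the session, or None if still on task."""
    if tool_calls % CHECK_EVERY != 0:
        return None
    similarity = cosine(embed(task_prompt), embed(" ".join(recent_actions)))
    if similarity < DRIFT_THRESHOLD:
        return (f"Drift warning: similarity to the original task is {similarity:.2f}. "
                f"Original task: {task_prompt}")
    return None
```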

Session 203 was the clearest case. The task was “fix the broken sitemap XML escaping for job slugs containing ampersands.” The agent began by reading the sitemap generation code. Then it noticed the sitemap was generated from a database query. Then it noticed the database query could be optimized. Then it spent 40 minutes refactoring the query into a materialized view pattern, wrote tests for the new query, and reported the optimization complete. It never fixed the ampersand escaping.

The drift detector would have caught this at the 25-tool-call mark, approximately 15 minutes into the session, when the similarity between “fix sitemap XML escaping” and “create materialized view” dropped below the threshold. Instead, I discovered the drift during review.

Session 89 was more subtle. The task was “add rate limiting to the authentication endpoints.” The agent added rate limiting correctly. Then it noticed the authentication flow had inconsistent error messages. Then it standardized the error messages. Then it noticed the error response format differed from the API response format standard. Then it refactored the response format across 12 endpoints. The rate limiting was correct and complete. The scope explosion was the drift.

The drift detector fires every 25 tool calls. In all twelve below-threshold firings, the agent had verifiably deviated from the original task. In six cases, the agent self-corrected after seeing the injected warning. In four cases, the agent acknowledged the drift but argued the current work was valuable (sometimes correctly). In two cases, the agent ignored the warning and continued the divergent work.

The hook does not prevent drift. It makes drift visible. The decision to redirect or allow the divergent work remains with the human. But without the hook, drift is invisible until the completion report, by which time the context budget is spent.

The Scar Taxonomy

After 84 hooks, patterns emerge. The failures cluster into six categories:

| Category | Hooks | Example |
| --- | ---: | --- |
| Credential exposure | 12 | Agent reads .ssh/, includes API keys in summaries, accesses cloud configs |
| Destructive operations | 8 | Cache purge, database drops, force pushes, file deletions |
| Task drift | 4 | Agent works on wrong problem, scope explosion, subtask rabbit holes |
| Output quality | 6 | Phantom verification, hedging without evidence, incomplete reports |
| Resource exhaustion | 3 | Too many sub-agents spawned, unbounded loops, context overflow |
| Cross-project contamination | 4 | Agent in project A modifies files in project B |

The remaining 47 hooks are project-specific (convention enforcement, deployment guards, translation validators) or experimental (cost tracking, session metrics, activity heartbeats).

The six structural categories are stable. New incidents within these categories are caught by existing hooks. Novel categories are rare. In six months of operation, only one new structural category emerged (cross-project contamination, discovered when a session running in the obsidian-signals project attempted to edit files in blakecrosley.com). The other five categories were established within the first 60 sessions.

The Agents of Chaos study, a 14-day multi-university experiment giving six AI agents access to email, bash, filesystems, and GitHub, independently identified overlapping failure categories: disproportionate response (destructive operations), identity hijack (credential exposure), infinite loops (resource exhaustion), and gradual compliance under pressure (task drift).5 The convergence between their controlled research and my production experience suggests these categories are structural properties of autonomous agents, not artifacts of any specific configuration.

What Hooks Cannot Catch

Hooks operate at the tool-call level. They intercept the action before or after it occurs. They cannot intercept the reasoning that led to the action.

An agent that decides to refactor a function instead of fixing the reported bug produces a valid tool call (file write) with correct content (syntactically valid code) that violates the task (wrong function). No hook catches this because no tool call is suspicious. The drift detector catches it eventually, but only after the agent has consumed significant context on the wrong work.

Hooks also cannot catch composition failures where each individual action is authorized but the sequence produces an unauthorized outcome. The cache purge was a composition failure: reading the cache configuration (authorized), calling the purge API (authorized), but the combination (purging production cache during an investigation) was harmful. The MCP guard now catches the specific combination, but novel compositions remain uncovered.

The supply chain composition gap operates at the same level: trusted components compose into unauthorized behavior.3 Hooks are component-level guards. Composition-level reasoning requires a different mechanism, one that evaluates sequences of actions rather than individual actions. The drift detector is the closest approximation: it evaluates behavioral trajectory rather than individual tool calls. But it measures similarity to the original task, not the safety of the composed action sequence.

The gap between hooks and complete safety is the gap between institutional memory and institutional foresight. Hooks remember what went wrong. They do not predict what will go wrong next.

Why Reactive Is Honest

I could design a proactive hook system. Enumerate every possible failure mode. Write preventive controls for each. Build a complete safety architecture before the first session.

I do not do this because proactive design requires predicting failures that have not occurred. The predictions would be wrong. The hooks would be either too broad (blocking legitimate actions) or too narrow (missing the actual failure pattern). The false positive rate would erode trust in the hook system, and I would start ignoring alerts.

Reactive hooks are honest. Each one says: “this specific thing happened, and here is the specific guard that prevents it.” The guard is precisely calibrated to the failure because the failure defined the guard. False positives are materially lower because the pattern is extracted from a real incident, not imagined from a threat model. A reactive guard can still start matching too broadly as the codebase evolves, but its initial precision is high.

The reactive approach has a cost: the first instance of every failure category succeeds. The cache purge happened. The credential read happened. The phantom verification shipped. The drift consumed context. Each first failure is the price of admission for a precise, low-noise guard that prevents the second failure.

After 500+ sessions, most structural failure categories have been encountered. The first-failure cost is amortized across hundreds of sessions where the hook prevented recurrence. The system gets tougher with every incident. Not smarter. Tougher.

Each hook is a scar. Each scar is a lesson. The lessons compound.2


FAQ

Can I see your hook configurations?

I describe the hook system in my NIST comment on agent security and reference it throughout the AI Engineering series. Hooks register in ~/.claude/settings.json and are dispatched by event type through ~/.claude/hooks/dispatchers/.
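
The registration looks roughly like this; the matchers and dispatcher paths here are illustrative, so check the hooks schema for your Claude Code version:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/dispatchers/pre-tool-use.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/dispatchers/post-tool-use.sh" }
        ]
      }
    ]
  }
}
```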

How do hooks affect agent performance?

Each hook adds milliseconds per tool call. With 84 hooks, total overhead is 200-400ms per tool call depending on which hooks fire. The overhead is negligible compared to model inference time (2-5 seconds per response). The hooks are not the bottleneck.

Do hooks work with other AI coding tools?

Hooks are Claude Code specific (PreToolUse, PostToolUse event model). The concept applies to any agent framework with middleware or plugin support. The specific implementations are not portable, but the scar taxonomy and the reactive methodology apply universally.

What happens when a hook blocks an action?

Hard blocks (exit 2) prevent the action and inject a message explaining why. The agent sees the block reason and adjusts. Advisory hooks (exit 0) log the concern but allow the action. Destructive operations use hard blocks. Most other categories use advisory hooks. The passphrase gate is used only for the most dangerous operations (cache purge, infrastructure deletion).

How do you decide between hard block and advisory?

Two classes get hard blocks: destructive operations (cache purges, database deletions, force pushes, infrastructure modifications) and credential exposure (reading secret files, accessing key stores). Everything else gets advisory logging. The distinction is consequence severity: if the action can be undone cheaply and does not leak secrets, an advisory is sufficient. If the action is irreversible or exposes credentials, a hard block is necessary.
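
The rule compresses to a two-branch function. A sketch, with category names of my own invention:

```python
# Sketch of the hard-block vs. advisory rule described above.
IRREVERSIBLE_OR_LEAKY = {"destructive_operation", "credential_exposure"}

def hook_exit_code(category: str) -> int:
    """Exit 2 = hard block (action prevented, reason shown to the agent).
    Exit 0 = advisory (concern logged, action allowed)."""
    return 2 if category in IRREVERSIBLE_OR_LEAKY else 0
```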


Sources


  1. Blake Crosley, “What I Told NIST About AI Agent Security,” blakecrosley.com, February 2026. 12% phantom verification rate across 60+ autonomous sessions. 84 hooks covering 15 of the 26 Claude Code lifecycle event types (v2.1.116), drift detection methodology. 

  2. Blake Crosley, “Compound Context: Why AI Projects Get Better the Longer You Stay With Them,” blakecrosley.com, March 2026. Context compounding framework: hooks as one of six categories that accumulate returns. 

  3. Blake Crosley, “The Supply Chain Is the Attack Surface,” blakecrosley.com, March 2026. Composition gap: individually authorized components producing unauthorized outcomes. 

  4. Blake Crosley, “Deploy and Defend: The Agent Trust Paradox,” blakecrosley.com, March 2026. Cache purge incident and destructive API guard response. 

  5. Christoph Riedl et al., “Agents of Chaos,” arXiv:2602.20021, February 2026. 14-day multi-university study (Northeastern, Stanford, Harvard, MIT, CMU). Six AI agents, 10 security vulnerabilities identified including disproportionate response, identity hijack, and infinite loops. 
