Claude Code as Infrastructure
Andrej Karpathy coined a term for what grows around an LLM agent: claws. The hooks, scripts, and orchestration that let the agent grip the world outside its context window.[1] Most people treat Claude Code like a chat box with file access. They type a prompt, watch it edit a file, and move on. That framing misses what the tool actually is.
Claude Code is not an IDE feature. It is infrastructure. And the gap between treating it as one versus the other determines whether AI-assisted development stays at 10% productivity gains or breaks through to something fundamentally different.
Treating Claude Code as infrastructure means configuring it like a programmable runtime — with hooks, skills, and agents — rather than using it as an interactive chat box. Claude Code exposes 17 lifecycle events that accept shell-script hooks firing before, during, or after every tool call. Stacking hooks into dispatchers, skills, and autonomous agents creates a deterministic policy layer the model cannot bypass, enabling systems that write, review, and ship code without human supervision.
TL;DR
Claude Code exposes 17 lifecycle events, each hookable with shell scripts that fire before, during, or after every tool call.[2] Stack hooks into dispatchers, dispatchers into skills, skills into agents, agents into workflows, and you get a programmable layer between you and the model that enforces constraints the model cannot skip. I built 84 hooks, 48 skills, 19 agents, and ~15,000 lines of orchestration over two months. Zero frameworks. Zero external dependencies. All bash and JSON. The result is an autonomous development system that writes, reviews, and ships code while I sleep. This post explains the architecture, why the IDE framing holds people back, and what changes now that Remote Control makes this infrastructure accessible from anywhere.
The IDE Framing Is Wrong
The default mental model: Claude Code is a smarter autocomplete. You sit at a terminal, give it tasks, and supervise the output. That model caps your productivity at whatever you can personally oversee.
The infrastructure mental model: Claude Code is a programmable runtime with an LLM kernel. Every action the model takes passes through hooks you control. You define policies, not prompts. The model operates within your infrastructure the same way a web server operates within nginx rules. You do not sit at nginx and type requests. You configure it, deploy it, and monitor it.
The distinction matters because infrastructure compounds. A hook that blocks credentials in bash commands protects every session, every agent, every autonomous run. A skill that encodes your blog evaluation rubric applies consistently whether you invoke it or an agent does. An agent that reviews code for security runs the same checks whether you are watching or not.
Simon Willison frames the current moment around a single observation: writing code is cheap now.[3] Correct. But the corollary nobody wants to hear is that verification is now the expensive part. Cheap code without verification infrastructure produces bugs at scale. The investment that pays off is not a better prompt. It is the system around the model that catches what the model misses.
The Infrastructure Layer
Claude Code’s hook system fires shell commands at 17 lifecycle events.[2] PreToolUse fires before a tool executes and can block it. PostToolUse fires after and can provide feedback. UserPromptSubmit fires when you type and can inject context. Stop fires when the model tries to finish and can force it to continue. Each event receives JSON on stdin with full context: session ID, tool name, tool input, current working directory.
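As a concrete sketch, here is roughly what a UserPromptSubmit hook looks like. The field extraction is deliberately crude (a production hook would use jq), and the injected lines are illustrative:

```shell
#!/usr/bin/env bash
# UserPromptSubmit hook sketch: anything printed to stdout on exit 0
# is injected into the model's context for that prompt.

inject_context() {
  local input cwd
  input=$(cat)   # the hook's JSON payload arrives on stdin
  # crude field extraction; a production hook would use jq
  cwd=$(printf '%s' "$input" | sed -n 's/.*"cwd":"\([^"]*\)".*/\1/p')
  echo "Current date: $(date +%Y-%m-%d)"
  echo "Working directory: ${cwd:-unknown}"
}

# Simulate the event firing with a sample payload:
echo '{"session_id":"abc123","cwd":"/tmp/project"}' | inject_context
```

The same shape works for every event: read JSON on stdin, write context or feedback on stdout, signal policy through the exit code.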
The hook system is not a plugin system. It is an event-driven architecture. The difference: plugins extend a tool’s features. Events let you intercept, modify, and control every action the tool takes. You become the middleware.
Hooks: The Deterministic Layer
Hooks are shell scripts. They cannot be hallucinated, sweet-talked, or prompt-injected around. The model wants to run rm -rf /? A 10-line bash script checks the command against a blocklist and rejects it before the shell ever sees it. The model tries to read .env? A regex on the file path intercepts the Read tool call. None of this requires the model to cooperate. The hook fires whether the model wants it to or not.
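A minimal sketch of such a gate, assuming the documented convention that exit code 2 from a PreToolUse hook blocks the call and feeds stderr back to the model. The blocklist patterns are illustrative, not exhaustive:

```shell
#!/usr/bin/env bash
# PreToolUse hook sketch: block destructive commands and reads of
# sensitive paths. Exit code 2 rejects the tool call before it runs.

guard() {
  local input cmd path
  input=$(cat)
  # crude extraction; a production hook would use jq
  cmd=$(printf '%s' "$input" | sed -n 's/.*"command":"\([^"]*\)".*/\1/p')
  path=$(printf '%s' "$input" | sed -n 's/.*"file_path":"\([^"]*\)".*/\1/p')

  case "$cmd" in
    *"rm -rf /"*|*"git push --force"*)
      echo "Blocked: destructive command" >&2; return 2 ;;
  esac
  case "$path" in
    *.env|*id_rsa*|*credentials*)
      echo "Blocked: sensitive path" >&2; return 2 ;;
  esac
  return 0
}

# The model tries to read .env; the hook fires whether it cooperates or not:
if echo '{"tool_name":"Read","tool_input":{"file_path":"/app/.env"}}' | guard; then
  echo "allowed"
else
  echo "blocked with exit code $?"   # → blocked with exit code 2
fi
```

In a real deployment the script is registered under PreToolUse in .claude/settings.json rather than invoked by hand.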
I run 84 hooks across 17 event types. The split tells a story: 35 enforce judgment (gates, guards, validators) and 49 handle automation (injectors, loggers, trackers). That ratio started at 1:6. Two months of things breaking in autonomous runs pushed it to 4:5. Every judgment hook exists because something failed without it. An agent committed code with TODO comments. An agent ran a destructive git command. An agent leaked a credential path into a log file. Each failure got a gate.
The biggest lesson: dispatchers over independent hooks. I had seven hooks all firing on UserPromptSubmit, each reading stdin independently, two writing to the same JSON state file. Concurrent writes truncated the JSON. Every downstream hook that parsed that file broke. One dispatcher per event running hooks sequentially from cached stdin fixed it. Invisible overhead, 200ms per prompt.
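A dispatcher in this style can be sketched as follows; the hook-directory layout is this example's own convention, not a Claude Code built-in:

```shell
#!/usr/bin/env bash
# Event dispatcher sketch: read stdin once, then run each hook for
# the event sequentially from the cached payload. Sequential execution
# is what prevents two hooks from racing on the same state file.

dispatch() {
  local hook_dir="$1" payload hook
  payload=$(cat)                        # cache stdin exactly once
  for hook in "$hook_dir"/*.sh; do      # lexical order: 10-foo, 20-bar...
    [ -e "$hook" ] || continue          # empty dir: glob did not expand
    printf '%s' "$payload" | bash "$hook"
    [ $? -eq 2 ] && return 2            # a blocking hook halts the chain
  done
  return 0
}

# Demo with two throwaway hooks:
dir=$(mktemp -d)
echo 'echo "hook one saw: $(cat)"' > "$dir/10-first.sh"
echo 'echo "hook two ran"'         > "$dir/20-second.sh"
echo '{"prompt":"hi"}' | dispatch "$dir"
```

One dispatcher per event registered in settings.json replaces N independent hook registrations, so every hook reads the same cached payload instead of contending for stdin.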
Skills: The Knowledge Layer
Skills are markdown instruction sets that activate on demand or via hooks.[4] Each one encodes domain expertise the model draws on when invoked. My blog-evaluator skill defines a 6-category weighted rubric with specific scoring criteria, category minimums, and interdependencies. My jiro skill encodes a 7-step quality loop with an evidence gate that requires specific proof for each criterion.
Skills compose with hooks. A skill can define its own hooks in frontmatter that activate only while the skill runs. Philosophy skills auto-activate via SessionStart hooks, injecting quality constraints into every session without explicit invocation.
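For illustration only, a skill file might be shaped like this. The name and description fields follow the Skills reference; the hook block is a sketch of the pattern described above, so verify the exact frontmatter schema against the current documentation before relying on these field names:

```markdown
---
name: blog-evaluator
description: Scores a draft post against a 6-category weighted rubric.
# Hypothetical hook wiring; check the Skills reference for the exact
# frontmatter schema your Claude Code version supports.
hooks:
  PostToolUse:
    - command: ./hooks/lint-draft.sh
---

Evaluate the draft against the rubric. Apply category minimums
before computing the weighted total...
```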
48 skills covering: code quality (jiro, testing-philosophy, debugging-philosophy), content (blog-writer-core, blog-evaluator, citation-verifier), architecture (fastapi, swiftui, database, htmx-alpine), operations (deploy, cache, analytics, security), and meta-orchestration (deliberation, scan-intel, ralph). Research into Claude Code’s own preferences found it gravitates toward certain frameworks and patterns.[9] Skills let you override those defaults with your own.
Agents: The Delegation Layer
Agents are specialized subagents with isolated context windows.[5] Each one gets a focused task and fresh context. My code review system spawns three agents in parallel: correctness, security, and conventions. Each reviews independently. Disagreements between reviewers surface exactly the issues a single reviewer would miss.
The critical constraint: a recursion guard. A shell script fires before every Task tool call, checks a spawn depth counter in a shared state file, and blocks the call if depth exceeds a threshold. Without it, agents delegate to agents that delegate to agents, each one losing context and burning tokens. Default limit is 3 levels. In practice, useful work happens at depth 1 (main agent plus one subagent). Anything deeper than 2 usually means the task decomposition was wrong.
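A recursion guard of this shape is a few lines of shell. The state-file protocol is this setup's own convention, and the JSON parsing of the hook payload is omitted for brevity:

```shell
#!/usr/bin/env bash
# PreToolUse guard for the Task tool: block agent spawns past a
# depth threshold, tracked in a shared state file.

check_spawn_depth() {
  local state_file="$1" max_depth="${2:-3}" depth
  depth=$(cat "$state_file" 2>/dev/null || echo 0)
  if [ "$depth" -ge "$max_depth" ]; then
    echo "Blocked: spawn depth $depth >= limit $max_depth" >&2
    return 2                            # exit 2 rejects the Task call
  fi
  echo $((depth + 1)) > "$state_file"   # record the deeper level
  return 0
}

state=$(mktemp); echo 2 > "$state"
check_spawn_depth "$state" 3 && echo "spawn allowed"   # depth 2 -> 3: ok
check_spawn_depth "$state" 3 || echo "spawn blocked"   # at the limit
```

A matching decrement would run in a hook that fires when a subagent finishes; the sketch shows only the gate.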
19 agents spanning: development (ios-developer, backend-architect), review (code-reviewer, security-reviewer, conventions-reviewer, yagni-reviewer), exploration (project-scout, code-explorer, code-architect), and validation (test-runner, correctness-reviewer).
Remote Control Changes the Equation
On February 25, 2026, Anthropic shipped Remote Control: the ability to connect to a local Claude Code session from any browser or the Claude mobile app.[6] The feature got 531 points and 313 comments on Hacker News, most of them complaints about bugs. The complaints are valid. The feature is still transformative.
Here is why. Before Remote Control, the infrastructure I described had two modes: supervised (I watch the terminal) or unsupervised (I walk away and hope). Neither is ideal. Supervised caps throughput at my attention span. Unsupervised risks the model making bad decisions nobody catches.
Remote Control creates a third mode: asynchronous governance. I run autonomous loops that process multi-story PRDs overnight. The approval prompts for external actions (git push, API calls, anything that leaves the machine) route to my phone. I approve, reject, or redirect from anywhere. The governance layer stays the same. The latency between “agent needs approval” and “human provides it” drops from “whenever I check my laptop” to “10 seconds from my phone.”
The approval flow compounds with the blast radius classification from my hooks. Local operations (file writes, test runs) auto-approve. Shared operations (git commits) warn. External operations (pushes, API calls, deployments) defer to human review. Remote Control turns that “defer” path from a blocking wait into an async notification. The agent keeps working on the next story while I review the previous one.
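The classification itself is a pattern match. A sketch, with illustrative patterns (a real classifier needs a much longer list):

```shell
#!/usr/bin/env bash
# Blast-radius sketch: classify a command by how far its effects
# reach, then map the class to an approval policy.

blast_radius() {
  case "$1" in
    "git push"*|*"curl "*|*"deploy"*) echo "external" ;;  # leaves the machine
    "git commit"*|"git merge"*)       echo "shared"   ;;  # shared history
    *)                                echo "local"    ;;  # files, tests
  esac
}

policy() {
  case "$(blast_radius "$1")" in
    local)    echo "auto-approve" ;;
    shared)   echo "warn" ;;
    external) echo "defer-to-human" ;;  # routed to the phone for review
  esac
}

policy "pytest -q"             # → auto-approve
policy "git commit -m 'fix'"   # → warn
policy "git push origin main"  # → defer-to-human
```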
Tools like Agent Multiplexer already manage Claude Code sessions via tmux.[10] Open-source alternatives like Emdash provide full agentic development environments.[11] The people suggesting SSH plus tmux as an alternative are right that it works for terminal access. None of these give you the approval routing. That routing is what makes unattended operation safe, not just possible.
Cost as Architecture
The “Making MCP Cheaper via CLI” post (304 HN points) documented a pattern: wrapping MCP tool calls in CLI invocations to avoid the overhead of maintaining an MCP server connection.[7] The broader insight is that cost is an architectural decision, not an operational afterthought.
My infrastructure handles cost at three levels:
Token level. System prompt compression. I run ~3,500 tokens of system prompt across a CLAUDE.md file and 8 rules files. The high-return cuts: removing tutorial code examples (the model knows the APIs), collapsing duplicate rules across files, and replacing explanations with constraints. “Reject tool calls matching sensitive paths” does the same work as a 15-line explanation of why credentials shouldn’t be read. Semantic density over raw compression.[8]
Agent level. Fresh spawns over long conversations. Each story in an autonomous run gets a new agent with a clean context window. At spawn time, the agent receives a briefing: current git state, what previous agents accomplished, what it needs to do. Briefing instead of memory. Models execute a clear briefing better than they navigate 30 steps of accumulated context. The context never balloons because each agent starts fresh. Geoffrey Huntley documented a similar pattern in “The Ralph Loop,” running autonomous development at $10.42/hour on Sonnet.[13] Multi-agent orchestrators like OpenSwarm formalize the worker-reviewer pipeline with model escalation.[14]
Architecture level. CLI-first over MCP when the operation is stateless. A claude --print call for a one-shot evaluation costs less and adds no connection overhead. An MCP server makes sense when the tool needs persistent state or streaming. Context Mode demonstrated the inverse: compressing 315 KB of MCP output to 5.4 KB using FTS5 indexing with BM25 ranking.[12] Both approaches reduce token spend, from different directions. Most of my skill invocations are one-shot. My prompt caching analysis found that the Claude Code CLI caches system prompts by default above 4,096 tokens. Zero configuration needed.
Case Study: What 84 Hooks Look Like in Practice
A concrete session trace from an autonomous run last week, processing a PRD with 5 stories:
1. SessionStart fires. Dispatcher injects: current date, project detection, philosophy constraints, system performance check, cost tracking initialization. Five hooks, 180ms total.
2. Agent reads the PRD, plans the first story. UserPromptSubmit fires on the internal prompt. Dispatcher injects: active project context, session drift baseline (Model2Vec embedding of the first prompt for later similarity checks). 120ms.
3. Agent calls Bash to run tests. PreToolUse:Bash fires. Dispatcher runs: credentials check (no .env paths in the command), sandbox validation (command not on blocklist), project detection. 90ms. Test runs. PostToolUse:Bash fires: activity heartbeat logged, drift check against baseline (cosine similarity 0.63, well above the 0.30 threshold).
4. Agent calls Write to create a file. PreToolUse:Write fires: file scope check (is this path within the project directory?). PostToolUse:Write fires: lint check on the written file, commit tracking, activity heartbeat.
5. Agent finishes the story. Stop fires. Quality gate hook checks: did the agent cite evidence for each criterion? Did it use hedging language (“should”, “probably”)? Are there TODO comments in the diff? If any check fails, the hook returns exit 2 and the agent continues working.
6. Independent verification: a fresh agent runs the test suite without trusting the previous agent’s self-report.
7. Three code review agents spawn in parallel. Each reviews the diff independently. Findings merge. If any reviewer flags a CRITICAL issue, the story goes back in the queue.
8. Story passes. Next story loads. The cycle repeats for all 5 stories.
Total hooks fired across 5 stories: ~340. Total time in hooks: ~12 seconds. Invisible overhead that prevented three credential leaks, one destructive command, and two incomplete implementations in a single overnight run.
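The Stop-event quality gate above can be sketched the same way. This version checks a diff passed as a string; a real hook would pull the diff from git and parse the transcript for evidence citations:

```shell
#!/usr/bin/env bash
# Stop hook sketch: returning exit 2 refuses to let the agent finish
# and sends it back to work with the stderr message as feedback.

quality_gate() {
  local diff="$1"
  if printf '%s' "$diff" | grep -q "TODO"; then
    echo "Gate failed: TODO comment left in the diff" >&2
    return 2
  fi
  if printf '%s' "$diff" | grep -Eqi "should work|probably"; then
    echo "Gate failed: hedging language instead of evidence" >&2
    return 2
  fi
  return 0
}

quality_gate "x = parse(s)  # TODO: handle nil" || echo "agent keeps working"
quality_gate "x = parse(s)" && echo "agent may stop"
```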
Key Takeaways
Claude Code is a runtime, not a tool. The 17 lifecycle events make it programmable. Hooks, skills, and agents are the instruction set. The model is the execution engine. You are the systems architect.
Governance scales with automation. Every hook that adds a constraint reduces the risk of unattended operation. The ratio of judgment hooks to automation hooks is your safety margin. Mine is 4:5 and climbing.
Infrastructure compounds, prompts don’t. A good prompt improves one interaction. A good hook improves every interaction. A good skill improves every agent that invokes it. A good agent improves every workflow that delegates to it. Invest in the layer that multiplies.
Remote Control makes the infrastructure portable. The approval routing turns “unsupervised” into “asynchronously supervised.” That distinction is the difference between hoping the model makes good decisions and verifying it does.
Cost is architecture, not optimization. Fresh agent spawns, CLI-first invocations, system prompt compression, and prompt caching are structural decisions that compound. Optimizing after the fact costs more than designing for it.
Zero frameworks required. 84 hooks, 48 skills, 19 agents, ~15,000 lines of orchestration. Bash scripts in a directory. JSON state files. No runtime dependencies. You can adopt one hook or the entire stack. The infrastructure grows organically from solving real problems, not from implementing someone else’s framework.
FAQ
What does it mean to treat Claude Code as infrastructure?
Treating Claude Code as infrastructure means configuring it as a programmable runtime rather than using it as an interactive chat tool. Instead of typing prompts and watching edits, you define policies through hooks (shell scripts that fire at 17 lifecycle events), encode domain expertise in skills (markdown instruction sets), and delegate specialized work to agents (subagents with isolated context windows). The infrastructure compounds — a hook that blocks credential leaks protects every session, every agent, every autonomous run — while individual prompts improve only one interaction.
How do you configure Claude Code for teams?
Teams configure Claude Code at two levels. Project-level configuration lives in .claude/settings.json (hooks and permissions), .claude/skills/ (shared domain expertise), and CLAUDE.md (architecture context loaded into every session). These files are committed to git and distributed automatically when teammates pull. User-level configuration in ~/.claude/ adds personal preferences and safety hooks that apply across all projects. The combination ensures every team member gets standardized safety rails and domain knowledge while retaining personal workflow preferences.
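A minimal project-level wiring might look like the following. The overall shape follows the hooks reference, but verify the schema against current docs; the guard script path is this example's own:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/guard.sh"
          }
        ]
      }
    ]
  }
}
```

Because the file lives in the repository, every teammate who pulls gets the same gate.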
What is CLAUDE.md and why does it matter?
CLAUDE.md is a markdown file in your project root that Claude Code reads at the start of every session and after every context compaction. It contains project-specific instructions, architecture context, coding standards, and constraints that shape how Claude behaves in your codebase. Because CLAUDE.md loads on every API call, every token in it occupies space for the entire conversation — so compression matters. Target constraints over explanations (“Reject tool calls matching sensitive paths” instead of a 15-line rationale) and keep the file under 3,500 tokens for optimal prompt cache performance.
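As a hypothetical shape (project details invented for illustration), a compressed CLAUDE.md reads as constraints and commands, not explanations:

```markdown
# Project: api-server (hypothetical example)

## Constraints
- Python 3.12, FastAPI. No new runtime dependencies without approval.
- All database writes go through the repository layer; never raw SQL in handlers.
- Reject tool calls matching sensitive paths (.env, credentials).

## Commands
- Test: `pytest -q`
- Lint: `ruff check .`
```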
How do hooks, skills, and rules work together in Claude Code?
The three layers serve complementary roles. Hooks provide deterministic enforcement — shell scripts that block dangerous commands, inject context, and gate output quality. They cannot be bypassed through prompting. Skills provide domain expertise — markdown files encoding security patterns, evaluation rubrics, and business rules that Claude applies when relevant. Rules (in .claude/rules/) provide persistent constraints loaded into every session. Together they form a layered system: hooks enforce what the model must and must not do, skills inform how it should approach domain-specific work, and rules set the baseline standards that apply to all interactions.
How much overhead do Claude Code hooks add to each session?
In my setup with 84 hooks across 17 event types, total hook execution adds approximately 12 seconds across a full 5-story autonomous run (~340 hook firings). Per-event overhead is 90-200ms depending on the event type and number of hooks in the dispatcher chain. The overhead is invisible in practice and prevented three credential leaks, one destructive command, and two incomplete implementations in a single overnight run. The key optimization is using one dispatcher per event running hooks sequentially from cached stdin, rather than independent hooks that each read stdin separately.
This is part of the AI Engineering series. Previously: Why My AI Agent Has a Quality Philosophy. See also: Thinking With Ten Brains and The Blind Judge.
1. Andrej Karpathy on “claws” as a new layer on top of LLM agents. HN discussion (406 points, 917 comments).
2. Claude Code Hooks Reference. Anthropic documentation. 17 lifecycle events with JSON input/output, matcher patterns, and three hook types (command, prompt, agent).
3. Simon Willison, “Writing code is cheap now.” Agentic Engineering Patterns. HN discussion.
4. Claude Code Skills Reference. Anthropic documentation. Markdown instruction sets with frontmatter metadata, allowed tools, and hook definitions.
5. Claude Code Sub-agents Reference. Anthropic documentation. Specialized subagents with isolated context, worktree support, and model selection.
6. Claude Code Remote Control. Anthropic documentation. Continue local sessions from any device. HN discussion (531 points, 313 comments).
7. “Making MCP Cheaper via CLI.” Blog post by thellimist. HN discussion (304 points, 115 comments).
8. “Compress Your Claude.md: Cut 60-70% of System Prompt Bloat.” Blog post by jchilcher. HN discussion (24 points, 9 comments).
9. “What Claude Code Chooses.” Research by amplifying.ai. Analysis of Claude Code’s tool and framework preferences. HN discussion (39 points, 19 comments).
10. Agent Multiplexer (amux). GitHub. Manage Claude Code sessions via tmux. HN discussion (13 points).
11. Emdash: Open-source agentic development environment. GitHub. HN discussion (201 points, 71 comments).
12. Context Mode: 315 KB of MCP output becomes 5.4 KB. GitHub. FTS5 indexing with BM25 ranking. HN discussion (77 points, 23 comments).
13. Geoffrey Huntley, “The Ralph Loop.” ghuntley.com/loop. Autonomous development at $10.42/hour running Sonnet.
14. OpenSwarm: Multi-Agent Claude CLI Orchestrator. GitHub. Worker-reviewer pipelines with model escalation. HN discussion (34 points, 18 comments).