
Compounding Engineering: How My Codebase Accelerates Instead of Decaying

Most codebases slow down as they grow. Mine accelerates. After building 95 hooks, 44 skills, and 14 configuration files in my .claude/ infrastructure, each new feature costs less than the previous one because the infrastructure handles more of the work.1

TL;DR

Compounding engineering describes codebases where each feature addition makes subsequent features cheaper to build. I’ve experienced this firsthand: my Claude Code hook system started as 3 hooks and grew to 95. The first hook took an hour to build. Recent hooks take 10 minutes because the infrastructure (lifecycle events, config loading, state management, test harness) already exists. The opposite pattern, entropy engineering, describes codebases where each feature increases the cost of subsequent features. The difference between a team that ships faster in year three than year one and a team that grinds to a halt is whether their engineering decisions compound positively or negatively.


Compounding in Practice: My .claude/ Infrastructure

The Growth Curve

| Month   | Hooks | Skills | Configs | Tests | New Hook Time |
|---------|-------|--------|---------|-------|---------------|
| Month 1 | 3     | 2      | 1       | 0     | 60 min        |
| Month 3 | 25    | 12     | 5       | 20    | 30 min        |
| Month 6 | 60    | 28     | 10      | 80    | 15 min        |
| Month 9 | 95    | 44     | 14      | 141   | 10 min        |

The first hook (git-safety-guardian.sh) required building the entire hook lifecycle: understanding PreToolUse events, writing bash that parses JSON input, handling error cases, testing manually. The 95th hook inherited all of that infrastructure. The time per hook dropped 6x not because the hooks got simpler, but because the infrastructure handled more of the work.

What Compounds

Pattern consistency. Every hook follows the same structure: read JSON input, parse with jq, check conditions, output decision JSON. A developer (or AI agent) reading any hook instantly recognizes the pattern. My 12-module blog linter follows the same consistency principle: each module exports the same interface (check(content, meta) -> findings), making new modules trivial to add.
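That four-step shape condenses to a single pure function. Here is a sketch in Python rather than the author's bash + jq, with the payload field names (tool_name, tool_input, decision) as assumptions about the event schema, not a definitive Claude Code contract:

```python
import json

def decide(event: dict) -> dict:
    """The shared hook pattern as one pure function:
    parse the fields we need, check conditions, return a decision."""
    tool = event.get("tool_name", "")
    command = event.get("tool_input", {}).get("command", "")
    # Example condition: block force-pushes (a git-safety style rule)
    if tool == "Bash" and "push --force" in command:
        return {"decision": "block", "reason": "force-push is not allowed"}
    return {"decision": "approve"}

# In a real hook, the runner pipes the event JSON on stdin and reads the
# decision from stdout, e.g.: print(json.dumps(decide(json.load(sys.stdin))))
```

Because every hook reduces to one such function plus boilerplate I/O, the boilerplate can be shared and a new hook is only the condition in the middle.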

Config-driven behavior. All 14 JSON config files encode thresholds and rules that were originally hardcoded. When I moved the deliberation consensus threshold from a hardcoded 0.70 in Python to deliberation-config.json, I gained the ability to tune it per task type (security=85%, documentation=50%) without code changes. The same pattern drives my signal scoring pipeline, where tunable weights and thresholds route 7,700+ knowledge items deterministically.2
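A minimal sketch of the lookup, assuming a hypothetical schema for deliberation-config.json (the real file's structure isn't shown here; the threshold values come from the post):

```python
import json
from pathlib import Path

# Assumed config shape; the thresholds mirror the post's numbers
# (security=85%, features=80%, refactoring=65%, docs=50%).
DEFAULT_CONFIG = {
    "consensus_thresholds": {
        "security": 0.85,
        "feature": 0.80,
        "refactoring": 0.65,
        "documentation": 0.50,
        "default": 0.70,  # the value that used to be hardcoded in Python
    }
}

def consensus_threshold(task_type: str,
                        config_path: str = "deliberation-config.json") -> float:
    """Look up the per-task consensus threshold, falling back to defaults."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else DEFAULT_CONFIG
    thresholds = config["consensus_thresholds"]
    return thresholds.get(task_type, thresholds["default"])
```

Tuning now means editing JSON, not redeploying code, and every consumer of the config inherits the change.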

Test infrastructure. The first 20 hooks had no tests. Adding the test harness (48 bash integration tests, 81 Python unit tests) cost two weeks. Every hook since then ships with tests in under 5 minutes because the fixtures, assertion helpers, and test runners already exist.
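A harness like that is mostly two shared helpers: a fixture that builds event payloads and a runner that feeds a hook on stdin and parses its decision. A Python sketch with illustrative names and payload fields (the author's real harness is bash plus pytest and its API isn't shown here):

```python
import json
import subprocess

def run_hook(hook_path: str, event: dict) -> dict:
    """Run a hook script with the event JSON on stdin; parse its decision."""
    result = subprocess.run(
        [hook_path],
        input=json.dumps(event),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

def make_bash_event(command: str) -> dict:
    """Shared fixture: a minimal PreToolUse-style payload for a Bash command."""
    return {"tool_name": "Bash", "tool_input": {"command": command}}
```

With these two helpers in place, a new hook's test is a handful of one-line assertions, which is why per-hook test cost dropped to minutes.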

Memory system. My MEMORY.md file captures errors, decisions, and patterns across sessions. At 54 entries, it prevents me from repeating mistakes. The ((VAR++)) bash gotcha from hook #23 (when VAR is 0, the expression evaluates to 0 and returns a nonzero exit status, which aborts scripts running under set -e) has prevented the same bug in hooks #24 through #95. Each entry compounds across every future session.3


The Compounding Model

Positive Compounding

Engineering productivity follows a compound interest formula:

Productivity(n) = Base × (1 + r)^n

Where r is the per-feature productivity change rate and n is the number of features shipped.

Positive r (compounding): Each feature makes the next 2-5% cheaper. Taking r = 0.03, after 50 features: 1.03^50 = 4.38x productivity improvement.

Negative r (entropy): Each feature makes the next 2-5% more expensive. Taking r = -0.03, after 50 features: 0.97^50 = 0.22x productivity, a 78% degradation.

The difference between these trajectories is a 20x gap in engineering velocity after 50 features.4
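Plugging the numbers into the formula makes the gap concrete:

```python
def productivity(base: float, r: float, n: int) -> float:
    """Productivity(n) = Base × (1 + r)^n"""
    return base * (1 + r) ** n

# Same |r|, opposite sign, 50 features shipped
compounding = productivity(1.0, +0.03, 50)  # ≈ 4.38x
entropy = productivity(1.0, -0.03, 50)      # ≈ 0.22x
gap = compounding / entropy                 # ≈ 20x velocity gap
```

Note that the gap grows geometrically: at 100 features the same 3% rates diverge by roughly 400x.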

My Real Numbers

My blakecrosley.com site started as a single FastAPI route with an HTML template. Nine months later:

| Feature                           | Build Time | Infrastructure Used                        |
|-----------------------------------|------------|--------------------------------------------|
| First blog post rendering         | 4 hours    | None (built from scratch)                  |
| Blog listing with categories      | 2 hours    | Existing Jinja2 templates, content.py      |
| i18n translation system           | 6 hours    | Existing content pipeline, D1 database     |
| Blog search modal                 | 45 min     | Existing HTMX patterns, Alpine.js state    |
| Blog quality linter (12 modules)  | 3 hours    | Existing test infrastructure, CI pipeline  |
| New linter module (URL health)    | 15 min     | Existing module interface, test fixtures   |

The last entry is the compounding payoff: adding a new linter module takes 15 minutes because the module interface, CLI integration, test harness, and CI pipeline already exist. The first module took 3 hours because none of that infrastructure existed.5
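For a concrete sense of that module interface, here is what a minimal URL-health-style module could look like. The Finding shape and the specific check are illustrative assumptions; the real linter's types aren't shown in this post:

```python
import re
from typing import NamedTuple

class Finding(NamedTuple):
    line: int
    message: str

URL_PATTERN = re.compile(r"https?://\S+")

def check(content: str, meta: dict) -> list[Finding]:
    """Flag bare http:// links; a real URL-health module would also probe them."""
    findings = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        for url in URL_PATTERN.findall(line):
            if url.startswith("http://"):
                findings.append(Finding(lineno, f"insecure link: {url}"))
    return findings
```

Because every module exports the same check(content, meta) -> findings signature, the CLI, test fixtures, and CI pipeline need no changes when this file is dropped in.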


Entropy Examples From My Own Codebase

Compounding is not automatic. I’ve also experienced entropy:

The ContentMeta Schema Shortcut

I defined the blog post ContentMeta dataclass in a single session: title, slug, date, description, tags, author, published. I didn’t include category, series, hero_image, scripts, or styles. Each addition later required modifying the parser, updating every template that consumed the metadata, and re-testing the full pipeline. Five additions over three months cost more total time than designing the schema carefully upfront would have. This is the decision timing problem: irreversible decisions deserve upfront analysis.

The i18n Cache Key Collision

A quick implementation of translation caching used blog slugs as cache keys. When two translations of the same slug existed in different locales, the cache returned the wrong language. Debugging took 3 hours. The fix took 15 minutes (add locale prefix to cache key). The shortcut that saved 5 minutes during implementation cost 3 hours in debugging and an architectural review of every cache key in the system.6
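The fix is small enough to sketch in a few lines. Function names here are illustrative, not the site's actual API; the point is that the key must carry every dimension the cached value depends on:

```python
_cache: dict[str, str] = {}

def cache_key(slug: str, locale: str) -> str:
    # The 15-minute fix: prefix the slug with the locale so "en:post"
    # and "de:post" can never collide.
    return f"{locale}:{slug}"

def get_translation(slug: str, locale: str, translate) -> str:
    """Return the cached translation, computing it on first access."""
    key = cache_key(slug, locale)
    if key not in _cache:
        _cache[key] = translate(slug, locale)
    return _cache[key]
```

Under the original slug-only keys, the second locale requested for a slug would have silently received the first locale's text.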

The 3.2GB Debug Directory

Hook debug output accumulated in ~/.claude/debug/ without cleanup. Over three months, the directory grew to 3.2GB. The context audit skill I built later caught this and cleaned files older than 7 days, but the cleanup infrastructure should have been built with the first debug output.
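A retention policy like the one the audit skill applies is a few lines; this is a generic sketch, not the author's actual cleanup code:

```python
import time
from pathlib import Path

def prune_debug_dir(directory: Path, max_age_days: int = 7) -> int:
    """Delete files older than max_age_days; return how many were removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for path in directory.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Shipping this alongside the first debug write would have capped the directory at a week of output instead of three months.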


Practices That Compound

Consistent Patterns Over Optimal Patterns

A team that uses the same “good enough” pattern across 50 features operates faster than a team that uses the “optimal” pattern for each individual feature. Consistency reduces cognitive load, enables automated tooling, and makes code reviews faster.7

My hook system uses the same bash pattern for all 95 hooks even though some hooks would be more naturally expressed in Python. The consistency means any hook is readable by anyone (or any AI agent) who has read one hook. The suboptimal language choice is more than offset by the zero learning curve for new hooks.

Infrastructure as the First Feature

I built my CI/CD pipeline, test harness, and deployment workflow before building any product features on blakecrosley.com. The investment felt slow at the time. Every feature since then has deployed in under 2 minutes with automated testing.8

| Phase    | Infrastructure Investment                  | Payoff Timeline       |
|----------|--------------------------------------------|-----------------------|
| Week 1-2 | FastAPI + Jinja2 + deployment pipeline     | Paid off by post 3    |
| Week 3-4 | Content pipeline + markdown parsing        | Paid off by post 5    |
| Month 2  | Hook lifecycle + git safety                | Paid off by hook 10   |
| Month 3  | Test infrastructure (pytest, bash tests)   | Paid off by module 5  |

The Mind Palace Pattern

My .claude/ directory functions as a “mind palace” — a structured set of documents optimized for both human and AI consumption:

~/.claude/
├── configs/     # 14 JSON files — system logic, not hardcoded
├── hooks/       # 95 bash scripts — lifecycle event handlers
├── skills/      # 44 directories — reusable knowledge modules
├── docs/        # 40+ markdown files — system documentation
├── state/       # Runtime tracking — recursion depth, agent lineage
├── handoffs/    # 49 documents — multi-session context preservation
└── memory/      # MEMORY.md — 54 cross-domain error/pattern entries

The mind palace compounds because every new entry enriches the context available to future development sessions. After 54 MEMORY.md entries, the AI agent avoids mistakes I’ve already solved. After 95 hooks, new hooks write themselves by following established patterns. The richer context produces better-fitting AI-generated code, which makes the next feature cheaper.9


Compounding in the AI Era

AI Amplifies Both Directions

AI coding assistants accelerate whatever pattern the codebase already follows. My 95 hooks with consistent patterns produce excellent AI-generated hooks because the AI matches the established structure. A codebase with 5 different hook styles would produce worse AI-generated code because the AI has no consistent pattern to match.10

The compounding effect doubles: consistent patterns make human development faster (cognitive load reduction) AND AI-assisted development faster (pattern matching). Inconsistent patterns make both slower.

Agent-Readable Codebases

I designed my .claude/ infrastructure for AI agent consumption:

  • Structured configs (JSON, not hardcoded values) that agents parse programmatically
  • Consistent naming conventions (verb-noun.sh for hooks, SKILL.md for skill definitions)
  • Machine-verifiable quality checks (141 tests that agents run autonomously) — the metacognitive layer adds self-monitoring on top
  • Explicit documentation (MEMORY.md, handoffs, docs/) that agents read at session start

Each investment in agent-readability compounds as AI tools become more capable.11


Key Takeaways

For engineers:

  • Track your “time per feature” as the codebase grows: if it increases, you have entropy; if it decreases, you have compounding.
  • Apply the rule of three before extracting abstractions: build the specific solution twice, then extract the reusable pattern on the third occurrence.
  • Invest 15-20% of each sprint in infrastructure and tooling improvements; the compound returns exceed the short-term feature velocity cost within 3-5 sprints.

For engineering managers:

  • Measure engineering health by lead time per feature over time; increasing lead time signals entropy.
  • Treat documentation and testing infrastructure as features, not overhead; my test infrastructure investment (2 weeks) has saved 50+ hours across 95 hooks.


References


  1. Author’s .claude/ infrastructure metrics: 95 hooks, 44 skills, 14 configs, 141 tests. New hook implementation time decreased from 60 min to 10 min over 9 months. 

  2. Author’s deliberation config. Task-adaptive consensus thresholds: security=85%, features=80%, refactoring=65%, docs=50%. 

  3. Author’s MEMORY.md. 54 documented errors with cross-domain learning patterns across bash, Python, CSS, and HTML validation. 

  4. Forsgren, Nicole et al., Accelerate, IT Revolution Press, 2018. Engineering velocity measurement and compounding. 

  5. Author’s site development timeline. Feature build times tracked across 9 months of development. 

  6. Author’s debugging experience. i18n cache key collision documented in MEMORY.md error entries. 

  7. Shipper, Dan, “Compounding Engineering,” Every, 2024. Consistency as a compounding force. 

  8. Humble, Jez & Farley, David, Continuous Delivery, Addison-Wesley, 2010. 

  9. Author’s .claude/ mind palace structure. 95 hooks + 44 skills + 14 configs + 54 MEMORY.md entries = compounding context for AI agent development. 

  10. Anthropic, “Best Practices for Claude Code,” 2025. 

  11. Author’s observation on agent-readable codebase patterns. Consistent naming, JSON configs, and machine-verifiable tests improve AI code generation quality. 
