# Context Window Management: What 50 Sessions Taught Me About AI Development
I measured token consumption across 50 Claude Code development sessions. The pattern was consistent: output quality degrades at roughly 60% context utilization, long before the hard limit triggers compaction. [1]
## TL;DR
Context window exhaustion degrades AI coding quality silently. After tracking 50 sessions building my Claude Code infrastructure and blog quality system, I found three patterns that maintain output quality across multi-hour sessions: proactive compaction after each subtask, filesystem-based memory that persists across context boundaries, and subagent delegation that keeps the main context lean. The key insight: treat the context window as a scarce resource, not a bottomless conversation thread.
## What Degradation Actually Looks Like
Every file read, tool output, and conversation turn consumes tokens. A single large file read (2,000 lines) can consume 15,000-20,000 tokens. After reading 10 files and running several commands, the context window holds more tool output than actual instructions. [2]
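To build an intuition for these numbers, a rough estimate is enough. The sketch below uses the common ~4-characters-per-token heuristic (an assumption, not an exact tokenizer), and `estimate_tokens` is a hypothetical helper, not part of Claude Code:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Rough token cost of reading a whole file into context.
# Assumes ~4 characters per token -- a common heuristic for English
# text and code; real tokenizers vary by content.
estimate_tokens() {
  local file="$1"
  local bytes
  bytes=$(wc -c < "$file")
  echo $(( bytes / 4 ))
}

# Demo: an 8,000-character file estimates at ~2,000 tokens.
tmp=$(mktemp)
printf '%8000s' ' ' > "$tmp"
estimate_tokens "$tmp"   # prints 2000
rm -f "$tmp"
```

By this heuristic, a 2,000-line file at ~35 characters per line works out to roughly 17,500 tokens, consistent with the 15,000-20,000 range measured above.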
The degradation is subtle. Claude does not announce “my context is 80% full.” Instead, the model starts:

- Forgetting instructions established 20 minutes earlier
- Repeating suggestions already rejected three turns prior
- Missing patterns established earlier in the conversation
- Producing less coherent multi-file changes
I noticed this pattern while building my deliberation system. A session that started with precise multi-file edits across 8 Python modules degraded into single-file tunnel vision by the 90-minute mark. The agent stopped referencing the architecture it had read earlier because that context had been compressed away.
## Strategy 1: Proactive Compaction
Claude Code’s `/compact` command summarizes the conversation and frees context space. The system preserves key decisions, file contents, and task state while discarding verbose tool output. [3]
When to compact:

- After completing a distinct subtask (feature implemented, bug fixed)
- Before starting a new area of the codebase
- When Claude starts repeating or forgetting earlier context
I compact roughly every 25-30 minutes during intensive sessions. During the deliberation infrastructure build (9 PRDs, 3,455 lines of Python), I compacted after each PRD was complete. Each compaction preserved the architectural decisions while freeing context for the next implementation phase.
## Strategy 2: Filesystem as Memory
The most reliable memory across context boundaries lives in the filesystem. Claude Code reads `CLAUDE.md` and memory files at the start of every session and after every compaction. [4]
My `.claude/` directory serves as a structured mind palace:

```text
~/.claude/
├── configs/        # 14 JSON configs (thresholds, rules, budgets)
│   ├── deliberation-config.json
│   ├── recursion-limits.json
│   └── consensus-profiles.json
├── hooks/          # 95 lifecycle event handlers
├── skills/         # 44 reusable knowledge modules
├── state/          # Runtime state (recursion depth, agent lineage)
├── handoffs/       # 49 multi-session context documents
├── docs/           # 40+ system documentation files
└── projects/       # Per-project memory directories
    └── {project}/memory/
        └── MEMORY.md   # Always loaded into context
```
The `MEMORY.md` file captures errors, decisions, and patterns across sessions. Currently it holds 54 documented failures with cross-domain learning patterns. When I discover that `((VAR++))` fails under `set -e` in bash when `VAR` is 0, I record it. Three sessions later, when I encounter a similar integer edge case in Python, the MEMORY.md entry surfaces the pattern. [5]

The cross-domain compound effect: a bash escaping error from hook development informed a regex improvement in my Python blog linter. A CSS token gap (`--spacing-2xs` doesn’t exist) triggered a systematic audit of all custom property references. Each entry connects domains that would otherwise stay siloed within individual session contexts.
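The `((VAR++))` failure recorded in MEMORY.md is easy to reproduce. A minimal sketch (the variable name is illustrative): under `set -e`, the arithmetic command `((VAR++))` evaluates to the pre-increment value, so when the variable is 0 the command returns a non-zero status and the script dies; a plain assignment avoids this.

```bash
#!/usr/bin/env bash
set -e

COUNT=0

# Unsafe: ((COUNT++)) expands to the pre-increment value (0).
# An arithmetic command that evaluates to 0 returns exit status 1,
# and set -e would abort the script right here:
#   ((COUNT++))

# Safe: an assignment always returns status 0, whatever the value.
COUNT=$((COUNT + 1))

echo "count=$COUNT"   # prints count=1
```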
## Strategy 3: Session Handoff
For tasks spanning multiple sessions, I create handoff documents that capture the full state:
```markdown
## Handoff: Deliberation Infrastructure PRD-7

**Status:** Hook wiring complete, 81 Python unit tests passing
**Files changed:** hooks/post-deliberation.sh, hooks/deliberation-pride-check.sh
**Decision:** Placed post-deliberation in PostToolUse:Task, pride-check in Stop
**Blocked:** Spawn budget model needs inheritance instead of depth increment
**Next:** PRD-8 integration tests in tests/test_deliberation_lib.py
```
My `~/.claude/handoffs/` directory holds 49 handoff documents from multi-session tasks. Starting a new session with `claude -c` (continue) or reading the handoff document provides the successor session with full context at minimal token cost. [6]
The handoff pattern saved me during the deliberation build. PRD-4 (recursion-guard extensions) required understanding decisions from PRDs 1-3. Without the handoff, the new session would have needed to re-read all modified files. With the handoff, the session started with the architectural context and went straight to implementation.
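A handoff structure like this is easy to standardize with a small scaffold. The sketch below is a hypothetical helper, not my actual workflow tooling; the field names mirror the Status/Files changed/Decision/Blocked/Next template, and the demo writes into a temporary directory rather than the real `~/.claude/handoffs/`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scaffold for a handoff document using the
# Status/Files changed/Decision/Blocked/Next structure.
new_handoff() {
  local title="$1"
  local file="$HANDOFF_DIR/$(date +%Y-%m-%d)-${title// /-}.md"
  mkdir -p "$HANDOFF_DIR"
  cat > "$file" <<EOF
## Handoff: $title

**Status:**
**Files changed:**
**Decision:**
**Blocked:**
**Next:**
EOF
  echo "$file"
}

# Demo in a temp dir; a real setup would point at ~/.claude/handoffs/.
HANDOFF_DIR=$(mktemp -d)
new_handoff "PRD-8 integration tests"
```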
## Strategy 4: Subagent Delegation
Subagents run in independent context windows. Delegating research or review tasks to subagents preserves the main session’s context for implementation work. [7]
My recursion-guard system manages this automatically:
```bash
# From recursion-guard.sh - spawn budget enforcement
MAX_DEPTH=2
MAX_CHILDREN=5
DELIB_SPAWN_BUDGET=2
DELIB_MAX_AGENTS=12
```
Each subagent returns a summary rather than raw output, keeping the main context lean. During blog post rewrites, I delegate exploration tasks (gathering CSS data, reading hook code, surveying directory structures) to subagents. The main context stays focused on writing while subagents handle the research.
The spawn budget was a lesson learned the hard way: an early session without limits spawned recursive subagents that each spawned more subagents. The recursion-guard hook now enforces depth limits with safe integer validation and config-driven budgets. [8]
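A minimal sketch of that enforcement (`can_spawn` is an illustrative function, not the actual recursion-guard.sh internals; the real hook reads its limits from the JSON configs rather than hard-coding them):

```bash
#!/usr/bin/env bash
set -euo pipefail

MAX_DEPTH=2   # matches the budget shown above

# Illustrative check: may a subagent spawn at the given depth?
can_spawn() {
  local depth="$1"
  # Safe integer validation: digits only, so a corrupted state file
  # cannot crash the arithmetic comparison below.
  if ! [[ "$depth" =~ ^[0-9]+$ ]]; then
    echo "invalid depth: $depth" >&2
    return 1
  fi
  if (( depth >= MAX_DEPTH )); then
    echo "depth limit reached ($depth >= $MAX_DEPTH)" >&2
    return 1
  fi
  return 0
}

can_spawn 1 && echo "spawn allowed at depth 1"
can_spawn 2 || echo "spawn blocked at depth 2"
```

Validating the depth before the arithmetic comparison is the point: a non-numeric value from a corrupted state file fails closed (no spawn) instead of blowing up the hook.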
## The Anti-Patterns I Learned From
**Reading entire files when you need 10 lines.** Early in my Claude Code usage, I read entire 2,000-line files for a single function. Using line offsets (`Read file.py offset=100 limit=20`) saves 15,000+ tokens per read.
**Keeping verbose error output in context.** After debugging the spawn budget issue, my context held 40+ stack traces from failed iterations. A single `/compact` after fixing the bug freed that dead weight.
**Starting every session by reading every file.** My first sessions pre-loaded 8-10 files “for context.” Now I let Claude Code’s glob and grep tools find relevant files on demand, saving 100,000+ tokens of unnecessary pre-loading.
## Key Takeaways
For individual developers:

- Compact after every completed subtask, not when Claude forces compaction; proactive compaction at 25-30 minute intervals maintains output quality
- Write key decisions to filesystem memory files as the session progresses; my MEMORY.md has 54 entries that persist across hundreds of sessions
- Use subagents for research tasks that would pollute the main context; a 5-file research query costs 75,000+ tokens in the main context but only a 500-token summary via subagent
For teams:

- Standardize handoff document format for multi-session tasks; my 49 handoffs each follow the same Status/Files/Decision/Blocked/Next structure
- Configure project-level CLAUDE.md files with architecture context that loads into every session automatically (see Context Is Architecture for the 7-layer hierarchy that evolved from these token management principles)
## References

1. Author’s measurement of token consumption across 50 Claude Code development sessions (2025-2026). Output quality degradation observed consistently at ~60% context utilization.
2. Author’s measurement: an average file read consumes 8,000-20,000 tokens depending on file size; 10 file reads plus tool outputs consume 40-60% of a 200K context window.
3. Anthropic, “Claude Code Documentation,” 2025. Context compaction and the `/compact` command.
4. Anthropic, “Claude Code Documentation,” 2025. Memory files and CLAUDE.md documentation.
5. Author’s `.claude/projects/*/memory/MEMORY.md` files. 54 documented errors with cross-domain learning patterns across bash, Python, CSS, and HTML validation.
6. Author’s session management workflow. 49 handoff documents in `~/.claude/handoffs/` spanning multi-session infrastructure builds.
7. Anthropic, “Claude Code Documentation,” 2025. Subagent context isolation.
8. Author’s `recursion-guard.sh` implementation. Spawn budget model with depth limits, safe integer validation, and config-driven budgets.