# Context Window Management: 50 Sessions of Data
I measured token consumption across 50 Claude Code development sessions. The pattern was consistent: output quality degrades at roughly 60% context utilization, long before the hard limit triggers compaction.[1]
## TL;DR
Context window exhaustion degrades AI coding quality silently. After tracking 50 sessions building my Claude Code infrastructure and blog quality system, I found three patterns that maintain output quality across multi-hour sessions: proactive compaction after each subtask, filesystem-based memory that persists across context boundaries, and subagent delegation that keeps the main context lean. The key insight: treat the context window as a scarce resource, not a bottomless conversation thread.
## What Degradation Actually Looks Like
Every file read, tool output, and conversation turn consumes tokens. A single large file read (2,000 lines) can consume 15,000-20,000 tokens. After reading 10 files and running several commands, the context window holds more tool output than actual instructions.[2]
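The arithmetic behind these numbers can be sketched with a common heuristic of roughly four characters per token. The exact ratio varies by tokenizer and content, and `estimate_read_tokens` is my illustration, not a Claude Code API:

```python
# Rough token-cost estimate for a file read, assuming ~4 characters
# per token (a common English-text heuristic; real tokenizers vary).
def estimate_read_tokens(path: str) -> int:
    with open(path) as f:
        return len(f.read()) // 4

# A 2,000-line file at ~40 characters per line lands near
# 2000 * 40 / 4 = 20,000 tokens, the top of the range above.
```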
The degradation is subtle. Claude does not announce “my context is 80% full.” Instead, the model starts:

- Forgetting instructions established 20 minutes ago
- Repeating suggestions already rejected three turns prior
- Missing patterns established earlier in the conversation
- Producing less coherent multi-file changes
I noticed this pattern while building my deliberation system. A session that started with precise multi-file edits across 8 Python modules degraded into single-file tunnel vision by the 90-minute mark. The agent stopped referencing the architecture it had read earlier because that context had been compressed away.
## Strategy 1: Proactive Compaction
Claude Code’s `/compact` command summarizes the conversation and frees context space. The system preserves key decisions, file contents, and task state while discarding verbose tool output.[3]
When to compact:

- After completing a distinct subtask (feature implemented, bug fixed)
- Before starting a new area of the codebase
- When Claude starts repeating or forgetting earlier context
I compact roughly every 25-30 minutes during intensive sessions. During the deliberation infrastructure build (9 PRDs, 3,455 lines of Python), I compacted after each PRD was complete. Each compaction preserved the architectural decisions while freeing context for the next implementation phase.
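The cadence can be mechanized. Here is a minimal sketch, assuming you flag subtask completion yourself; `CompactionTimer` is a hypothetical helper, not part of Claude Code:

```python
import time

# Hypothetical helper implementing the cadence above: suggest /compact
# after each completed subtask, or every 25-30 minutes, whichever first.
class CompactionTimer:
    def __init__(self, interval_minutes: float = 25):
        self.interval = interval_minutes * 60
        self.last_compact = time.monotonic()

    def mark_compacted(self) -> None:
        """Call right after running /compact."""
        self.last_compact = time.monotonic()

    def should_compact(self, subtask_done: bool = False) -> bool:
        elapsed = time.monotonic() - self.last_compact
        return subtask_done or elapsed >= self.interval
```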
## Strategy 2: Filesystem as Memory
The most reliable memory across context boundaries lives in the filesystem. Claude Code reads CLAUDE.md and memory files at the start of every session and after every compaction.[4]
My .claude/ directory serves as a structured mind palace:
```text
~/.claude/
├── configs/     # 14 JSON configs (thresholds, rules, budgets)
│   ├── deliberation-config.json
│   ├── recursion-limits.json
│   └── consensus-profiles.json
├── hooks/       # 95 lifecycle event handlers
├── skills/      # 44 reusable knowledge modules
├── state/       # Runtime state (recursion depth, agent lineage)
├── handoffs/    # 49 multi-session context documents
├── docs/        # 40+ system documentation files
└── projects/    # Per-project memory directories
    └── {project}/memory/
        └── MEMORY.md    # Always loaded into context
```
The MEMORY.md file captures errors, decisions, and patterns across sessions. Currently it holds 54 documented failures with cross-domain learning patterns. When I discover that `((VAR++))` fails under `set -e` in bash when VAR is 0, I record it. Three sessions later, when I encounter a similar integer edge case in Python, the MEMORY.md entry surfaces the pattern.[5]
The cross-domain compound effect: A bash escaping error from hook development informed a regex improvement in my Python blog linter. A CSS token gap (`--spacing-2xs` doesn’t exist) triggered a systematic audit of all custom property references. Each entry connects domains that would otherwise stay siloed within individual session contexts.
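An entry in this style might look like the following. The format is illustrative; the author’s exact schema may differ:

```markdown
## bash: ((VAR++)) exits non-zero under set -e
- **Symptom:** hook aborted silently when a counter started at 0
- **Cause:** ((VAR++)) evaluates to the pre-increment value; 0 → exit status 1
- **Fix:** use VAR=$((VAR + 1)), whose assignment form is safe under set -e
- **Cross-domain:** check integer edge cases in Python counters too
```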
## Strategy 3: Session Handoff
For tasks spanning multiple sessions, I create handoff documents that capture the full state:
```markdown
## Handoff: Deliberation Infrastructure PRD-7
**Status:** Hook wiring complete, 81 Python unit tests passing
**Files changed:** hooks/post-deliberation.sh, hooks/deliberation-pride-check.sh
**Decision:** Placed post-deliberation in PostToolUse:Task, pride-check in Stop
**Blocked:** Spawn budget model needs inheritance instead of depth increment
**Next:** PRD-8 integration tests in tests/test_deliberation_lib.py
```
My `~/.claude/handoffs/` directory holds 49 handoff documents from multi-session tasks. Starting a new session with `claude -c` (continue) or reading the handoff document provides the successor session with full context at minimal token cost.[6]
The handoff pattern saved me during the deliberation build. PRD-4 (recursion-guard extensions) required understanding decisions from PRDs 1-3. Without the handoff, the new session would have needed to re-read all modified files. With the handoff, the session started with the architectural context and went straight to implementation.
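The handoff structure is regular enough to script. A hypothetical generator sketch; `handoff_doc` is my own helper, not part of the author’s tooling:

```python
# Emit a handoff skeleton in the Status/Files/Decision/Blocked/Next
# structure shown above. Missing fields default to "TBD".
def handoff_doc(task: str, **fields: str) -> str:
    order = ["Status", "Files changed", "Decision", "Blocked", "Next"]
    lines = [f"## Handoff: {task}"]
    for key in order:
        value = fields.get(key.lower().replace(" ", "_"), "TBD")
        lines.append(f"**{key}:** {value}")
    return "\n".join(lines)
```

Keeping the field order fixed means every successor session can skim handoffs the same way, which is the point of the shared structure.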
## Strategy 4: Subagent Delegation
Subagents run in independent context windows. Delegating research or review tasks to subagents preserves the main session’s context for implementation work.[7]
My recursion-guard system manages this automatically:
```bash
# From recursion-guard.sh - spawn budget enforcement
MAX_DEPTH=2
MAX_CHILDREN=5
DELIB_SPAWN_BUDGET=2
DELIB_MAX_AGENTS=12
```
Each subagent returns a summary rather than raw output, keeping the main context lean. During blog post rewrites, I delegate exploration tasks (gathering CSS data, reading hook code, surveying directory structures) to subagents. The main context stays focused on writing while subagents handle the research.
The spawn budget was a lesson learned the hard way: an early session without limits spawned recursive subagents that each spawned more subagents. The recursion-guard hook now enforces depth limits with safe integer validation and config-driven budgets.[8]
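The same checks can be sketched in Python, assuming the hook receives the depth as an untrusted string. Names mirror the config above, but this is not the actual recursion-guard.sh logic:

```python
# Illustrative spawn-budget check: validate depth safely before
# comparing, then enforce both the depth and child-count limits.
MAX_DEPTH = 2
MAX_CHILDREN = 5

def can_spawn(depth: str, children: int) -> bool:
    if not depth.isdigit():          # rejects "", "-1", "abc" outright
        return False
    return int(depth) < MAX_DEPTH and children < MAX_CHILDREN
```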
## The Anti-Patterns I Learned From
**Reading entire files when you need 10 lines.** Early in my Claude Code usage, I read entire 2,000-line files for a single function. Use line offsets: `Read file.py offset=100 limit=20` saves 15,000+ tokens per read.
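The savings claim is simple arithmetic, assuming roughly 8 tokens per line of code (an illustrative figure, not a measurement):

```python
# Back-of-envelope cost of a targeted read vs. a full-file read,
# assuming ~8 tokens per line of code.
TOKENS_PER_LINE = 8

def read_cost(lines: int) -> int:
    return lines * TOKENS_PER_LINE

full = read_cost(2000)      # whole 2,000-line file: 16,000 tokens
targeted = read_cost(20)    # offset=100 limit=20: 160 tokens
saved = full - targeted     # 15,840 tokens, matching the 15,000+ claim
```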
**Keeping verbose error output in context.** After debugging the spawn budget issue, my context held 40+ stack traces from failed iterations. A single `/compact` after fixing the bug freed that dead weight.
**Starting every session by reading every file.** My first sessions pre-loaded 8-10 files “for context.” Now I let Claude Code’s glob and grep tools find relevant files on demand, saving 100,000+ tokens of unnecessary pre-loading.
## Key Takeaways
For individual developers:

- Compact after every completed subtask rather than waiting for forced compaction; proactive compaction at 25-30 minute intervals maintains output quality
- Write key decisions to filesystem memory files as the session progresses; my MEMORY.md has 54 entries that persist across hundreds of sessions
- Use subagents for research tasks that would pollute the main context; a 5-file research query costs 75,000+ tokens in the main context but only a 500-token summary via subagent
For teams:

- Standardize the handoff document format for multi-session tasks; my 49 handoffs each follow the same Status/Files/Decision/Blocked/Next structure
- Configure project-level CLAUDE.md files with architecture context that loads into every session automatically; see Context Is Architecture for the 7-layer hierarchy that evolved from these token management principles
## FAQ
### What happens when an LLM context window fills up?
When the context window approaches capacity (roughly 95% utilization), Claude Code triggers automatic compaction — a summarization process that condenses conversation history to free space. However, output quality degrades well before compaction triggers. In my measurements across 50 sessions, quality degradation begins at approximately 60% context utilization: the model starts forgetting earlier instructions, repeating rejected suggestions, and producing less coherent multi-file changes. Each compaction event discards context that may have contained important decisions, so fewer compactions mean more coherent sessions.
### How do you manage LLM context effectively during long sessions?
Four strategies maintain output quality across multi-hour sessions. Proactive compaction after each completed subtask (every 25-30 minutes) prevents silent degradation. Filesystem-based memory (CLAUDE.md, MEMORY.md, handoff documents) persists critical state across context boundaries. Subagent delegation offloads research tasks to independent context windows, returning summaries instead of raw output. And targeted file reads with line offsets (`Read file.py offset=100 limit=20`) save 15,000+ tokens per read compared to loading entire files. See Context Is the New Memory for the three-layer compression framework.
### What is context compaction in Claude Code?
Context compaction is Claude Code’s built-in mechanism for summarizing conversation history when the context window approaches capacity. The `/compact` command triggers it manually, and auto-compact activates at approximately 95% utilization. Compaction preserves key decisions, file contents, and task state while discarding verbose tool output. A circuit breaker halts after 3 consecutive compaction failures to prevent infinite retry loops — before this safeguard existed, 1,279 sessions had 50+ consecutive failures, wasting approximately 250K API calls per day as documented in the source leak analysis.
### How many tokens is enough for productive AI development?
A 200K token context window is sufficient when properly managed. My compressed setup maintains approximately 78% of the window (156,000 tokens) available for reasoning, enabling 40-60 productive turns before hitting the compaction threshold. An uncompressed setup leaves only about 26% (roughly 52,000 tokens) for reasoning, hitting compaction after 15-20 turns. The useful size of a window depends more on what fills it than how large it is — a well-compressed 200K window outperforms a hypothetical 500K window stuffed with uncompressed tool output. System prompt compression, CLI-first tool architecture, and fresh agent spawns are the highest-leverage optimizations.
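The turn estimates follow from simple division, assuming an average cost per turn; the 3,500-token figure is my assumption, chosen to land inside the post’s stated ranges:

```python
# Estimate productive turns from the fraction of a 200K window
# left for reasoning, at an assumed ~3,500 tokens per turn.
WINDOW = 200_000
TOKENS_PER_TURN = 3_500

def productive_turns(reasoning_fraction: float) -> int:
    return int(WINDOW * reasoning_fraction) // TOKENS_PER_TURN

compressed = productive_turns(0.78)    # 44 turns, inside the 40-60 range
uncompressed = productive_turns(0.26)  # 14 turns, near the 15-20 range
```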
### How do handoff documents help with multi-session AI development?
Handoff documents capture the full state of a task for the next session: current status, files changed, key decisions made, blockers, and next steps. Starting a new session by reading a handoff document provides full architectural context at minimal token cost (typically under 500 tokens) instead of requiring the agent to re-read all modified files. My 49 handoff documents follow a consistent Status/Files/Decision/Blocked/Next structure. During the deliberation system build, handoffs let each PRD session start with prior architectural decisions loaded without carrying conversation history from earlier PRDs.
## References

1. Author’s measurement of token consumption across 50 sessions (2025-2026). Output quality degradation observed consistently at ~60% context utilization.
2. Author’s measurement: average file read consumes 8,000-20,000 tokens depending on file size. 10 file reads plus tool outputs consume 40-60% of a 200K context window.
3. Anthropic, “Claude Code Documentation,” 2025. Context compaction and the `/compact` command.
4. Anthropic, “Claude Code Documentation,” 2025. Memory files and CLAUDE.md documentation.
5. Author’s `.claude/projects/*/memory/MEMORY.md` files. 54 documented errors with cross-domain learning patterns across bash, Python, CSS, and HTML validation.
6. Author’s session management workflow. 49 handoff documents in `~/.claude/handoffs/` spanning multi-session infrastructure builds.
7. Anthropic, “Claude Code Documentation,” 2025. Subagent context isolation.
8. Author’s recursion-guard.sh implementation. Spawn budget model with depth limits, safe integer validation, and config-driven budgets.