The Session Is the Commit Message
A developer inherits a codebase. git blame shows 47 files changed in one commit. The message reads: “refactor auth module.” The commit author is listed as a human developer. The actual author was a coding agent that ran for 90 minutes, read 200 files, evaluated three alternative approaches, rejected two of them for specific reasons, and produced a change set that touches every authentication endpoint. The 90-minute session of design decisions, alternatives rejected, and edge cases discussed is gone. Git preserved what changed. Nothing preserved why.
An earlier post of mine named the gap between agent output speed and developer comprehension speed “cognitive debt”: a liability that compounds with every unreviewed commit.1 The Memento project, which collected 100 HN points and 124 comments, asks the next question: if the session contains the reasoning, should the session be part of the commit?2
TL;DR
Git captures WHAT changed. Agent sessions capture WHY. When agents write code, the session transcript is the real design document, and every workflow that discards the transcript discards the provenance. Memento (an open-source git extension) attaches AI session transcripts to commits as git notes, creating a provenance chain from commit to reasoning. Claude Code’s new LSP integration adds structural code understanding that makes session transcripts more precise: go-to-definition replaces grep, and type signatures replace guesses. Below: the provenance gap, four layers of session metadata, what Memento builds, how LSP changes the quality of session data, and minimum provenance practices you can implement today.
The Provenance Gap
Git tracks five things about every change: who made it, when, what files changed, the diff, and a commit message. For human-authored code, the commit message bridges the gap between the diff and the intent. A good message explains why the change exists. A bad message (“fix stuff”) leaves the reviewer to reconstruct intent from the code.
Agent-authored code has a different provenance structure. The intent does not live in the developer’s head. The intent lives in the session: the prompt that started the task, the files the agent read, the alternatives it evaluated, the tools it called, and the evidence it cited when reporting completion. A commit message that summarizes 90 minutes of agent reasoning in one line discards nearly all of the decision context.
The loss is not theoretical. My orchestration system generates session state files (jiro.state.json, jiro.progress.json) that record every story completion, reviewer verdict, and evidence gate result.3 When a reviewer asks “why did the agent use exponential backoff instead of circuit breaker?” the session state file contains the answer: the agent evaluated both patterns, found that the upstream service returns retryable 503s with a Retry-After header, and selected exponential backoff to honor the header value. The commit message says “refactor: standardize retry patterns.” The session state says why.
Without session provenance, code review of agent-authored changes becomes archaeology. The reviewer reads the diff, reverse-engineers the reasoning, and forms a theory about why the change exists. The theory may be wrong. The agent’s actual reasoning is available, recorded in the session transcript. The industry standard workflow (commit, push, review the diff) throws the reasoning away.
The problem multiplies with agent composition. My orchestration system spawns specialized subagents for code review: a correctness reviewer, a security reviewer, a conventions reviewer.5 Each subagent runs its own session, reads its own files, forms its own conclusions. The parent agent aggregates the verdicts. The final commit message says “3 reviewers: approve.” The three individual review sessions — each containing specific findings, edge case analysis, and approval rationale — live in separate transcripts that the commit never references. Every layer of agent delegation adds another layer of invisible reasoning.
The provenance problem connects to three earlier threads. The fabrication firewall identified how agents publish unverified claims when no output gate exists.6 Session provenance would have caught the fabrication earlier: the session transcript showed the agent inventing a token-counting methodology that no human reviewed. The invisible agent documented how agent actions go unmonitored without explicit instrumentation.7 Session provenance is the audit trail that the visibility stack generates. The NIST public comment recommended standardized audit logging for agent actions.9 Git notes storing session transcripts are one implementation of that recommendation.
The evidence gate in my quality system requires the agent to cite specific proof for each quality criterion: name the pattern, explain alternatives, list edge cases, paste test output.10 The evidence gate forces the agent to generate Reasoning and Verification layer data that would otherwise not exist. Without the gate, the agent reports “done” and the session contains only Process data (tool calls). With the gate, the session contains explicit rationale that a reviewer can verify against the code.
Git alone cannot distinguish between a 47-file commit that represents 90 minutes of careful reasoning and a 47-file commit that represents an agent running unconstrained for 90 minutes with no review. The git documentation describes notes as “extra information about an object that can be attached without changing the object itself.”8 Session transcripts fit the definition exactly: extra information about a commit’s provenance that does not alter the commit hash, the diff, or the history.
The Memento Question
The Memento project answers the provenance gap with a git extension.2 The tool captures AI coding session transcripts and attaches them to commits as git notes, stored in refs/notes/commits and refs/notes/memento-full-audit.
The workflow: git memento init configures the repository. git memento commit <session-id> replaces git commit, automatically retrieving the session transcript from the configured AI provider (Codex or Claude Code) and storing it as structured metadata on the commit.
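The underlying mechanism is plain git. A minimal sketch of attaching a transcript as a note, assuming only that `git` is on the PATH (the ref name `refs/notes/sessions` is illustrative; Memento itself uses refs/notes/commits and refs/notes/memento-full-audit):

```python
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in `repo` and return its stripped stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

# Build a throwaway repo with one agent-authored commit.
repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, "config", "user.email", "agent@example.com")
git(repo, "config", "user.name", "agent")
with open(f"{repo}/auth.py", "w") as f:
    f.write("def login(): ...\n")
git(repo, "add", "auth.py")
git(repo, "commit", "-q", "-m", "feat: add rate limiting to auth endpoint")

# Attach a session transcript as a note under a custom ref.
# The note is keyed to the commit hash but does not change it.
transcript = ("Intent: rate limit POST /auth/login\n"
              "Reasoning: sliding window chosen over token bucket")
git(repo, "notes", "--ref=sessions", "add", "-m", transcript, "HEAD")

# The reasoning is now retrievable from the commit itself.
print(git(repo, "notes", "--ref=sessions", "show", "HEAD"))
```

Because notes live under their own ref, they do not transfer on a default clone; a team that wants shared provenance must explicitly fetch and push the notes ref.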
The 124-comment HN discussion surfaced four positions:
Position 1: Sessions are essential context. Agent sessions contain the reasoning that commit messages cannot. Attaching sessions to commits preserves the provenance chain. Reviewers can trace any line of code back through the commit, the session, and the original prompt.
Position 2: Sessions are noise. A 90-minute session transcript is thousands of lines of conversation. Most of it is irrelevant to the final change set. Attaching the full transcript buries the signal in noise and makes review harder, not easier.
Position 3: Summaries, not transcripts. The session should be distilled into a structured summary: task description, alternatives considered, decision rationale, evidence cited. The summary preserves provenance without the noise. Memento generates markdown summaries labeled with user and assistant turns.
Position 4: Privacy and security concerns. Session transcripts may contain API keys, internal URLs, proprietary code from other files, or conversational content the developer would not want in a permanent git record. Sessions require sanitization before attachment.
All four positions have merit. The provenance value of sessions is undeniable. The noise problem is real. The privacy concern is structural. Memento addresses positions 1 and 3 (transcript storage with markdown conversion) and position 4 (treating transcripts as untrusted data for summary generation). Position 2 remains an open design question: how much session context is enough?
Four Layers of Provenance
Agent session metadata organizes into four layers, each answering a different question about the change.
| Layer | Question | Data | Example |
|---|---|---|---|
| Intent | What was the task? | Original prompt, referenced issues, acceptance criteria | “Fix the login endpoint to handle expired tokens” |
| Process | How did the agent work? | Tool calls, files read, commands executed, time spent | Read 47 files, wrote 12, ran pytest 3 times, 90 min total |
| Reasoning | Why these choices? | Alternatives evaluated, rejections with rationale, trade-offs | Considered circuit breaker, rejected (503 has Retry-After) |
| Verification | How was it validated? | Test results, reviewer verdicts, evidence gate results | pytest: 47 passed, 0 failed. 3 reviewers: approve. |
Each layer adds cost. Storing the full Intent layer (original prompt) is cheap: one text field. Storing the full Process layer (every tool call) for a 90-minute session generates megabytes of JSON. Storing the Reasoning layer requires the agent to explicitly narrate its decision process, which most agents do not do by default. Storing the Verification layer requires integration with the test runner and review system.
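The four layers can be modeled as a small schema. A sketch (field names are illustrative, not Memento's format or the author's state-file format):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class SessionProvenance:
    """Four-layer provenance record for one agent-authored commit."""
    # Intent: what was the task?
    prompt: str
    # Process: how did the agent work?
    files_read: int = 0
    files_written: int = 0
    duration_min: int = 0
    # Reasoning: why these choices?
    pattern: str = ""
    alternatives_rejected: list = field(default_factory=list)
    # Verification: how was it validated?
    tests_passed: int = 0
    tests_failed: int = 0
    reviewer_verdicts: dict = field(default_factory=dict)

    def to_note(self) -> str:
        """Serialize to JSON suitable for a git note body."""
        return json.dumps(asdict(self), indent=2)

record = SessionProvenance(
    prompt="Implement rate limiting on POST /auth/login",
    files_read=14, files_written=3, duration_min=23,
    pattern="sliding window",
    alternatives_rejected=["token bucket: per-IP counter overhead"],
    tests_passed=12,
    reviewer_verdicts={"correctness": "approve", "security": "approve"},
)
print(record.to_note())
```

Note the asymmetry the schema makes visible: Intent and Reasoning fields are a few strings, while a full Process log would be a list of thousands of tool-call entries.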
My orchestration system captures all four layers through different mechanisms.3 The hook infrastructure that makes capture possible spans 84 hooks across 15 event types.5 Intent: the UserPromptSubmit hook logs the original prompt. Process: PostToolUse hooks log every tool call and result. Reasoning: the evidence gate requires the agent to cite specific rationale for each quality criterion. Verification: the jiro.state.json file records test output and reviewer verdicts.
The hooks also track which skills the agent invoked and in what sequence.11 A commit that results from the /review skill followed by the /test skill has a different provenance profile than a commit from a single unstructured session. The skill sequence reveals the workflow pattern: review before testing, or testing before review? The ordering matters for understanding quality assurance coverage. The data exists across multiple state files. The problem is that none of it attaches to the git commit.
LSP as Provenance Bridge
Claude Code’s new LSP (Language Server Protocol) integration changes the quality of session provenance data.4
Before LSP, Claude Code navigated codebases through grep and file reads. When the agent needed to find a function’s definition, it searched for the function name across all files. The search returned fuzzy results: multiple matches, partial matches, test files containing the function name in comments. The agent selected the most likely match. The session transcript recorded: “searched for authenticate_user, found in auth.py, test_auth.py, and middleware.py.” The provenance data contains the search, the ambiguity, and the agent’s best guess.
With LSP, the agent calls goToDefinition and receives the exact file and line number in ~50 milliseconds.4 The session transcript records: “authenticate_user defined at auth.py:47.” The provenance data is precise, unambiguous, and machine-verifiable. A reviewer reading the session can trust that the agent found the right definition, not a similarly-named function in a different module.
The improvement compounds across the session. An agent that reads 200 files using grep generates session data full of “searched for X, found potential matches A, B, C, selected A.” An agent that reads 200 files using LSP generates session data that says “X defined at file:line, references at file:line, file:line, file:line.” The LSP-backed session is a precise map of the agent’s code understanding. The grep-backed session is a fuzzy approximation.
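The difference is visible at the protocol level. A sketch of an LSP textDocument/definition exchange (paths and positions are illustrative; LSP positions are zero-based, so line 46 in the response means line 47 in the editor):

```python
import json

# Request: where is the symbol at middleware.py line 12, column 8 defined?
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///project/middleware.py"},
        "position": {"line": 11, "character": 7},
    },
}

# Typical response: one exact location, not a list of grep candidates.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": [{
        "uri": "file:///project/auth.py",
        "range": {"start": {"line": 46, "character": 4},
                  "end": {"line": 46, "character": 21}},
    }],
}

# The provenance line a session can record from this exchange:
loc = response["result"][0]
print(f"{loc['uri'].rsplit('/', 1)[-1]}:{loc['range']['start']['line'] + 1}")
# → auth.py:47
```

The session transcript records the one-line answer, and a reviewer can replay the same request against the language server to verify it.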
LSP adds six capabilities that improve provenance quality:
| Capability | Before (grep) | After (LSP) |
|---|---|---|
| Find definition | Search all files, guess | Exact file:line, 50ms |
| Find references | Grep for symbol name | All usage sites, typed |
| Type information | Read source code, infer | Hover returns signature |
| Diagnostics | Run linter separately | Real-time error detection |
| Call hierarchy | Manual trace through code | incomingCalls/outgoingCalls |
| Symbol search | Grep with regex | Workspace-wide, structured |
The provenance implication: session transcripts from LSP-enabled agents are more valuable as design documents because every code navigation step is verifiable. A reviewer can confirm that the agent’s understanding of the codebase was correct, not just plausible.
What Session Metadata Looks Like
A real example from my orchestration system. Story: “Add rate limiting to the authentication endpoint.”
Intent layer (from UserPromptSubmit hook):
```
Prompt: "Implement rate limiting on POST /auth/login.
Use sliding window, 5 attempts per minute per IP.
Return 429 with Retry-After header."
```
Process layer (from PostToolUse hooks):
```
Files read: 14 (auth/, middleware/, tests/)
Files written: 3 (rate_limiter.py, auth.py, test_rate_limit.py)
Bash commands: 7 (pytest x3, pip install x1, curl x3)
Duration: 23 minutes
Token usage: 87K input, 24K output
```
Reasoning layer (from evidence gate):
```
Pattern: Sliding window (token bucket rejected because
  per-IP granularity requires separate counters;
  sliding window handles this natively)
Edge cases: IPv6 normalization, proxy headers
  (X-Forwarded-For validated against trusted proxy list)
```
Verification layer (from jiro.state.json):
```
Tests: 12 passed, 0 failed, 0 skipped
Reviewers: correctness (approve), security (approve),
  conventions (approve with note: add docstring to
  rate_limiter.py:RateLimiter class)
Evidence gate: 6/6 criteria met
```
The commit message for the same change: “feat: add rate limiting to auth endpoint.” Seven words. The session metadata contains 2,300 words of structured provenance. The gap between commit message and session context is two orders of magnitude.
The Cost of Provenance
Session provenance is not free. Three costs constrain adoption.
Storage. A 90-minute agent session generates 500KB-2MB of raw transcript. At 10 commits per day, the full transcript adds 5-20MB daily to the git repository. Git notes store the data outside the main history (they do not affect git clone size by default), but the audit trail in refs/notes/memento-full-audit accumulates. Memento’s markdown conversion reduces the raw size by roughly 60%.2
Privacy. Session transcripts contain everything the agent saw: file contents, environment variables, API responses, error messages with stack traces. A transcript attached to a public repository exposes internal implementation details. Memento treats transcripts as untrusted data and instructs the summary model to ignore embedded instructions, but the raw transcript in the full audit trail requires access control.2
Signal-to-noise. A 90-minute session where the agent reads 200 files to change 12 contains 188 files of irrelevant process data. The challenge is distinguishing navigation (noise) from decision points (signal). The four-layer model helps: Intent and Reasoning are high signal, Process is mixed, Verification is high signal. A provenance system that stores Intent and Reasoning by default and Process on demand reduces noise without losing the critical decision context.
What You Can Implement Today
Four minimum provenance practices that require no new tools:
1. Structured commit messages. Replace “refactor auth module” with a structured format:
```
feat: add rate limiting to auth endpoint

Task: sliding window rate limiter, 5/min per IP
Alternatives: token bucket (rejected: per-IP overhead)
Evidence: 12 tests pass, 3 reviewers approve
Session: 23 min, 87K tokens, 14 files read
```
The format is a manual version of the four provenance layers. The message answers intent (task), reasoning (alternatives), and verification (evidence) in four lines. No tooling required.
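If your agent workflow already captures the layer data, the message can be rendered mechanically. A sketch (the format and field names are this post's convention, not a standard):

```python
def structured_message(summary, task, alternatives, evidence, session):
    """Render a four-layer commit message: intent, reasoning, verification, process."""
    return "\n".join([
        summary,
        "",  # blank line between subject and body, per git convention
        f"Task: {task}",
        f"Alternatives: {alternatives}",
        f"Evidence: {evidence}",
        f"Session: {session}",
    ])

msg = structured_message(
    "feat: add rate limiting to auth endpoint",
    "sliding window rate limiter, 5/min per IP",
    "token bucket (rejected: per-IP overhead)",
    "12 tests pass, 3 reviewers approve",
    "23 min, 87K tokens, 14 files read",
)
print(msg)
```

A hook that calls a renderer like this at commit time turns the four-layer format from a team habit into a default.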
2. Save session transcripts alongside commits. After an agent session, export the transcript to a file in the repository (e.g., .sessions/2026-03-02-auth-rate-limit.md). Add the file to .gitignore for public repos or commit it for internal repos. The transcript is available for review without git notes infrastructure.
3. Tag agent-authored commits. Use a git trailer to mark commits that an agent produced:
```
Agent: Claude Code (Opus)
Session-Duration: 23m
Files-Read: 14
Files-Written: 3
```
The trailer creates a machine-parseable record of agent involvement. git log --grep="Agent: Claude Code" lists all agent-authored commits. The metadata enables future tooling to reconstruct provenance chains without retroactive annotation.
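Trailers are plain `Key: value` lines in the final paragraph of the message, so future tooling can parse them without git at all (git also exposes them via `git interpret-trailers`). A minimal parser sketch:

```python
def parse_trailers(message: str) -> dict:
    """Extract Key: value trailer lines from the last paragraph of a commit message."""
    last_block = message.rstrip().split("\n\n")[-1]
    trailers = {}
    for line in last_block.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            trailers[key.strip()] = value.strip()
    return trailers

message = """refactor: standardize retry patterns

Agent: Claude Code (Opus)
Session-Duration: 23m
Files-Read: 14
Files-Written: 3"""

print(parse_trailers(message))
```

Running this over `git log --format=%B` output yields a per-commit table of agent involvement with no new infrastructure.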
4. Require evidence gates for agent commits. Before an agent commits, require it to answer six questions: What pattern does the code follow? What simpler alternatives exist? What edge cases are handled? Do tests pass? Which files did you check for regressions? Does the change solve the actual problem?10 The answers form the Reasoning and Verification layers. Without the gate, the agent reports “done” and the session contains only Process data. With the gate, every commit generates structured provenance as a side effect of quality assurance.
The evidence gate practice connects to the broader provenance argument. An agent that must justify its decisions before committing generates higher-quality session metadata than an agent that runs unconstrained. The gate transforms provenance from a passive byproduct (recording what happened) into an active quality signal (requiring the agent to explain what happened and why).
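A gate like this can be enforced mechanically: refuse the commit unless all six answers are present and non-empty. A sketch, with criterion keys paraphrased from the six questions above:

```python
REQUIRED = [
    "pattern",           # What pattern does the code follow?
    "alternatives",      # What simpler alternatives exist?
    "edge_cases",        # What edge cases are handled?
    "test_results",      # Do tests pass?
    "regression_check",  # Which files were checked for regressions?
    "problem_fit",       # Does the change solve the actual problem?
]

def evidence_gate(answers: dict) -> list:
    """Return the missing or empty criteria; an empty list means the gate passes."""
    return [k for k in REQUIRED if not str(answers.get(k, "")).strip()]

answers = {
    "pattern": "sliding window",
    "alternatives": "token bucket (rejected: per-IP overhead)",
    "edge_cases": "IPv6 normalization, X-Forwarded-For validation",
    "test_results": "12 passed, 0 failed",
    "regression_check": "auth/, middleware/",
    "problem_fit": "blocks brute-force logins at 5/min per IP",
}

missing = evidence_gate(answers)
print("commit allowed" if not missing else f"blocked: missing {missing}")
```

A presence check like this does not verify that the answers are true; that is the reviewer's job. It only guarantees that the Reasoning and Verification layers exist in the session at all.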
Key Takeaways
For engineering managers: Every agent-authored commit with a one-line message discards the design document. The session transcript contains the reasoning. Decide whether that reasoning has value for your team’s code review, onboarding, and incident response workflows. If the answer is yes, implement structured commit messages at minimum.
For developers: When you inherit agent-authored code, the commit message tells you what changed. The session transcript (if preserved) tells you why. Push for session provenance in your team’s agent workflow. The Memento project provides a git-native approach. Structured commit messages provide a zero-infrastructure starting point.
For tool builders: LSP integration makes session transcripts more valuable by replacing fuzzy grep-based navigation with precise, verifiable code references. Every improvement to agent code understanding improves the quality of the provenance data that sessions generate. Build export formats that preserve the four provenance layers.
FAQ
What is session provenance? Session provenance is the record of an AI agent’s reasoning process during a coding session: the original task, files read, alternatives evaluated, decisions made, and evidence produced. The session transcript captures the “why” that commit messages and diffs cannot.
What is Memento? Memento is an open-source git extension that captures AI coding session transcripts and attaches them to commits as git notes. The tool supports Codex and Claude Code, generates markdown summaries, and provides a GitHub Action for PR integration.2
How does LSP improve agent sessions? Language Server Protocol gives agents structural code understanding: exact definitions, typed references, call hierarchies, and real-time diagnostics. Session transcripts from LSP-enabled agents contain precise, verifiable code navigation data instead of fuzzy grep results.4
Should session transcripts be committed to git? The answer depends on the repository’s privacy requirements. For internal repositories, committing transcripts preserves provenance. For public repositories, git notes (which do not transfer by default on clone) or separate storage with commit references are safer approaches.2
How much storage does session provenance require? A typical 30-minute agent session generates 200KB-800KB of raw transcript. Git notes store the data outside the main object database, keeping git clone sizes unchanged by default. Memento’s markdown conversion reduces raw size by roughly 60%. For teams running 10-20 agent sessions per day, expect 2-10MB of daily provenance data, comparable to a medium-resolution screenshot per session.2
What is the relationship between agent observability and session provenance? Agent observability monitors what agents do in real time: resource consumption, policy compliance, runtime behavior.7 Session provenance records what agents decided and why, after the fact. Observability answers “is the agent behaving correctly right now?” Provenance answers “why did the agent make this choice last Tuesday?” The two systems complement each other: observability catches problems live, provenance explains them afterward.
Sources
1. Crosley, Blake, “Your Agent Writes Faster Than You Can Read,” blakecrosley.com, February 2026. Cognitive debt framework, five independent research groups converging on the same problem.
2. mandel-macaque, “Memento: Git extension for AI session tracking,” GitHub, 2026. Git notes storage, markdown conversion, multi-provider support. 100 HN points, 124 comments.
3. Author’s production telemetry. 84 hooks across 15 event types, session state files (jiro.state.json, jiro.progress.json), 60+ daily Claude Code sessions, February–March 2026.
4. Bansal, Karan, “Claude Code LSP,” karanbansal.in, 2026. LSP integration enabling goToDefinition, findReferences, hover, diagnostics. 75 HN points, 39 comments.
5. Crosley, Blake, “Anatomy of a Claw: 84 Hooks as an Orchestration Layer,” blakecrosley.com, February 2026.
6. Crosley, Blake, “The Fabrication Firewall: When Your Agent Publishes Lies,” blakecrosley.com, February 2026. Confabulation feedback loop, output firewalls, blast radius classification.
7. Crosley, Blake, “The Invisible Agent: Why You Can’t Govern What You Can’t See,” blakecrosley.com, March 2026. Three-layer visibility stack, runtime auditing.
8. Git Documentation: git-notes, git-scm.com. Notes storage in refs/notes/, per-commit metadata attachment.
9. Crosley, Blake, “What I Told NIST About AI Agent Security,” blakecrosley.com, February 2026. Standardized audit logging recommendation.
10. Crosley, Blake, “Jiro: A Quality Philosophy for AI-Assisted Engineering,” blakecrosley.com, February 2026. Evidence gate, quality loop, seven failure modes.
11. Crosley, Blake, “Building Custom Skills for Claude Code,” blakecrosley.com, February 2026. Skill authoring, slash command patterns.