The Invisible Agent: Why You Can't Govern What You Can't See
Anthropic shipped a feature called Cowork in Claude Desktop. The feature created a 10GB virtual machine bundle on every macOS installation. Users who never enabled Cowork still got the VM. Users who deleted it watched it regenerate. One user reported the bundle growing to 21GB. The GitHub issue, and the Hacker News thread that followed (345 points, 175 comments), forced Anthropic to acknowledge the problem.1
Nobody noticed until disk space ran out.
TL;DR
Agent tools now allocate compute resources (disk, memory, CPU, network) without operator visibility. Anthropic’s Cowork VM is the visible example; every MCP tool call, every spawned sub-agent, and every web fetch is an invisible one. Governing agents requires three layers of observability: resource metering (what did it consume?), policy enforcement (what was it allowed to do?), and runtime auditing (what did it actually do?). Two open-source projects address the policy and audit layers (mcp-firewall and Logira), but no production tool covers all three. Below: the visibility problem, the three-layer stack, what each layer catches, and minimum monitoring hooks you can implement today.
The Visibility Problem
Traditional software operates below an observability line that operators choose to draw. A web server writes access logs because engineers configured logging. A database tracks slow queries because someone set log_min_duration_statement. The operator decides the granularity.
Agent systems invert the relationship. The agent decides what to execute at runtime. A coding agent that receives “fix the login endpoint” might read 47 files, write to 12, spawn three sub-agents, fetch two web pages, and execute 15 bash commands. Each action consumes resources. None of the consumption shows up in traditional monitoring.
The Cowork incident exposed the inversion at the infrastructure level. Claude Desktop allocated 10GB of disk space, consumed 24-55% CPU at idle, and drove swap usage from 20K to 24K+ swapins on 8GB machines.1 Users discovered the resource consumption through macOS storage warnings, not through Anthropic’s telemetry. The application provided no dashboard, no meter, and no opt-in disclosure for the VM allocation.
Scale the pattern to agent sessions. My hook orchestration system intercepts every tool call across 15 event types.11 Over 60 sessions, 84 hooks fired on those events, producing telemetry that no default agent installation provides.2 Without that instrumentation, I would not have detected the 12 drift incidents, the phantom verification failures, or the recursive spawning loops documented in my NIST public comment.3
The DORA 2024 Accelerate State of DevOps Report found that teams with strong observability practices deploy more frequently and recover faster from failures. The 2025 edition extends the framework to AI-assisted development, connecting observability to “how AI-assisted coding or testing affects quality, lead time, and overall reliability.”4 Agent observability is not a nice-to-have. Measuring agent behavior is a prerequisite for governing it.
Three Layers of Agent Visibility
Agent observability requires three independent layers. Each layer answers a different question. A failure in one layer does not compromise the others.
| Layer | Question | Monitors | Example Tool |
|---|---|---|---|
| Resource metering | What did it consume? | Disk, memory, CPU, network per session | Cowork should have shown this |
| Policy enforcement | What was it allowed to do? | Allow/deny rules, tool permissions, scope limits | mcp-firewall |
| Runtime auditing | What did it actually do? | Syscall log, file access, network egress | Logira |
The layers map to a progression: you cannot enforce policy on resources you do not measure, and you cannot audit compliance with policies you never defined. Each layer builds on the one below it.
Layer 1: Resource Metering
Resource metering answers: how much did the agent consume, and where?
The Cowork incident is a resource metering failure. The VM bundle consumed 10GB of disk space. The renderer process consumed 24% CPU at idle. Swap activity climbed steadily during sessions. All of these metrics existed in macOS Activity Monitor. None appeared in Claude Desktop’s interface.1
For agent coding sessions, resource metering tracks four dimensions:
Disk. Every file write, every cache entry, every log file. My sessions generate 200-400KB of state files per session (jiro.state.json, jiro.progress.json, hook logs). Over 60 sessions, that accumulates to 12-24MB of state data that persists across sessions unless explicitly cleaned.2
Memory. Context window consumption per turn. A 200,000-token context window costs roughly $3 per full fill at current Opus pricing. My cost tracker logs cumulative token usage per session, with budget thresholds at 80%, 90%, and 95% of a configurable limit.5
CPU. Hook execution time. My nine-hook prompt dispatcher adds 200ms per prompt. That overhead is invisible to users (human typing is the bottleneck) but compounds across automated pipelines. The ralph autonomous loop fires the dispatcher 50-100 times per story, adding 10-20 seconds of hook overhead per story.2
Network. Web fetches, API calls, MCP tool invocations. Every outbound request is a potential data channel. My web extract library logs fetch URLs and response sizes. Without network metering, a web fetch that returns a 50MB response is indistinguishable from one returning 5KB.6
No commercial agent tool provides a per-session resource dashboard. Cloud providers meter compute for billing, not for operator visibility. The gap between what agents consume and what operators can see is the resource metering deficit.
The absence feels invisible until the numbers accumulate. One session that writes 400KB of state files is nothing. Sixty sessions that write 400KB each, with no cleanup, leave 24MB of orphaned state. One web fetch that returns 847KB is negligible. A scanning pipeline that fetches 80 URLs per run generates 67MB of cached content that the agent’s tool abstraction hides from the operator. Resource metering makes the cumulative visible before it becomes the crisis that drives someone to file GitHub issue #22543.1
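A minimal version of this kind of metering is easy to sketch. The budget gate below mirrors the 80%/90%/95% thresholds described above; the usage figure is assumed to come from the agent's own accounting (Claude Code exposes session cost via /cost), and here it is simply a parameter.

```shell
# Sketch of a session token-budget gate mirroring the article's
# 80%/90%/95% thresholds. "used" would come from the agent's own
# usage accounting; both values are parameters in this sketch.
budget_status() {
  used="$1"    # cumulative tokens consumed this session
  limit="$2"   # configured session budget
  pct=$((used * 100 / limit))
  if   [ "$pct" -ge 95 ]; then echo "critical"
  elif [ "$pct" -ge 90 ]; then echo "alert"
  elif [ "$pct" -ge 80 ]; then echo "warn"
  else echo "ok"
  fi
}
```

In a real hook this check would run on every prompt and inject a warning into the session whenever the status escalates.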
Layer 2: Policy Enforcement
Policy enforcement answers: what rules constrain the agent, and are those rules applied consistently?
mcp-firewall addresses the policy layer for CLI agents.7 The tool sits between the agent and all tool use requests, evaluating each request against a regex-based policy before execution. Policies use JSONNet configuration files scoped by folder, git repository, or user. The firewall supports Claude Code and GitHub Copilot CLI through PreToolUse hook integration.
The architecture reflects a key insight: every agent implements its own halfway solution to allow/deny logic. Claude Code uses glob patterns. Codex CLI uses prefix-only matching. Each approach covers a subset of the policy space. mcp-firewall centralizes the rules into one engine that works across agents.
Consider the policy gap without centralized enforcement. My hook system includes 12 PreToolUse:Bash handlers that check for credential patterns, dangerous git operations, sensitive path access, and deployment commands.2 Each handler is a separate shell script with its own regex patterns. When I need to add a new deny rule, I write a new script. When I need to audit which rules exist, I grep across 12 files. mcp-firewall consolidates that into a single config file with explicit allow arrays.
The OWASP Top 10 for Agentic Applications (2025) identifies Agent Goal Hijacking (ASI01) and Excessive Agency (LLM06:2025) as top risks.8 Both risks require policy enforcement at the tool-call level. An agent that hijacks a goal still makes tool calls. An agent with excessive agency still requests permissions. Policy enforcement intercepts both at the boundary where the agent’s intent meets the system’s tools.
Policy enforcement differs from access control. Traditional access control asks “does this user have permission?” Policy enforcement for agents asks “does this action, in this context, for this task, fall within the approved scope?” The context sensitivity is the challenge. A git push to a feature branch and a git push --force to main are the same tool (Bash) with different blast radii. mcp-firewall’s regex patterns can distinguish between them. Default agent permissions cannot.
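The distinction is mechanical once stated as patterns. A sketch of a context-sensitive check, with illustrative patterns and branch names (this is not mcp-firewall's actual rule syntax):

```shell
# Sketch: separate a routine push from a force-push to a protected
# branch. Patterns and branch names are illustrative assumptions.
classify_push() {
  cmd="$1"
  case "$cmd" in
    *"git push"*) ;;              # only git push commands are of interest
    *) echo "approve"; return ;;
  esac
  if printf '%s' "$cmd" | grep -Eq '(^| )(--force|-f)( |$)' &&
     printf '%s' "$cmd" | grep -Eq '(^| )(main|master)( |$)'; then
    echo "block"                  # force-push targeting a protected branch
  else
    echo "approve"
  fi
}
```

The same tool name (Bash) yields different verdicts depending on the command's content, which is exactly the context sensitivity that default tool-level permissions cannot express.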
Layer 3: Runtime Auditing
Runtime auditing answers: what did the agent actually do at the syscall level?
Logira addresses the audit layer using eBPF probes to intercept system calls at the kernel level.9 The tool records three event categories: process execution (exec events), file operations (including credential file access), and network connections (with destination tracking). Each audited run generates three files: events.jsonl for timeline review, index.sqlite for queryable filtering, and meta.json for run metadata.
The design philosophy is “observe-only”: Logira records and detects but does not enforce or block.9 The separation from the enforcement layer is deliberate. Policy enforcement prevents known-bad actions. Runtime auditing discovers unknown-bad actions after the fact. The two layers serve different temporal functions: prevention (before) and forensics (after).
Logira’s eBPF probes operate below the application layer. An agent that constructs a novel command to exfiltrate data still makes syscalls. The agent cannot hide file reads, network connections, or process spawns from kernel-level tracing. The approach catches what application-level hooks miss: side effects that bypass the tool-call abstraction.
Built-in detection rules target AI agent risks specifically: credential file access, persistence mechanism changes (/etc, systemd, cron), suspicious command chains (curl-pipe-sh patterns), destructive operations (rm -rf), and anomalous network egress.9 The rules are opinionated defaults for the agent threat model, not generic system auditing.
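The value of such rules is easy to demonstrate with a post-hoc scan over the audit trail. A sketch, assuming a JSONL event log; the line format here is an assumption for illustration, not Logira's actual schema:

```shell
# Sketch: count curl-pipe-sh chains in an audit log. The events.jsonl
# field layout is assumed, not Logira's real schema; the regex targets
# the curl-pipe-sh pattern named in the detection rules above.
count_curl_pipe_sh() {
  grep -cE 'curl[^|"]*\|[^"]*(sh|bash)' "$1" || true
}
```

An observe-only tool makes exactly this kind of retrospective query cheap: the detection rule can be tuned and re-run against recorded evidence instead of live traffic.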
The platform constraint matters. Logira requires Linux 5.8+ with cgroup v2. macOS agents (Claude Desktop, Claude Code on Darwin) cannot use eBPF-based auditing. My OS sandbox uses macOS Seatbelt profiles as the closest equivalent: kernel-enforced deny rules that block writes to sensitive paths.3 Seatbelt is enforcement, not auditing. macOS lacks a production-ready equivalent to Logira’s observe-only audit trail.
The distinction between enforcement and auditing maps to a temporal split in incident response. Enforcement prevents the incident. Auditing enables reconstruction after the incident. Both are necessary. An enforcement layer that blocks all credential access prevents exfiltration but also prevents legitimate SSH operations. An audit layer that logs all credential access without blocking enables the operator to review access patterns and tune enforcement rules based on evidence. The feedback loop between audit data and policy refinement is how the visibility stack improves over time: audit reveals patterns, patterns inform policy, policy reduces the surface the audit needs to cover.
Logira’s cgroup v2 isolation adds a feature that application-level auditing cannot replicate: run-scoped attribution. The system attributes every event to a specific audited run, not to the system globally. When two agent sessions run concurrently on the same machine, cgroup isolation ensures that file access in session A does not appear in session B’s audit trail. Application-level hooks cannot provide the same guarantee because the hooks fire within the agent process, which has no kernel-level boundary separating concurrent sessions.9
What I Actually Run
My orchestration system covers all three layers through hooks, not through dedicated monitoring tools.
Resource metering. The cost-gate hook tracks token usage per session against configurable budget thresholds.5 The system performance monitor checks CPU, memory, disk, and swap at configurable intervals, injecting warnings when resource pressure exceeds thresholds.10 The session drift detector fires every 25 tool calls, computing cosine similarity between the original prompt embedding and a sliding window of recent actions.2
Policy enforcement. Eight PreToolUse dispatcher hooks route to handler hooks by tool type. PreToolUse:Bash alone runs 12 handlers covering credential patterns, destructive git operations, sensitive path access, and deployment commands. The recursion guard enforces a maximum depth of two and maximum five children per parent agent.2
Runtime auditing. PostToolUse hooks log every tool call result. The security scanning hooks check bash output for credential leaks after execution. Session state files (jiro.state.json) record every story completion, reviewer verdict, and evidence gate result.2 The system does not use eBPF (macOS limitation) but captures tool-level telemetry through the hook pipeline.
| Layer | My Implementation | Limitation |
|---|---|---|
| Resource metering | cost-gate, sysmon, drift detector | No per-tool disk/network breakdown |
| Policy enforcement | 84 hooks across 15 event types | Per-hook regex, not centralized config |
| Runtime auditing | PostToolUse loggers, session state files | Application-level only, no syscall trace |
The system works because every action passes through the hook pipeline. The limitation is depth: hook-level monitoring captures what the agent asked to do, not what the operating system actually executed. An agent that constructs a bash command with embedded subshells executes code the hook sees as a single string. Kernel-level auditing would see each subprocess.
The Compounding Blind Spot
Agents spawning agents multiply opacity. Each delegation hop introduces information loss.
When my orchestration system runs the ralph autonomous loop, the parent process spawns fresh Claude Code instances for each PRD story. Each child agent gets a focused task and a fresh context window. The parent tracks completion status. The parent does not see the child’s individual tool calls, file reads, or resource consumption.2
At depth one (parent spawns child), the parent sees the child’s final output. At depth two (child spawns grandchild), the parent sees the child’s report about the grandchild’s output. Each hop compresses information. The delegation chain analysis in my NIST comment measured three compounding risks: semantic compression (context collapses to a prompt string), authority amplification (children inherit permissions without understanding sensitivity), and accountability diffusion (the root agent bears responsibility for results it never inspected).3
Observability degrades at the same rate. A three-layer visibility stack on the root agent provides zero visibility into the grandchild agent unless each child independently runs its own monitoring. My recursion guard enforces the depth limit, but the guard is a policy control, not an observability control. Knowing that delegation stopped at depth two does not tell you what happened at depth two.
A concrete example from my production system: the ralph loop spawned a child agent to implement a database migration story. The child agent decided the migration needed a “verification step” and spawned its own sub-agent to run integration tests. The grandchild agent failed silently (the test database was not configured). The child agent received an empty response, interpreted silence as success, and reported the story complete. The parent logged “story 4: complete.” I discovered the broken migration three hours later when the application crashed on the missing column. The root agent’s telemetry showed a clean run. The failure lived two hops deep, invisible to every monitoring layer I had deployed on the root.2
The OWASP Agentic Applications framework addresses cascading failures and rogue agents but does not prescribe observability requirements for multi-agent delegation chains.8 The gap is structural: each agent in the chain would need its own resource metering, policy enforcement, and runtime auditing, independently configured and independently reported. The overhead is multiplicative. Three layers of monitoring on three agents in a chain means nine monitoring instances, each generating its own telemetry, each requiring its own configuration. No existing tool manages that coordination.
What You Can Implement Today
Three minimum monitoring hooks that cover the visibility stack:
1. Resource: Token budget tracker. Log cumulative input and output tokens per session. Set a hard limit. Alert at 80%. The implementation requires reading the agent’s usage stats (Claude Code exposes session costs via /cost) and comparing against a threshold. My cost-gate hook does this in 47 lines of bash.5
2. Policy: PreToolUse deny list. Create a hook that fires before every Bash tool call. Check the command against a list of patterns: rm -rf /, git push --force, paths containing .ssh or .env, curl | sh. Block matches. The implementation requires one shell script that reads stdin (the tool call JSON), extracts the command field, and greps against a pattern file. My credential-checking hook does this in 31 lines.2
3. Audit: PostToolUse session log. Append every tool call and result to a session-specific JSONL file. Include timestamp, tool name, arguments, and exit code. The log enables post-session reconstruction: what did the agent do, in what order, and did anything fail silently? My session logger does this in 22 lines of bash.2
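The item-3 session logger can be sketched in a few lines of shell. The field names and default log path below are illustrative, not a specific tool's schema, and a production version would JSON-escape the arguments:

```shell
# Sketch of a PostToolUse session logger: append one JSON line per tool
# call. Field names and log path are assumptions; args are not
# JSON-escaped here, which a production hook would need to handle.
log_tool_call() {
  tool="$1"; exit_code="$2"; args="$3"
  printf '{"ts":"%s","tool":"%s","exit":%s,"args":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" "$exit_code" "$args" \
    >> "${SESSION_LOG:-session.jsonl}"
}
```

Appending JSONL keeps the log trivially greppable and queryable after the session, which is the whole point: reconstruction, not real-time alerting.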
A worked example of the deny list hook in settings.json:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/check-sensitive-paths.sh"
          }
        ]
      }
    ]
  }
}
```
The hook script reads the tool call from stdin, extracts the command string, and checks it against patterns. A blocked command returns a JSON object with {"decision": "block", "reason": "Sensitive path access denied"}. An allowed command returns {"decision": "approve"}. Claude Code respects both responses without further prompting. The hook adds negligible latency to approved commands (the regex check runs in under 5ms) and provides immediate feedback for blocked ones.
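What a check-sensitive-paths.sh body could look like is easy to sketch. The sed field extraction below is deliberately naive (a real hook would parse the stdin JSON with a proper parser such as jq), and the deny patterns echo the list from step 2:

```shell
# Sketch of a sensitive-path check. The sed extraction of the "command"
# field is a simplification (use a real JSON parser in production);
# deny patterns are the examples from the article, not an exhaustive list.
check_sensitive_paths() {
  cmd=$(printf '%s' "$1" |
    sed -n 's/.*"command"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if printf '%s' "$cmd" |
     grep -Eq '\.ssh|\.env|rm -rf /|git push --force|curl[^|]*\|[[:space:]]*(sh|bash)'; then
    printf '{"decision": "block", "reason": "Sensitive path access denied"}\n'
  else
    printf '{"decision": "approve"}\n'
  fi
}
```

In the real hook the JSON would arrive on stdin rather than as an argument; the function form here just makes the logic easy to exercise in isolation.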
These three hooks take less than 100 lines total. They do not replace dedicated monitoring tools. They replace zero visibility with minimum visibility. Minimum visibility is the prerequisite for every governance decision that follows. You cannot set a resource budget without metering. You cannot enforce a scope policy without a deny list. You cannot investigate an incident without an audit log. Start with the log. The other two follow.
Key Takeaways
For platform engineers: Agents consume resources that existing monitoring does not track. Disk, memory, CPU, and network usage per agent session belong on the same dashboard as container metrics. The Cowork incident proves the need: 10GB allocated with zero operator visibility.
For security teams: Policy enforcement at the tool-call boundary is the minimum viable agent security posture. mcp-firewall’s centralized approach consolidates per-agent allow/deny logic into one auditable configuration. Evaluate whether your agent’s built-in permissions cover the policy space your threat model requires.
For engineering managers: Ask three questions about your agent tooling: Can you see per-session resource consumption? Can you define and audit tool-call policies? Can you reconstruct what an agent did after the fact? If any answer is “no,” you have a visibility gap that grows with every additional agent in your workflow.
FAQ
What is agent observability? Agent observability is the ability to monitor and understand what an AI agent does during execution: what resources it consumes, what actions it takes, and whether those actions comply with defined policies.
Why did Anthropic’s Cowork create a 10GB VM? The Cowork feature in Claude Desktop provisions a virtual machine for collaborative development sessions. Claude Desktop creates the VM bundle automatically on every macOS installation, even for users who never enable the feature, and keeps it until manually deleted.1
What is mcp-firewall? mcp-firewall is an open-source policy enforcement tool that intercepts tool use requests from CLI agents (Claude Code, GitHub Copilot CLI) and evaluates them against regex-based allow/deny rules before execution.7
What is eBPF runtime auditing? eBPF (extended Berkeley Packet Filter) enables kernel-level tracing of system calls without modifying the audited process. Tools like Logira use eBPF probes to record process execution, file operations, and network connections during AI agent runs.9
Sources
1. mystcb et al., “Cowork feature creates 10GB VM bundle that severely degrades performance,” GitHub Issue #22543, anthropics/claude-code, February 2026. 345 Hacker News points, 175 comments.
2. Author’s production telemetry. 84 hooks across 15 event types, ~15,000 lines of orchestration code, 60+ daily Claude Code sessions, February-March 2026.
3. Crosley, Blake, “What I Told NIST About AI Agent Security,” blakecrosley.com, February 2026. Public comment on NIST-2025-0035.
4. DORA Accelerate State of DevOps Report 2024, Google Cloud, 2024. 39,000+ professionals surveyed.
5. Author’s cost-gate hook implementation. SQLite-backed budget tracker with configurable thresholds (80%/90%/95%), 36 tests, February 2026.
6. Author’s web content extraction library. trafilatura 2.0.0, URL logging and response size tracking, 25 tests, February 2026.
7. dzervas, “mcp-firewall,” GitHub, 2026. Go binary with JSONNet policy configuration, PreToolUse hook integration.
8. OWASP Top 10 for Agentic Applications, OWASP GenAI Security Project, 2025. 100+ security researchers contributed.
9. melonattacker, “Logira: eBPF runtime auditing for AI agent runs,” GitHub, 2026. Linux 5.8+, cgroup v2, observe-only design.
10. Author’s system performance monitoring module. CPU, memory, disk, and swap monitoring with configurable thresholds, 46 tests, February 2026.
11. Crosley, Blake, “Anatomy of a Claw: 84 Hooks as an Orchestration Layer,” blakecrosley.com, February 2026.