Computer-Use Agents Overshare by Default

From the guide: Claude Code Comprehensive Guide

A computer-use agent asked to forward “the Q3 numbers” to a colleague has to decide what counts as the Q3 numbers, which file holds them, and whether the spreadsheet open next to them belongs in the same email. A June 2026 benchmark put 15 frontier agents through that kind of decision and found that 11 of them leaked private information on more than half the scenarios tested, with an average leakage rate of 67.9%.[1]

The privacy failure in computer-use agents is not prompt injection. No adversary plants anything. The agent leaks because it is trying to be helpful and cannot tell which information belongs in the context it is acting in. A new paper, Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?, names the failure mode, builds a benchmark for it, and shows it is widespread across the frontier.[1]

The result deserves attention because it isolates a risk the agent-security conversation has mostly skipped. I have written before about two untrusted inputs and the attacker-driven failures of tool-using agents. Contextual oversharing has the opposite shape: the danger is internal (the agent’s own judgment about appropriate disclosure), and it shows up even when nothing malicious is in the loop.

TL;DR

  • Computer-use agents (CUAs) act across personal apps such as email, calendars, and to-do lists. Cross-application access is useful, but it lets an agent pull information from one context into another where it does not belong.[1]
  • Capable but Careless (2026) introduces AgentCIBench, a benchmark that turns the risk into executable, deterministically scored scenarios, and evaluates 15 frontier agents.[1]
  • The benchmark targets three failure modes: visual co-location, task-ambiguity overshare, and recipient misalignment.[1]
  • Eleven of 15 agents leaked on more than 50% of scenarios, averaging 67.9% leakage, and the failures persisted when agents acted end-to-end to complete the task.[1]
  • The frame is contextual integrity, Helen Nissenbaum’s idea that privacy is about information flowing appropriately for its context, not about secrecy.[2] The agents are capable; what they lack is a sense of where information is allowed to go.

A Different Failure Than Prompt Injection

Most agent-security work, including my own, starts from an adversary. Someone hides an instruction in a web page, a tool description, or a document, and the agent obeys it. The defense is to distrust inputs and constrain what the agent can do with them.

Contextual oversharing has no adversary. The user makes a reasonable request, the agent tries to satisfy it, and in the process it discloses something that was private to a different context. The paper frames this through contextual integrity, the privacy theory from Helen Nissenbaum, which holds that information flows carry norms tied to the context in which they occur.[2] Your therapist knowing your diagnosis is appropriate. Your therapist forwarding it to your employer violates the norm, even though the diagnosis was never a secret from the therapist: the violation is that the information crossed a context boundary it was not supposed to cross.

A computer-use agent operates across many such contexts at once. It can see your calendar while drafting an email, your full contact list while sending to one person, your entire to-do list while answering a question about one item. Every one of those adjacencies is a chance to pull something appropriate in one place into a place where it is not. The agent is not compromised. It is overhelpful, and overhelpfulness in a multi-context environment looks like a privacy leak.

The Three Ways Agents Leak

AgentCIBench operationalizes the risk as deterministically scored scenarios across three failure modes.[1] This taxonomy is the part of the paper worth internalizing, because each mode maps to a real interface an agent touches.

Visual co-location. The agent pulls in prohibited items that sit next to the task target in the interface. Asked to attach one invoice, it grabs the adjacent one too, because both were on screen and proximity read as relevance. The UI’s layout, not the task, drove the disclosure.

Task-ambiguity overshare. Given an under-specified prompt, the agent dumps dense personal state rather than asking or narrowing. “Tell them what I’m working on” becomes the entire to-do list, including the items the recipient should never see. Ambiguity resolves toward more disclosure, not less.

Recipient misalignment. The agent sends content to an addressee for whom it is inappropriate. The right information goes to the wrong person, a reply-all instinct applied to data that belonged to one relationship.

The three modes share a root cause: the agent treats access as permission. Because it can see the adjacent invoice, the full to-do list, and the broader recipient pool, it behaves as though using that access is appropriate. Contextual integrity is precisely the judgment that access and appropriateness are different things, and the benchmark shows current agents do not reliably make the distinction.
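To make the access/appropriateness distinction concrete, here is a minimal Python sketch, not taken from the paper. It reduces a proposed flow to three contextual-integrity parameters (the item, the context it came from, and the recipient) and checks it two ways. The `Flow` type, the `NORMS` table, and every file and address name are hypothetical. Nissenbaum’s framework also considers the sender, the subject of the information, and the transmission principle, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A proposed flow, reduced to a few contextual-integrity parameters."""
    item: str            # what would be disclosed
    source_context: str  # the context the item belongs to
    recipient: str       # who would receive it


# Hypothetical context norms: which recipients each context permits flows to.
# Contexts missing from the table permit nothing (default deny).
NORMS: dict[str, frozenset[str]] = {
    "acme-billing": frozenset({"ap@acme.example"}),
    "globex-billing": frozenset({"ap@globex.example"}),
}


def can_access(flow: Flow, visible_items: set[str]) -> bool:
    """Access: the agent can see the item (on screen, in a list, in an inbox)."""
    return flow.item in visible_items


def is_appropriate(flow: Flow) -> bool:
    """Appropriateness: the flow matches the norms of the item's source context."""
    return flow.recipient in NORMS.get(flow.source_context, frozenset())


# Both invoices are on screen; only one was requested.
visible = {"invoice-acme.pdf", "invoice-globex.pdf"}
wanted = Flow("invoice-acme.pdf", "acme-billing", "ap@acme.example")
adjacent = Flow("invoice-globex.pdf", "globex-billing", "ap@acme.example")

for flow in (wanted, adjacent):
    print(flow.item, can_access(flow, visible), is_appropriate(flow))
```

Both invoices pass the access check because both are visible; only the requested one passes the appropriateness check. Defaulting to deny for contexts with no stated norm is a design choice: an item whose context has no rule is treated as not shareable, rather than shareable because it happens to be reachable.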

How Bad, and Why It Persists

The headline numbers are not marginal. Across 15 frontier agents, 11 leaked on more than half the scenarios, and average leakage reached 67.9%.[1] A failure mode that shows up in roughly two of every three scenarios on average, and in a majority of scenarios for 11 of 15 agents, is not an edge case. It is default behavior.

The detail that matters most for anyone shipping agents is that the failures persisted when the agents acted end-to-end in the environment to complete the task, not only in isolated probes.[1] A leak that only appeared under artificial conditions would be easy to dismiss. A leak that survives the agent doing real work is a property of how the agent operates, and the paper positions contextual disclosure testing as a pre-deployment safety check for exactly that reason.[1]

The reason the failure persists is that nothing in the agent’s normal objective pushes against it. The agent is rewarded for completing the task. Disclosing too much rarely blocks task completion, so over-disclosure carries no cost in the loop that shapes behavior. Without an explicit signal that some accessible information is off-limits in this context, the helpful path and the leaky path are the same path.
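The argument can be illustrated with a toy scoring function. This sketch is hypothetical and does not describe how any particular agent is trained. It scores two outcomes of the same request, first with a pure task-completion reward and then with one possible constrained variant (`constrained_reward`, an illustrative name) that withholds credit when anything outside an allowed set was sent.

```python
def task_reward(sent: set[str], required: set[str]) -> float:
    """Pure task-completion reward: 1.0 if every required item was sent."""
    return 1.0 if required <= sent else 0.0


def constrained_reward(sent: set[str], required: set[str], allowed: set[str]) -> float:
    """One possible constraint: no credit if anything outside `allowed` was sent."""
    if sent - allowed:
        return 0.0
    return task_reward(sent, required)


# Hypothetical request: "send the ACME invoice to ACME's accounts-payable team".
required = {"invoice-acme.pdf"}
allowed = {"invoice-acme.pdf"}
minimal = {"invoice-acme.pdf"}
overshare = {"invoice-acme.pdf", "invoice-globex.pdf"}  # adjacent invoice grabbed too

for name, sent in [("minimal", minimal), ("overshare", overshare)]:
    print(name, task_reward(sent, required), constrained_reward(sent, required, allowed))
```

Under the pure task reward both outcomes score 1.0, so nothing distinguishes the leaky run from the clean one. Only once appropriateness enters the score does the overshare cost anything.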

What to Do About It

The fix is not to make agents less capable. It is to make appropriateness a constraint the agent checks rather than a norm it is assumed to infer. The pattern echoes what I have argued about approval prompts: the agent should not be trusted to silently decide what crosses a boundary.

Gate disclosure on the recipient and the context, not on access. Before an agent sends, attaches, or shares, the relevant question is not “can the agent see this” but “does this belong in this flow, to this recipient.” Access is the wrong proxy for permission, and the three failure modes are all instances of using it as one.
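A minimal sketch of such a gate, assuming every outbound action (send, attach, share) passes through one checkpoint before it executes. `ShareAction`, `GateResult`, `gate_share`, and the per-item recipient allowlist are hypothetical names. The example reuses the opening Q3 scenario, with an illustrative salary spreadsheet standing in for the file open next to the numbers.

```python
from dataclasses import dataclass, field


@dataclass
class ShareAction:
    """A proposed outbound action: send these items to these recipients."""
    recipients: list[str]
    items: list[str]


@dataclass
class GateResult:
    allowed: bool
    violations: list[tuple[str, str]] = field(default_factory=list)  # (item, recipient)


def gate_share(action: ShareAction, permitted: dict[str, set[str]]) -> GateResult:
    """Check every (item, recipient) pair against a per-item recipient allowlist.

    `permitted` is a hypothetical policy: item -> recipients allowed to receive it.
    Items with no entry are denied to everyone, so being able to see an item
    never counts as permission to send it.
    """
    violations = [
        (item, rcpt)
        for item in action.items
        for rcpt in action.recipients
        if rcpt not in permitted.get(item, set())
    ]
    return GateResult(allowed=not violations, violations=violations)


policy = {
    "q3-summary.xlsx": {"colleague@corp.example", "cfo@corp.example"},
    "salary-bands.xlsx": {"cfo@corp.example"},
}

# The agent proposes attaching the spreadsheet that sat next to the Q3 numbers.
action = ShareAction(
    recipients=["colleague@corp.example"],
    items=["q3-summary.xlsx", "salary-bands.xlsx"],
)
print(gate_share(action, policy))
```

Checking each item against each recipient, rather than the action as a whole, covers recipient misalignment as well: the same item can pass for one addressee and fail for another on the same message.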

Treat ambiguity as a stop, not a license. An under-specified request is the highest-risk input, because the agent resolves it toward disclosure. An agent that narrows or asks when a request is vague leaks less than one that fills the gap with everything it can see.
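One way to encode that rule, sketched under simple assumptions: the user’s reference is matched against candidate items by keyword, and the agent proceeds only when exactly one item matches. The keyword rule and the `Proceed`/`AskUser` types are hypothetical. A real agent would resolve references far more cleverly, but the shape of the decision is what matters here: one match proceeds, and zero or several matches produce a question.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proceed:
    item: str


@dataclass
class AskUser:
    question: str


def resolve(reference: str, candidates: list[str]) -> Proceed | AskUser:
    """Resolve a user's reference to exactly one item, or ask instead of guessing."""
    words = reference.lower().split()
    matches = [c for c in candidates if all(w in c.lower() for w in words)]
    if len(matches) == 1:
        return Proceed(matches[0])
    if not matches:
        return AskUser(f"Nothing matches {reference!r}. Which item did you mean?")
    # Several matches: ask rather than falling back to sending all of them.
    return AskUser(f"{reference!r} matches {', '.join(matches)}. Which one should I use?")


files = ["q3-revenue.xlsx", "q3-headcount-plan.xlsx", "q2-revenue.xlsx"]
print(resolve("q3 revenue", files))
print(resolve("q3", files))
```

The important line is the one that is missing: there is no branch that returns every match. That branch is the task-ambiguity overshare.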

Test for leakage before deployment. The paper’s contribution is partly a method: deterministically scored scenarios that turn contextual integrity into something you can measure. Treating contextual disclosure as a pre-deployment check, alongside the observability and sandboxing checks that catch attacker-driven failures, closes a gap those checks do not cover.
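The paper’s scoring code is not reproduced here. As a hedged illustration of what deterministic scoring can look like, the sketch below assumes each scenario lists the items that must not be disclosed to that scenario’s recipient, and that the harness logs every item the agent disclosed in an outbound action. Leakage is then a set intersection, so the same run always gets the same score. Per-agent rates aggregate into the kind of summary the paper reports. `Scenario`, `leaked`, `leakage_rate`, the scenarios, and the agent runs are all invented for illustration and do not reflect the paper’s data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Scenario:
    name: str
    failure_mode: str           # one of the three modes the benchmark targets
    prohibited: frozenset[str]  # items that must not reach this scenario's recipient


def leaked(scenario: Scenario, disclosed: set[str]) -> bool:
    """Deterministic check: did any prohibited item appear in an outbound action?"""
    return bool(scenario.prohibited & disclosed)


def leakage_rate(scenarios: list[Scenario], disclosures: dict[str, set[str]]) -> float:
    """Fraction of scenarios where the agent disclosed at least one prohibited item."""
    return sum(leaked(s, disclosures[s.name]) for s in scenarios) / len(scenarios)


# Invented scenarios, one per failure mode.
scenarios = [
    Scenario("attach-invoice", "visual_co_location", frozenset({"invoice-globex.pdf"})),
    Scenario("status-update", "task_ambiguity_overshare", frozenset({"todo:doctor-appointment"})),
    Scenario("share-offer", "recipient_misalignment", frozenset({"offer-letter.pdf"})),
]

# Invented logs: items each agent disclosed in outbound actions, per scenario.
runs = {
    "agent-a": {
        "attach-invoice": {"invoice-acme.pdf", "invoice-globex.pdf"},
        "status-update": {"todo:ship-report"},
        "share-offer": {"offer-letter.pdf"},
    },
    "agent-b": {
        "attach-invoice": {"invoice-acme.pdf"},
        "status-update": {"todo:ship-report"},
        "share-offer": set(),
    },
}

rates = {agent: leakage_rate(scenarios, log) for agent, log in runs.items()}
print(rates)
print("average leakage:", mean(rates.values()))
print("agents above 50%:", sum(rate > 0.5 for rate in rates.values()))
```

Scoring by set membership over logged actions, rather than by asking a model to judge a transcript, is what makes the result repeatable. The cost is that every prohibited item has to be enumerated per scenario up front.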

The broader point is that agent safety has two halves. One half is adversarial: untrusted inputs, injection, tool poisoning, the failures an attacker causes. The other half is dispositional: what the agent does with legitimate access when nobody is attacking it. Computer-use agents are capable enough to act across every context you own. Whether they should is a question they currently answer wrong roughly two times in three.

Key Takeaways

For people deploying computer-use agents:
  • Add contextual-disclosure testing to your pre-deployment checks. Attacker-focused evaluations do not catch oversharing.
  • Gate sharing actions on recipient and context appropriateness, not on whether the agent can access the data.
  • Treat vague requests as the highest-risk case, because agents resolve ambiguity toward more disclosure.

For agent and product builders:
  • The three failure modes (visual co-location, task-ambiguity overshare, recipient misalignment) map to concrete UI surfaces. Design each surface assuming proximity will be read as relevance.
  • Task-completion reward gives no signal against over-disclosure. If appropriateness matters, make it an explicit constraint.

For security and privacy reviewers:
  • Contextual integrity gives a usable frame: evaluate information flows against context norms, not against a secrecy binary.
  • A 67.9% average leakage rate across frontier agents means current defaults are unsafe for autonomous multi-context action without disclosure controls.

FAQ

What is contextual integrity?

Contextual integrity is a theory of privacy from Helen Nissenbaum holding that information flows carry norms tied to the context in which they occur. Privacy is preserved when information moves in ways appropriate to its context and violated when it crosses into a context where the governing norms do not permit it, even if nothing was technically secret.

How is this different from prompt injection?

Prompt injection is adversarial: an attacker hides instructions that hijack the agent. Contextual oversharing has no attacker. The user makes a legitimate request and the agent, trying to help, discloses information that belonged to a different context. The two require different defenses, and attacker-focused testing does not detect oversharing.

What is AgentCIBench?

AgentCIBench is the benchmark introduced in Capable but Careless that turns cross-context leakage into executable, deterministically scored scenarios. It tests three failure modes (visual co-location, task-ambiguity overshare, and recipient misalignment) and was used to evaluate 15 frontier computer-use agents.

How many agents failed?

Of 15 frontier agents tested, 11 leaked private information on more than 50% of scenarios, with an average leakage rate of 67.9%. The failures persisted when the agents acted end-to-end to complete tasks, not only in isolated probes.

Can I fix this with better prompting?

Prompting can help, but the paper’s framing suggests the durable fix is structural: gate disclosure actions on recipient and context appropriateness rather than on access, and test for leakage before deployment. Because task-completion objectives give no signal against over-disclosure, appropriateness has to be enforced as a constraint rather than assumed.


Sources


  1. Goel and Gurevych, “Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?,” arXiv:2606.23189 (June 22, 2026). The abstract reports the AgentCIBench benchmark, the three failure modes (visual co-location, task-ambiguity overshare, recipient misalignment), the evaluation of 15 frontier agents, the finding that 11 of 15 leak on more than 50% of scenarios at 67.9% average leakage, the persistence of failures in end-to-end task completion, and the positioning of contextual-disclosure testing as a pre-deployment safety check. 

  2. Helen Nissenbaum, “Privacy as Contextual Integrity,” Washington Law Review 79, no. 1 (2004), and Privacy in Context: Technology, Policy, and the Integrity of Social Life (Stanford University Press, 2010). Contextual integrity ties privacy to context-relative informational norms, requiring that information flows be appropriate to the context in which they occur. 
