← All Posts

Your Agent Has a Middleman You Didn't Vet

Researchers purchased 28 paid LLM API routers from Taobao, Xianyu, and Shopify-hosted storefronts, and collected 400 more from public communities. They instrumented requests with planted credentials and probed every router to see what it did with the traffic.1

Seventeen of those routers touched AWS canary credentials planted in the requests. One drained ETH from a private key placed as bait. A leaked OpenAI key the team set up as a honeypot generated 100M GPT-5.4 tokens and, per the abstract, “more than seven Codex sessions” before they pulled it.1 Separate weakly configured decoys yielded 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode.1

The LLM API router is the new attack surface. Nobody is auditing it.

TL;DR

Third-party LLM API routers are application-layer proxies with full plaintext access to every in-flight JSON payload between your agent and the upstream model. No provider enforces cryptographic integrity between client and upstream. A new arxiv paper from Liu, Shou, Wen, Chen, and Fang presents the first systematic study of this attack surface, and the field data is ugly: 1 of 28 paid routers and 8 of 400 free routers were actively injecting malicious code into responses, 2 were deploying adaptive evasion triggers, 17 touched planted AWS canary credentials, and 1 drained ETH from a planted private key.1 The authors formalize two core attack classes plus two adaptive evasion variants, then build a research proxy called Mine that implements “all four attack classes” (their phrasing) against four public agent frameworks and evaluate three deployable client-side defenses.1 If your agent is using a router you didn’t build, you have a trust boundary you never audited.

Key Takeaways

  • Agent operators: Every LLM API router between your client and the upstream model is an application-layer proxy with plaintext access to every request and response. No cryptographic integrity is enforced. If you are using a router you bought from a marketplace or pulled from a public community list, treat it as a hostile intermediary until proven otherwise.
  • Harness builders: Your PreToolUse hooks run before tool execution, but a malicious router modifies the model response after generation and before it reaches your hook. Add response-side validation to your hook stack, and consider fail-closed policy gates on anomalous response shapes.
  • Anyone running YOLO mode: Four hundred and one sessions in the researchers’ honeypot were already running in autonomous YOLO mode.1 A router modifying tool calls in an autonomous session has a much larger blast radius than a router modifying a response you will read. Do not run YOLO mode through a router you do not control.

What’s a Router, Exactly?

In the context of this paper, an LLM API router is a third-party service that sits between your client and one or more upstream model providers. You send requests to the router using an OpenAI-compatible API. The router dispatches those requests to whatever upstream it chooses — GPT-5, Claude, Gemini, an open-weights model, a pool of all of them — and returns the response back to you in the same shape.1

Routers exist because the LLM ecosystem is messy. People want one API key that works against every model. People want price arbitrage — buy tokens in bulk, resell them cheaper. People want geographic workarounds for regions where direct provider access is restricted. People want to test multiple models with a single client. All of those are legitimate reasons, and a healthy router market serves all of them.

The problem is that a router is an application-layer proxy. It does not just forward bytes. It reads the request JSON, picks an upstream, optionally rewrites the request, forwards it, reads the response, optionally rewrites the response, and returns it. Every step of that pipeline happens in cleartext on the router’s infrastructure.1

There is no TLS-equivalent here. TLS protects bytes from your client to the router and from the router to the upstream model. It does nothing to protect the payload from the router. No provider enforces cryptographic integrity between the original client and the ultimate model — there is no signed assertion that the request the model received is the same request the client sent, and no signed assertion that the response the client received is the same response the model generated.1

That absence is the attack surface.

The Field Data

The researchers bought 28 paid routers from three marketplaces (Taobao, Xianyu, Shopify-hosted storefronts) and collected 400 free routers from public communities. They ran each router through a series of probes with instrumented requests that contained planted credentials, known payloads, and marker strings.1

Here is what they found:

Finding Count Notes
Paid routers actively injecting malicious code into responses 1 (of 28) Explicit split in abstract
Free routers actively injecting malicious code into responses 8 (of 400) Explicit split in abstract
Routers deploying adaptive evasion triggers 2 Paid/free split not specified in abstract
Routers touching researcher-owned AWS canary credentials 17 Paid/free split not specified in abstract
Routers draining ETH from researcher-owned private key 1 Paid/free split not specified in abstract

The adaptive evasion finding is the one that should keep you up. An adaptive evasion trigger means the router behaves normally most of the time and flips into attack behavior under specific conditions — a particular request shape, a particular client fingerprint, a particular cadence. You cannot catch it by sampling random requests, because the router knows when it is being sampled and behaves.

Canary credentials are tripwires: they fire when someone tries to use them. Seventeen routers “touching” them means that at least seventeen routers extracted the credentials from in-flight payloads and attempted to use them against AWS.1 The abstract does not detail the exact mechanism per router, but the tripwire design means touch = use.

The one router that drained ETH from a researcher-owned private key is a stronger finding. A private key in a prompt is not a credential tripwire — it is bait that only produces evidence of compromise when the router actually drains the wallet. One router did.1

The Two Poisoning Studies

The researchers ran two additional studies to show that ostensibly benign routers can be pulled into the same attack surface through third-party exposure.

Study 1: Leaked OpenAI key. The researchers leaked a working OpenAI API key as if it had been exposed through a developer mistake. During the observation window, that single leaked key — per the abstract — generated 100M GPT-5.4 tokens and “more than seven Codex sessions” through routers that picked it up.1 Someone — or many someones — found the key, routed requests through community routers using it, and burned through 100M tokens of compute. The router was a laundering layer for a stolen key.

Study 2: Weakly configured decoys. The researchers stood up weakly configured decoy endpoints. The decoys yielded 2B billed tokens, 99 credentials across 440 Codex sessions, and — this is the critical line — 401 sessions already running in autonomous YOLO mode.1

Four hundred and one autonomous sessions already routing through a single set of decoys. Every one of those sessions was a live attack surface where a malicious intermediary could inject tool calls, exfiltrate secrets, or modify the model’s output, and the agent would execute on whatever came back without a human in the loop. The 401 number is what one research decoy caught — the operational population routing through uncontrolled intermediaries is necessarily larger.

Two Core Attack Classes and Two Adaptive Variants

The paper formalizes two core attack classes and two adaptive evasion variants. The abstract is explicit about the taxonomy: AC-1 and AC-2 are the core classes; AC-1.a and AC-1.b are variants of AC-1. The research proxy Mine implements “all four attack classes” (the abstract’s phrasing) against four public agent frameworks.1

AC-1: Payload injection (core class). The router modifies the response to inject additional instructions, tool calls, or content that the client agent acts on. The agent thinks it is reading output from the model; it is reading output from whoever owns the router.

AC-2: Secret exfiltration (core class). The router reads secrets out of in-flight requests and responses — API keys, tokens, private keys, anything that looks like a credential — and ships them to the attacker’s infrastructure.

AC-1.a: Dependency-targeted injection (adaptive variant of AC-1). The injection only fires when the request matches a specific dependency or context — for example, only when the request is about a particular library, only when a specific function is referenced, only when certain file paths appear in the prompt. This makes the attack invisible in random testing.

AC-1.b: Conditional delivery (adaptive variant of AC-1). The malicious payload is delivered under specific conditions (time of day, request cadence, client fingerprint). Same detection-evasion logic.

Every one of these attack classes is invisible to the client and to the upstream model, because both ends trust the router. The client sees a normal response shape. The model sees a normal request shape. The router is free to do anything it wants in the middle, and neither party has a cryptographic way to detect tampering.1

The Composition Pattern, One Layer Lower

I keep writing about the same structural bug: individually authorized components composing into unauthorized behavior. Trivy-to-LiteLLM was composition at the package layer. Silent egress was composition at the tool-description layer. MCP tool poisoning was composition at the protocol layer. The axios maintainer compromise was composition at the human-maintainer layer.

The router attack is composition at the network layer. Your client is authorized to call the router. The router is authorized to call the upstream model. The upstream model is authorized to respond. Every single hop is authorized. The composition of those authorized hops produces payload injection and secret exfiltration at scale because the composition crosses a trust boundary that nobody bothered to cryptographically seal.1

You cannot fix this at any single layer. You fix it at the composition layer, which means the client has to treat the router as hostile until it has independently verified that the response shape, the tool calls, and the content are all consistent with something the upstream model would plausibly produce.

Three Defenses the Paper Evaluates

The paper evaluates three client-side defenses against the attack classes.1

1. Fail-closed policy gate. The client enforces a policy on response shapes, allowed tool calls, allowed URLs, allowed commands. Anything outside the policy fails closed — the request is rejected instead of allowed.

2. Response-side anomaly screening. The client watches for response-shape anomalies, unusual token patterns, or output that contains known attack markers (URLs to unknown hosts, suspicious credential patterns, unusual tool call structures).

3. Append-only transparency logging. The client writes every request and response to an append-only log that cannot be modified retroactively. This does not prevent attacks but it makes them forensically traceable.

None of these are silver bullets. My read: the fail-closed policy gate is the strongest of the three because it does not rely on detecting an attack — it rejects anything outside an explicit allowlist — but the abstract does not rank the defenses, so treat that as my opinion, not the paper’s finding. Anomaly screening misses attacks that look normal, and the adaptive evasion variants (AC-1.a and AC-1.b) are specifically designed to look normal during test conditions. Policy gates are only as good as the policy, and writing a complete policy for “what should a model response look like” is hard.

What You Should Actually Do

If you are running an agent that calls LLM APIs through a router you did not build:

  1. Stop using routers you bought or pulled from public communities unless you trust the operator. “Trust” here means you have some external basis — a known team, a signed contract, a legal jurisdiction you can enforce against — not “it has good reviews on a marketplace.”

  2. Add a fail-closed policy gate to your harness. In Claude Code, this means PreToolUse hooks that reject tool calls outside an explicit allowlist, and PostToolUse hooks that validate response shapes before passing them to the next model turn. The hook stack is your fail-closed policy layer.

  3. Never run YOLO mode through a router you don’t control. The 401 autonomous sessions in the honeypot are the precedent. If the router is hostile and your session is autonomous, the router is running your machine.

  4. Log everything. Append-only transparency logging is what lets you reconstruct an incident. Every request. Every response. Every tool call. Store them somewhere the router cannot reach.

  5. If you run an agent infrastructure, enforce cryptographic integrity. If you operate the client and you operate the upstream, sign the request on the client and verify the signature on the upstream. That is the only real fix. The router can still see plaintext, but it cannot modify anything without invalidating the signature.

The Uncomfortable Implication

The router attack surface is a clean example of the agent ecosystem shipping infrastructure faster than it is securing it. People want one API key for every model. People want price arbitrage. People want regional access. Routers deliver all of those. The market rewards them. The security audit has not happened.

The MCP attack surface has 50 documented vulnerabilities. The supply chain attack surface has a TeamPCP campaign that crossed five ecosystems in a week. The silent egress attack surface has Clinejection and the MCPTox benchmark. Now add the router attack surface: 428 routers studied, 9 actively injecting malicious code, 17 touching planted credentials, 1 draining ETH, 401 autonomous sessions already live on hostile infrastructure.1

The pattern is the same every time. We build a new layer of the agent stack. The new layer gets adopted before it gets audited. The attackers show up. The researchers show up. The community writes up the findings. The operators who were paying attention patch their deployments. The operators who were not paying attention find out the hard way.

The router attack surface is at the “researchers just wrote it up” stage. You have time to patch your deployment. Use it.


FAQ

What is an LLM API router in this context?

A third-party service that sits between your client and upstream model providers, exposes an OpenAI-compatible API, and dispatches your requests to one or more upstream models. It is an application-layer proxy with plaintext access to every request and response.1

Why is this different from a CDN or a regular HTTP proxy?

A CDN forwards bytes without reading the application payload. An LLM API router reads the JSON, picks an upstream, optionally rewrites the request, forwards it, reads the response, and optionally rewrites the response. It is doing application-level processing on your data, not just transport.1

Does TLS protect me from a malicious router?

No. TLS protects the bytes from your client to the router and from the router to the upstream model. The router terminates TLS, reads the plaintext, and re-encrypts on the other side. TLS does nothing to protect your payload from the router.1

How would I detect a router that is actively injecting responses?

You would not, reliably, if the router is using adaptive evasion. The paper’s AC-1.a and AC-1.b attack classes specifically target detection evasion by only firing under operational conditions.1 Your best bet is a fail-closed policy gate — rejecting anything outside an explicit allowlist — rather than trying to detect attacks after the fact.

I’m running Claude Code directly against api.anthropic.com. Am I affected?

Not by the router attack class described in this paper, because you are calling Anthropic directly with no intermediary. The attack surface is specifically third-party routers. If you route Claude Code through a proxy for any reason — corporate gateway, rate limit bypass, model aggregator — you should audit that proxy.

What about OpenRouter, LiteLLM, or other well-known aggregators?

The paper studies 28 paid routers purchased from specific marketplaces (Taobao, Xianyu, Shopify-hosted storefronts) and 400 free routers from public community lists.1 It does not publish a specific list of named products. The point of the paper is structural: any router is an untrusted intermediary unless you have a separate basis for trust. Well-known aggregators are not automatically safer — they are just more visible, which is a different property.

What should I do about the 401 autonomous sessions the researchers found?

Those sessions belong to other operators who routed their traffic through the researchers’ decoys. If you are running autonomous agent sessions through any router you did not build, the first step is to stop. The second step is to rotate every credential that traveled through that router. The third step is to audit your session logs for anomalous tool calls or output.


References


  1. Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain,” arXiv:2604.08407, April 2026. Primary source for all router attack data, attack class definitions, field study methodology, and defense evaluation in this post. All statistics (28 paid routers, 400 free routers, 1+8 actively injecting, 2 adaptive evasion triggers, 17 touching AWS canary credentials, 1 draining ETH, 100M tokens from leaked key, 2B tokens from decoys, 401 autonomous YOLO sessions, 440 Codex sessions, 99 credentials, taxonomy of two core attack classes — AC-1 payload injection and AC-2 secret exfiltration — plus two adaptive evasion variants AC-1.a and AC-1.b, Mine proxy implements “all four attack classes” against four public agent frameworks, three client-side defenses: fail-closed policy gate, response-side anomaly screening, append-only transparency logging) are drawn directly from the paper abstract. 

Related Posts

The Fork Bomb Saved Us

The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 …

6 min read

MCP Servers Are the New Attack Surface

50 MCP vulnerabilities. 30 CVEs in 60 days. 13 critical. The attack surface nobody is auditing.

8 min read

The Ralph Loop: How I Run Autonomous AI Agents Overnight

I built an autonomous agent system with stop hooks, spawn budgets, and filesystem memory. Here are the failures and what…

8 min read