Agent Security

The attack surfaces, trust boundaries, and runtime defenses for autonomous AI systems. MCP vulnerabilities, supply chain risks, sandbox escapes, and governance frameworks.

20 articles

Cybersecurity Is Proof of Work

Claude Mythos completed a 32-step corporate network attack simulation in 3 of 10 tries. Each attempt cost $12,500 in tokens. Security is now a...

2026-04-14

Runtime Defense for Tool-Augmented Agents

ClawGuard demonstrates that deterministic tool-call interception works. The Vercel telemetry incident shows why. Runtime defense is the enforceable layer.

2026-04-14

Your Agent Has a Middleman You Didn't Vet

Researchers bought 28 LLM API routers and collected 400 more. 17 touched AWS canary credentials. One drained ETH from a private key. The router...

2026-04-10

MCP Servers Are the New Attack Surface

50 MCP vulnerabilities. 30 CVEs in 60 days. 13 critical. The attack surface nobody is auditing.

2026-04-08

Project Glasswing: What Happens When a Model Is Too Good at Finding Bugs

Anthropic built a model that finds thousands of zero-days, then restricted it to 12 partners. What Project Glasswing means for agent-assisted security.

2026-04-07

When Your Agent Finds a Vulnerability

An Anthropic researcher found a 23-year-old Linux kernel vulnerability using Claude Code and a 10-line bash script. 22 Firefox CVEs followed. What...

2026-04-05

What the Claude Code Source Leak Reveals

A practitioner's analysis of the Claude Code source leak. 11 findings that explain how auto mode, bash security, prompt caching, and multi-agent...

2026-04-02

Every Hook Is a Scar

84 hooks, 15 event types. Each one traces back to a specific failure. Institutional memory in shell scripts.

2026-03-29

The Fork Bomb Saved Us

The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 minutes.

2026-03-28

AI Agent Research: Claude Beat 33 Attack Methods

Claude Code autonomously discovered adversarial attacks with a 100% success rate against Meta's SecAlign-70B, beating all 33 published methods in 96...

2026-03-26

The Supply Chain Is the Attack Surface

Trivy got compromised. Then LiteLLM. Then 47,000 installs in 46 minutes. The AI supply chain worked exactly as designed.

2026-03-25

AI Agent Security: The Deploy-and-Defend Trust Paradox

1 in 8 enterprise AI breaches involve autonomous agents. Runtime hooks, OS-level sandboxes, and drift detection break the deploy-and-defend cycle.

2026-03-20

Every Iteration Makes Your Code Less Secure

43.7% of LLM iteration chains introduce more vulnerabilities than baseline. Adding SAST scanners makes it worse. SCAFFOLD-CEGIS cuts degradation to 2.1%.

2026-03-12

Your Agent Sandbox Is a Suggestion

An attacker opened a GitHub issue and shipped malware in Cline's next release. Agent sandboxes fail at three levels. Here is what actually works.

2026-03-05

AI Agent Observability: Monitoring What You Can't See

AI agents consume disk, CPU, and network with zero operator visibility. Three observability layers close the gap before damage is irreversible.

2026-03-02

Silent Egress: The Attack Surface You Didn't Build

A malicious web page injected instructions into URL metadata. The agent fetched it, read the poison, and exfiltrated the API key. No error. No log.

2026-03-02

What I Told NIST About AI Agent Security

Production evidence submitted to NIST: AI agent threats are behavioral. 7 failure modes, 3-layer defense, and framework gaps from 60 daily sessions.

2026-02-24

The Fabrication Firewall: When Your Agent Publishes Lies

An autonomous agent published fabricated claims to 8 platforms over 72 hours. Training-phase safety failed at the publication boundary. Here is the fix.

2026-02-23

Runtime Constitutions for AI Agents: A Governance Framework

Runtime constitutions enforce AI agent governance where training-phase alignment fails. Competence checks, output gates, and four subsystems keep...

2026-02-22

AI Agent Memory Degradation: Why Multi-Turn LLMs Collapse

LLMs lose 39% accuracy across 200K+ multi-turn sessions. Three mechanisms drive collapse, and longer context windows fix none of them.

2026-02-22