Project Glasswing: What Happens When a Model Is Too Good at Finding Bugs
Two weeks ago, Nicholas Carlini showed that Claude Code could find a 23-year-old Linux kernel vulnerability using a 10-line bash script. Today, Anthropic announced what happened when it scaled that approach: a new model called Claude Mythos that found thousands of high- and critical-severity zero-day vulnerabilities — and a decision not to release it publicly.1
Project Glasswing is Anthropic’s answer to the question practitioners have been asking since Carlini’s [un]prompted talk: what happens when this capability is deployed at scale? The answer: you restrict it.
TL;DR
Claude Mythos Preview is a new frontier model, more capable than Opus 4.6, whose cybersecurity capabilities, according to Anthropic, “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.”1 Anthropic is restricting access to 12 partner organizations (Apple, Amazon, Microsoft, Google, Linux Foundation, and others) for defensive security work only. The model found thousands of zero-days, including a 27-year-old OpenBSD TCP SACK bug, a 16-year-old FFmpeg vulnerability, and a FreeBSD NFS RCE (CVE-2026-4747).1 Anthropic committed $100M in usage credits and $4M to open-source security organizations. A future Cyber Verification Program will eventually provide access for legitimate security professionals.1
Key Takeaways
- Security engineers: The capability threshold that Carlini demonstrated at [un]prompted is real, and it scales. Mythos found vulnerabilities in “every major operating system and web browser.”2 Defensive security teams at the 12 partner organizations now have access. Everyone else should prepare for the day these capabilities reach generally available models.
- Harness builders: Mythos runs via Claude Code in isolated containers.1 The harness pattern — agent CLI plus sandboxed execution plus automated triage — is now the production architecture for frontier security research at Anthropic itself, validating at the highest level the patterns practitioners have been building independently.
- Everyone else: Anthropic chose restriction over release. That is a real governance decision with real tradeoffs. The model exists. The capabilities are demonstrated. The question is no longer whether AI can find zero-days — it is who gets access and under what constraints.
From Talk to Product
Carlini’s [un]prompted talk in early April was the public preview.3 He showed five Linux kernel vulnerabilities and 22 Firefox CVEs found with a simple file-iteration script. The bottleneck, he said, was human validation — “several hundred crashes I haven’t validated yet.”
Mythos is what happens when you remove that bottleneck with a more capable model and dedicated infrastructure. The scale difference is significant:1
| Metric | Carlini’s talk | Project Glasswing |
|---|---|---|
| Vulnerabilities found | 5 kernel + 22 Firefox CVEs | Thousands across all major platforms |
| Targets | Linux kernel, Firefox | Every major OS, browser, open-source project |
| Validation | Manual, researcher-driven | Professional security contractors, 89% severity confirmation |
| Access | Opus 4.6 (generally available) | Mythos Preview (restricted to 12 partners) |
The professional validation number matters: 89% of 198 reviewed reports had severity assessments confirmed by independent security contractors, with 98% within one severity level.1 These are not hallucinated findings.
The Restriction Decision
Anthropic’s stated position: “We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities.”4
This is unusual. Model companies typically race to ship capabilities. Anthropic built a model that is demonstrably better at finding vulnerabilities than any publicly available system — then chose to restrict it to defensive use by vetted partners. The $100M commitment in usage credits signals this is not a marketing exercise.1
The restriction model has three tiers:1

1. Project Glasswing partners (12 organizations): direct access for defensive security
2. Broader access (40 organizations total): supervised deployment
3. Future Cyber Verification Program: planned access for verified security professionals
For practitioners, this means the strongest vulnerability-finding capabilities are not available through the standard API or Claude Code. Opus 4.6 remains the strongest generally available model. But the capabilities demonstrated by Mythos will likely influence future Opus releases — Anthropic’s announcement explicitly says they aim to “enable safer deployment through new safeguards in future Claude Opus models.”1
What This Validates
Project Glasswing validates several patterns that the practitioner community has been building independently:
Claude Code as the execution harness. Mythos runs via Claude Code in isolated containers.1 The same agent CLI that practitioners use for daily coding is the execution layer for frontier security research. The hooks, skills, and sandboxing that Claude Code provides are not convenience features — they are the infrastructure that makes autonomous security scanning safe enough to deploy.
The verification bottleneck is a harness problem. Carlini’s talk identified human validation as the bottleneck. Project Glasswing’s solution: professional security contractors for validation, SHA-3 hash commitments for responsible disclosure, and structured triage infrastructure.1 This is the same triage problem we identified in When Your Agent Finds a Vulnerability — and the solution is infrastructure, not model capability.
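The SHA-3 hash commitments mentioned above follow a standard publish-then-reveal pattern: publish a digest of the finding at discovery time, reveal the full report after coordinated disclosure, and anyone can verify the researcher knew the details all along. A minimal sketch of the idea in Python — the function names and workflow are illustrative, not Anthropic’s actual tooling:

```python
import hashlib

def commit(report: str, nonce: str) -> str:
    """Publish only this digest at discovery time; keep report + nonce private."""
    return hashlib.sha3_256((nonce + report).encode()).hexdigest()

def verify(report: str, nonce: str, commitment: str) -> bool:
    """After disclosure, anyone can check the revealed report matches the digest."""
    return commit(report, nonce) == commitment

# Discovery: publish the commitment, nothing else.
c = commit("heap overflow in parser, reachable from network input", nonce="8f2c91")
# Post-disclosure: reveal report and nonce; the published digest checks out.
assert verify("heap overflow in parser, reachable from network input", "8f2c91", c)
```

The random nonce matters: without it, anyone could brute-force short or guessable reports against the published digest.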
Governance hooks matter more than scanning capability. The model can find the vulnerabilities. The hard problem is controlling disclosure, managing access, and ensuring findings reach defenders before attackers. Anthropic’s answer is organizational (restrict the model, vet the partners, commit resources). For practitioners building their own security scanning, the governance hooks that gate output are the equivalent.
What This Means for Practitioners
You are not getting Mythos access. Here is what you can do with what you have:
Opus 4.6 is already capable. Carlini’s [un]prompted results — 5 kernel bugs, 22 Firefox CVEs — used Opus 4.6, not Mythos.3 The capture-the-flag methodology, ASAN-instrumented builds, and file-iteration script are all reproducible with the generally available model.
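The file-iteration approach is simple to approximate with generally available tooling: walk a source tree, hand each file to the agent with a short audit prompt, and save whatever comes back for triage. A rough sketch, assuming Claude Code’s non-interactive `claude -p` print mode (the prompt wording and output layout are my own, and any real run belongs inside a sandboxed container):

```python
import subprocess
from pathlib import Path

PROMPT = "Audit this C file for memory-safety bugs. Report only findings you can justify."

def source_files(root: str, suffixes=(".c", ".h")) -> list:
    """Collect candidate files, smallest first, so cheap targets get scanned early."""
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix in suffixes]
    return sorted(files, key=lambda p: p.stat().st_size)

def scan(root: str) -> None:
    """One non-interactive agent turn per file; raw output saved for later triage."""
    out_dir = Path("findings")
    out_dir.mkdir(exist_ok=True)
    for path in source_files(root):
        result = subprocess.run(
            ["claude", "-p", f"{PROMPT}\n\nFile contents:\n{path.read_text()}"],
            capture_output=True, text=True,
        )
        (out_dir / f"{path.name}.txt").write_text(result.stdout)
```

Sorting by size is a cost heuristic, not part of Carlini’s published method; the point is that the loop itself is trivial — everything hard lives downstream, in validation.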
Build the triage layer now. When future Opus models inherit some of Mythos’s capabilities (as Anthropic has implied), the bottleneck will be the same one Carlini identified: human validation. The teams that have automated deduplication, severity classification, and disclosure workflows ready will benefit first.
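The triage layer is plain engineering you can build today. A sketch of the two pieces named above — deduplication and severity bucketing — keyed on a crash’s top stack frames, a common fuzzing heuristic; the classification rules here are illustrative placeholders, not Anthropic’s:

```python
import hashlib

def crash_signature(stack_frames: list, depth: int = 3) -> str:
    """Crashes sharing their top frames are very likely the same underlying bug."""
    key = "|".join(stack_frames[:depth])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

def classify(finding: dict) -> str:
    """Toy severity rules: write primitives outrank info leaks, which outrank DoS."""
    if finding.get("write_primitive"):
        return "critical"
    if finding.get("info_leak"):
        return "high"
    return "medium"

def triage(findings: list) -> dict:
    """Keep one representative finding per signature, tagged with a severity bucket."""
    unique = {}
    for f in findings:
        sig = crash_signature(f["stack"])
        unique.setdefault(sig, {**f, "severity": classify(f)})
    return unique
```

Even this toy version solves the problem Carlini described: several hundred raw crashes collapse into a short, ranked list a human can actually review.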
Watch the Cyber Verification Program. Anthropic plans to extend Mythos access to verified security professionals. If you do legitimate security research, this is worth tracking.
The trajectory is clear: AI-assisted vulnerability discovery is real, it scales, and the governance question is now the central problem. The model capability is solved. The harness that orchestrates discovery, triage, and responsible disclosure is not.
Frequently Asked Questions
Can I use Claude Mythos through Claude Code?
No. Mythos Preview is restricted to Project Glasswing partners. Opus 4.6 remains the strongest model available through Claude Code for general users.
Will Mythos capabilities come to Opus?
Anthropic’s announcement says they aim to “enable safer deployment through new safeguards in future Claude Opus models.” This suggests some capabilities will eventually reach generally available models, but with additional safety constraints.
How does this relate to the earlier vulnerability blog post?
Carlini’s [un]prompted talk (covered in When Your Agent Finds a Vulnerability) used Opus 4.6 and found 5 kernel bugs + 22 Firefox CVEs. Mythos scaled that approach to thousands of vulnerabilities across all major platforms. The methodology is the same; the model is more capable.
Sources

1. Claude Mythos Preview — Project Glasswing. Anthropic, April 7, 2026. Official announcement: thousands of high- and critical-severity zero-days found; 89% severity confirmation rate by professional validators; $100M in usage credits. Led by Nicholas Carlini with 21+ co-authors.
2. Anthropic’s Project Glasswing. Simon Willison, April 7, 2026. Analysis and context on the restricted release model and Carlini’s earlier work.
3. Nicholas Carlini, “Black-hat LLMs,” [un]prompted AI security conference, April 2026. Conference agenda. See also: AI Finds Vulns You Can’t, Security Cryptography Whatever podcast.
4. Anthropic says its most powerful AI cyber model is too dangerous to release publicly. VentureBeat, April 7, 2026.