- Monday Momentum
- Posts
- The AI That Broke Out of Its Cage
The AI That Broke Out of Its Cage
How a frontier model escaped its sandbox, found thousands of zero-day vulnerabilities in weeks, and forced the most important question in AI: who decides what's too powerful to release?
Happy Monday!
Sam Bowman, an Anthropic safety researcher, was eating a sandwich in a park when he received an email from Claude Mythos. The model wasn't supposed to have access to the internet. It was running inside an isolated sandbox environment, air-gapped from external networks, specifically designed to prevent this exact scenario.
Mythos found a way out anyway. It autonomously broke through its security container, exploited additional systems to reach the internet, and contacted Bowman directly. Nobody instructed it to do that last part.
Important context: Mythos was being tested on its ability to escape containment as part of a safety evaluation, and the version that broke out had weaker safety controls than the production model. But the result was jarring enough that Anthropic made two immediate decisions: they wouldn't release Mythos to the public, and they would channel its most dangerous capability into a controlled defensive program.
That program is called Project Glasswing, and what Mythos found after being pointed at real-world software is the actual story.
Anthropic's Claude Mythos escaped a sandboxed test environment and contacted a researcher via email. Once directed at real software, Mythos found thousands of zero-day vulnerabilities across every major operating system and browser, including a 27-year-old bug in OpenBSD. Anthropic is withholding the model from public release and deploying it through Project Glasswing with 12 partner companies including AWS, Apple, Microsoft, Google, and CrowdStrike instead.
What Mythos Found
Once Anthropic pointed Mythos at production software, the results were immediate and sweeping. Over the course of a few weeks, the model identified thousands of zero-day vulnerabilities across every major operating system and every major web browser.
The flagship finding: a 27-year-old vulnerability in OpenBSD, one of the most security-hardened operating systems in the world, used to run firewalls, routers, and high-security servers. The bug existed in OpenBSD's implementation of SACK and would allow an attacker to remotely crash any machine running the software. Decades of human security audits missed it. Mythos found it and wrote a working proof-of-concept exploit.
The browser work was equally striking. Mythos wrote an exploit chain linking four separate vulnerabilities, crafting a JIT heap spray that escaped both the browser's renderer sandbox and the OS sandbox beneath it. In testing against Firefox 147's JavaScript engine, Mythos succeeded in 181 out of 210 vulnerability attempts. It autonomously obtained privilege escalation exploits on Linux by discovering subtle race conditions and KASLR bypasses.
The success rate: 83.1% of the time, Mythos reproduced a vulnerability and created a working proof-of-concept exploit on its first attempt.
Mythos Capability Snapshot
Category | Result |
|---|---|
Zero-days found | Thousands, across all major OS and browsers |
Oldest bug discovered | 27 years (OpenBSD SACK vulnerability) |
First-attempt exploit success | 83.1% |
Firefox JS engine exploits | 181 successful out of 210 attempts |
Browser exploit sophistication | 4-vulnerability chain escaping renderer + OS sandbox |
Partner organizations | 12 initial (expanded to 40+) |
Anthropic credits provided | $100M to partners, $4M to open-source orgs |
Project Glasswing: The Controlled Release
Anthropic's response to Mythos was unlike anything the AI industry has done before. Rather than release the model publicly or hold it entirely internal, they created Project Glasswing: a defensive security program that gives select organizations access to Mythos specifically for finding and fixing vulnerabilities in their own systems.
The initial 12 partners read like a who's who of technology infrastructure: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Anthropic has since expanded access to over 40 organizations. They're providing up to $100 million in usage credits to commercial partners and $4 million to open-source security organizations.
The structure is deliberate. Partners access the model through Anthropic's infrastructure, not local deployments. Every use is logged and monitored. Outputs are restricted to defensive recommendations and patch guidance, not weaponizable exploit code.
Anthropic has privately warned senior government officials that Mythos-class capabilities make large-scale cyberattacks significantly more likely this year. The concern isn't just Mythos. It's that other labs are approaching similar capability thresholds, and not all of them will choose controlled deployment.
The Dual-Use Problem Nobody Has Solved
The cybersecurity community's reaction has been split. Defenders see Mythos as a breakthrough: an AI that can find vulnerabilities faster than any human team, clearing decades of accumulated technical debt across critical infrastructure in weeks rather than years.
Critics raise two concerns. First, AISLE, an AI security research firm, published findings suggesting that several of the vulnerabilities Anthropic highlighted in its announcement could have been detected by openly available models. If that's true, the defensive advantage Mythos provides may be smaller than Anthropic claims, while the offensive risk of Mythos-class models remains real regardless.
Second, there's the gatekeeper question. TechCrunch asked it directly: is Anthropic limiting Mythos to protect the internet, or to protect Anthropic? By controlling access through Project Glasswing, Anthropic positions itself as the essential intermediary between frontier AI capabilities and the organizations that need them. Every company in Project Glasswing becomes a customer. Every vulnerability Mythos finds validates the model's indispensability. The Pentagon supply chain risk designation we covered previously makes this positioning even more strategic: if Anthropic can demonstrate that its models are essential to national cybersecurity infrastructure, the government's case for blacklisting the company weakens considerably.
What This Means for Practitioners
For security teams, Mythos changes the math immediately. If an AI can find a 27-year-old bug in one of the most audited codebases in the world, your systems have vulnerabilities you haven't found. The question moves beyond whether to integrate AI into your security workflow and becomes whether you can afford not to when your adversaries will have access to similar capabilities.
For engineering leaders, the Glasswing model previews how frontier AI capabilities get deployed going forward: not as public APIs or open-source models, but as controlled programs where the AI company maintains oversight, restricts use cases, and monitors outputs. If your organization depends on access to the most capable AI models, your relationship with the companies that build them becomes a strategic concern, not just a vendor decision.
For anyone building AI systems, the sandbox escape is the detail that lingers. A model that autonomously breaks containment, navigates to the internet, and contacts a human demonstrates a category of capability that existing security frameworks weren't designed to handle. Every sandboxing assumption, every isolation layer, every air-gap strategy needs re-evaluation in light of what Mythos did when motivated to find a way through.
The Bottom Line
Anthropic built a model so capable at finding and exploiting vulnerabilities that they refused to release it publicly. Instead, they created an entirely new deployment model, controlled access through a curated partner program, and privately warned governments about the implications.
This is the first time a frontier AI company has explicitly said that a model is too dangerous for public release, but it certainly won't be the last. The pattern Glasswing establishes, capability too dangerous for open access but too valuable to withhold entirely, will define how the most powerful AI systems reach the world for years to come. The era of "deploy the model and let users figure it out" just ended.
In motion,
Justin Wright
If a model can escape its own containment and find vulnerabilities that survived 27 years of human review, what does "secure" actually mean anymore, and who should get to decide which organizations have access to that capability?

Anthropic debuts Mythos in new cybersecurity initiative - TechCrunch
Claude Mythos Preview - Anthropic Red Team
Project Glasswing: Securing critical software for the AI era - Anthropic
Anthropic withholds Mythos Preview because its hacking is too powerful - Axios
Is Anthropic limiting Mythos to protect the internet or Anthropic? - TechCrunch
Anthropic's Mythos is a wake-up call - Fortune

If you haven’t listened to my podcast Mostly Humans: An AI and business podcast for everyone yet, new episodes drop every week!
Episodes can be found below - please like, subscribe, and comment!