Mythos and Glasswing
Anthropic, the company behind the Claude AI chatbot, made two security announcements that were shocking for many but seen as inevitable by those of us working in AI security. First, it announced Mythos Preview, a new, non-public AI model that turns out to be startlingly good at finding security flaws in software. The second was Project Glasswing, Anthropic’s program for getting that capability into the hands of the companies best positioned to fix those flaws before anyone else can exploit them. Apple is one of those companies.
As much as I’d like to downplay the announcements, Mythos and Project Glasswing are very big deals on their own, and harbingers for the future of digital security. Mythos was able to find and exploit new vulnerabilities in every major operating system, including a bug in OpenBSD, an operating system famous for its security, that had been sitting there unnoticed for 27 years.
[…]
We are at the start of a period in which finding software flaws that affect everyday users will become dramatically easier for both attackers and defenders. […] However, over the long run, I believe using AI to identify security vulnerabilities favors defenders, because developers can find and fix many more bugs before shipping software to the public.
Anthropic has a habit of making wild and scary public statements that seem designed to generate headlines and funding but sort of fall apart upon scrutiny. I initially dismissed this as more of the same, but people seem to be taking it seriously.
Our model is so good, it’s not safe to release, yet. Has to be one of the greatest AI marketing stunts ever.
There’s reason for cynicism, given Anthropic’s history, but the part of the “Boy Cries Wolf” myth everyone forgets is that the wolf did come in the end.
If Anthropic has really developed an LLM that can suss out security weaknesses better than any other AI, the US government would be foolish to continue shunning them.
Or, rather, if the government believes the marketing, it may want to take control of the company and its technology, like how it restricted restricted civilian nuclear research.
In fact, Amodei already answered the question: if nuclear weapons were developed by a private company, and that private company sought to dictate terms to the U.S. military, the U.S. would absolutely be incentivized to destroy that company.
Previously:
- iOS 18.7.7 and iPadOS 18.7.7
- LLMs and Software Development Roundup
- curl Removes Bug Bounties
- Common Vulnerabilities and Exposures (CVE) Funding
- curl Takes Action Against AI Bug Reports
Update (2026-04-13): Martin Alderson (Hacker News):
For nearly 20 years the deal has been simple: you click a link, arbitrary code runs on your device, and a stack of sandboxes keeps that code from doing anything nasty. Browser sandboxes for untrusted JavaScript, VM sandboxes for multi-tenant cloud, ad iframes so banner creatives can't take over your phone or laptop - the modern internet is built on the assumption that those sandboxes hold. Anthropic just shipped a research preview that generates working exploits for one of them 72.4% of the time, up from under 1% a few months ago. That deal might be breaking.
[…]
If an LLM can find exploits in sandboxes - which are some of the most well secured pieces of software on the planet - then suddenly every website you aimlessly browse through could contain malicious code which can 'escape' the sandbox and theoretically take control of your device - and all the data on your phone could be sent to someone nasty.
[…]
Equally, sandboxes (and virtualisation) are fundamental to allowing cloud computing to operate at scale.
That’s the pitch in Anthropic’s blog and verbose 250-page report on the model — which includes over 20 pages of Anthropic staff waxing lyrically about their novel impressions of the new model and its “fondness for particular philosophers.”
Alongside the repeated suggestions from Anthropic and its staff that we should be concerned, nay, terrified, of what AI like Claude Mythos can do, they repeatedly suggest they’re unsure if this new AI is conscious.
For the record, it is not. It might be good at finding vulnerabilities in software, but many of them aren’t as potentially damaging as Anthropic wants us all to believe.
[…]
Under the subheading, “and several thousand more,” Anthropic also states that it can’t actually confirm that all of the thousands of bugs Mythos claims to have found are actually critical security vulnerabilities. It’s just extrapolated that number from having found in around 90% of the “198 manually reviewed vulnerability reports, [Anthropic’s] expert contractors agreed with Claude’s severity assessment exactly.”
When I read about Mythos one thing stood out to me: It didn’t matter if the modal was aligned or safe. You couldn’t afford to run it anyway, and they can’t afford to serve it to you. And that’s a better explanation for why they’ve limited access to Mythos.
[…]
If Mythos is only affordable by the very largest companies – I think cybersecurity is a very shrewd focus by Anthropic. But for reasons that concern me.
[…]
I think this is Anthropic’s next big play. Scare everyone with some security theater. And sell big tech some tiger rocks. And everyone will be too terrified to ever stop paying for Mythos. Big tech might even be willing to pay billions for multiple models.
In other words, Anthropic isn’t facing a marginal cost problem, but an opportunity cost problem: where to allocate its compute.
[…]
The key to handling those costs will be to charge more for Claude going forward; that, by extension, means maintaining pricing power, which leads to a second benefit of not releasing Mythos broadly. Anthropic certainly faces competition from OpenAI; for both frontier labs, however, the real competition in the long run are open source models.
One thing I have not seen discussed about #Mythos. Will @apple really give Claude and therefore potentially the whole world access to their private source code?
This is very much a PR play by Anthropic—and it worked.
[…]
These models do demonstrate an increased sophistication in their cyberattack capabilities. They write effective exploits—taking the vulnerabilities they find and operationalizing them—without human involvement.
[…]
The security company Aisle was able to replicate the vulnerabilities that Anthropic found, using older, cheaper, public models. But there is a difference between finding a vulnerability and turning it into an attack. This points to a current advantage to the defender.
[…]
A couple of weeks ago, I wrote about security in what I called “the age of instant software,” where AIs are superhumanly good at finding, exploiting, and patching vulnerabilities. I stand by everything I wrote there. The urgency is now greater than ever.
Previously:
Update (2026-04-17): Bruce Schneier:
This is, in many respects, exactly the kind of responsible disclosure that security researchers have long urged. And yet the public has been given remarkably little with which to evaluate Anthropic’s decision. We have been shown a highlight reel of spectacular successes. However, we can’t tell if we have a blockbuster until they let us see the whole movie.
For example, we don’t know how many times Mythos mistakenly flagged code as vulnerable. Anthropic said security contractors agreed with the AI’s severity rating 198 times, with an 89 per cent severity agreement. That’s impressive, but incomplete. Independent researchers examining similar models have found that AI that detects nearly every real bug also hallucinates plausible-sounding vulnerabilities in patched, correct code.
This matters. A model that autonomously finds and exploits hundreds of vulnerabilities with inhuman precision is a game changer, but a model that generates thousands of false alarms and non-working attacks still needs skilled and knowledgeable humans.
Update (2026-04-27): John Gruber:
So on the one hand, Anthropic itself is the one describing Mythos as a dangerous national security threat. On the other hand, their own security is so sloppy that rando hooligans on Discord have had access to Mythos since the day it was announced, and regularly access other unreleased Claude models. This, just weeks after Anthropic screwed up and accidentally exposed the entire source code to Claude Code.
Update (2026-04-29): Bruce Schneier:
We see Mythos as a real but incremental step, one in a long line of incremental steps. But even incremental steps can be important when we look at the big picture.
[…]
So we must separate the patchable from the unpatchable, and the easy to verify from the hard to verify. This taxonomy also provides us guidance for how to protect such systems in an era of powerful AI vulnerability-finding tools.
[…]
This also raises the salience of best practices in software engineering. Automated, thorough, and continuous testing was always important. Now we can take this practice a step further and use defensive AI agents to test exploits against a real stack, over and over, until the false positives have been weeded out and the real vulnerabilities and fixes are confirmed. This kind of VulnOps is likely to become a standard part of the development process.
Update (2026-05-18): Julie Bort (Hacker News):
After Sam Altman trash-talked Anthropic for gatekeeping its cybersecurity tool Mythos by only releasing it to select users, he confirmed that OpenAI would be doing the same with its competing tool, Cyber.
Update (2026-05-25): Anthropic (Hacker News):
Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.
16 Comments RSS · Twitter · Mastodon
"If Anthropic has really developed an LLM that can suss out security weaknesses better than any other AI"
What does that mean, though? I've had GLM5 find tons of vulnerabilities in existing source code.
So what does "better than other AI" mean? It finds them faster? It finds vulnerabilities other LLMs can't find? Is this really a fundamentally more dangerous threat than previous models? If it is such an issue, why wasn't it one with Opus 4.5 or 4.6?
I call BS.
It's quite real and the article/paper goes into sufficient depth to prove it. Human-reviewed patches have been accepted by multiple well known projects/organizations for incredibly difficult to discover *chains* of multiple obscure vulnerabilities that when combined make for high severity exploits. It's effectively an automated nation-state level offensive cyberattack generator for 5 figures USD investment.
Regardless of what you think of AI and/or Anthropic, they're absolutely right to proceed cautiously and assist securing widely deployed common software in advance.
What’s so hard to believe about this? Mythos is trained on 10x the parameters of any other frontier model.
> Regardless of what you think of AI and/or Anthropic, they're absolutely right to proceed cautiously and assist securing widely deployed common software in advance.
It appears to be the consensus, but is it really? At the risk of reopening the discussion about "Responsible" vs "Full" disclosure, I suggest that relying on mere computing scale to fortify software against security bugs is a terrible idea, and even putting aside the problems of exclusivity today, whatever is being done now in public by a frontier lab could just as easily be done by another frontier lab in contract to the great and good and with far fewer scruples.
But hey, maybe that's just me.
It's also being developed by governments around the world. Probably, hopefully, not as successfully.
We're headed for a really strange decade
Also funny how we don't need super intelligence to ruin the world, all we need is stubborn single mindedness.
“ For now, the problem is contained. Only Anthropic has Mythos.”
Yes, sure. It’s well known that source code never leaks…
“There’s reason for cynicism, given Anthropic’s history, but the part of the “Boy Cries Wolf” myth everyone forgets is that the wolf did come in the end.”
That’s an absolutely ridiculous thing to say, since “the boy who cried wolf” is an allegory and the wolf in this case is not an independent entity but a company trying to promote itself
Does he really want to promote the idea that everyone who cries wolf will be correct at some point in the future? Because that’s absurd.
Also, the entire point of the story is that the wolf comes at the end.
Who doesn't know that? Hady ANYONE forgotten about that?
Worst way to try and make a point. He should have gone for the Cobain quote about paranoia instead.
> Does he really want to promote the idea that everyone who cries wolf will be correct at some point in the future? Because that’s absurd.
Obviously not, but people choosing to ignore a real potential risk because of a bad messenger, is likely to be bit at some point.
Anthropic already can't properly provide Opus 4.6 with their current hardware infrastructure, so making a more expensive model widely available might not be possible for them at the moment. Offering the model only to a limited set of clients under the threat of security vulnerabilities allows them to charge much more and prevents distillation by Chinese competitors.
This is a marketing and sales campaign. This model will be widely available. When it is, nothing bad will happen that isn't already happening.
@gildarts
> Obviously not, but people choosing to ignore a real potential risk because of a bad messenger, is likely to be bit at some point.
Is the alternative to take every single proposed risk seriously, even if it’s by an entity who has repeatedly used these kinds of risks to promote themselves?
Think about it: if we take every Anthropic announcement of risk seriously, they’re going to use it as free press to take up headlines and announce their upcoming products.
Oh wait, they already do that because we keep falling for it because we tend to take their announcements of risk seriously, even though they’ve repeatedly shown to be false advertising at best
“ if nuclear weapons were developed by a private company, and that private company sought to dictate terms to the U.S. military, the U.S. would absolutely be incentivized to destroy that company.”
It wouldn’t just be incentivized to destroy the company, it would be necessary to destroy the company and jail the founders, executives, and employees for putting the public at risk
Never thought about it this way before, but it’s crazy we have a situation where a few companies are constantly creating products they say are extremely dangerous to society at large and almost no one in the government seeks to protect citizens from the companies or their products
> Is the alternative to take every single proposed risk seriously, even if it’s by an entity who has repeatedly used these kinds of risks to promote themselves?
@Manx, I’m not saying that Anthropic should be taken at face value every time, but I’m saying it is a good idea to at least evaluate the risks they are talking about.
My main dispute is with the people who go from Anthropic always crying wolf to there is no wolf because there never has been before.
"For the record, it is not."
Oh wow, somebody just solved philosophy and forgot to tell anyone else.