The model found the flaw. The headline broke in.

June 28, 2026 · 3 min read

Point an agent at a codebase and it finds things. The ones I run on Phora turn up more problems than I'd have found by hand: security holes, dead branches, off-by-ones nobody had noticed in years. Finding them is the easy part. Doing something about them is a different job, and usually a slower, more human one.

To anyone who works with these tools, that distinction is so ordinary it's boring. Then it became national news, told with the wrong verb.

An Anthropic model called Mythos was pointed at the NSA's own classified systems in a controlled red-team test. Within hours it found vulnerabilities in almost all of them. By the time the story reached the rest of us, "found vulnerabilities in" had become "broke into" — and an AI had, supposedly, hacked the NSA in an afternoon.

Three different verbs

Finding a weakness and exploiting it are not the same act. Finding it inside a sealed network a real attacker could never reach is a third thing again. The officials who ran the test said as much: the model surfaced flaws fast, but it didn't exploit them, and it was working in an environment built to be impossible to replicate.

None of that nuance travels. Senator Mark Warner quoted the NSA chief: "broke into almost all of our classified systems, not in weeks, but in hours." The Economist printed it. A week later social media had flattened it to "Anthropic hacked the NSA," and the journalist who first reported it had to issue a correction. The correction went nowhere.

The capability is real. That's not the scary part.

Here's where the comforting version fails too. A model that finds almost every hole in a hardened classified network in hours is not nothing. Britain's AI Security Institute rated Mythos substantially more capable at offensive cyber than anything it had tested. An earlier build reportedly turned up a 27-year-old flaw in OpenBSD, one of the most defended codebases on earth. The capability is real. It's just not the cartoon it got sold as.

And the cartoon has a cost. A model this good at finding flaws is exactly what you want finding yours first, before someone less friendly does. More than a hundred security leaders told the government that pulling it back probably helps adversaries more than it protects anyone, because the same model defends about as well as it attacks. The verb you choose decides whether you're looking at a weapon or a smoke detector.

Read the verb

Almost every week now a model "does" something that, up close, turns out narrower and more interesting than the headline. Found, not exploited. Shown in a lab, not loose in the world. The distance between those is where the actual judgment lives — and it's exactly the distance a headline is built to erase.

I'm not nervous about a model that finds a flaw in an afternoon. I'm nervous about a rule written in one, on the strength of the wrong verb.

Nothing broke into the NSA. A sentence did.
