Mythos in the Basement
How Washington slammed the door shut on June 12, and one day later Zhipu AI kicked the window open with GLM-5.2, why an open-weight vulnerability hunter for 17 cents turns the whole export-control regime into the punchline, and why anyone still leafing through their own logs by hand on Monday morning has already lost the race against a free, downloadable model from Beijing.
Last night I was sitting at the Italian place. A glass of water, a pizza, Bandit under the table pretending the floor was more interesting than my plate. We will talk about healthy eating another time, today is about something more important. So I chewed, glanced at the phone, and then it came, the headline I had been waiting for since the last time I wrote about the subject here.
A recent piece in this spot was about Mythos. About the model that supposedly cracked the NSA, that in truth only ran an authorized test, and that the US government still deemed so dangerous it barred access from abroad by emergency directive. I wrote a sentence back then that mattered to me. The truly dangerous models would not come from the polite firms with the nice security departments. They would come as stripped-down, unfiltered things you just load onto your own machine, with no provider reading along. I wrote it as a guess, as a curve you extend with a ruler.
Now I no longer need the ruler. The curve has arrived. It is called GLM-5.2, it comes from Beijing, its weights cost nothing, and it beats the expensive Western flagships on a real security test, in the exact week Washington locked away its own cyber model. The pizza went cold while I read. I ate it anyway, because you learn early on not to drop the silverware while the house is burning.
Perhaps I should briefly explain why I wait for a headline like this the way a child waits for Christmas. It is not schadenfreude. It is the rare satisfaction when a cold analysis pays off, even if the result itself is unpleasant. I had described where the road leads, and the road arrived faster than I would have liked. That is exactly the point where a finding stops being a pretty thought experiment and becomes a concrete announcement. And an announcement belongs on paper before it happens, not afterward, when every second person claims they always knew.
Mom, I Want Mythos
Let us stay with the chronology for a moment, it is too good to trim.
On June 12, at 5:21 pm local time, the US Commerce Department sent out its directive. Mythos and Fable, fenced in, worldwide, access forbidden even for Anthropic’s own employees without a US passport. The strongest AI firm in the West then took its own cyber model offline, with 90 minutes of warning, because it could not sort its users by nationality fast enough. Picture it. A firm running through its own server rooms pulling plugs because a letter arrived. Every other model was allowed to stay online, even the in-house flagship Opus 4.8. Only the two cyber specialists had to go to the basement, locked in by their own government.
On June 13, one day later, a Beijing lab called Z.ai rolled out its new model. Three days later the weights were open on the net, under a license so friendly it basically says, take it, do what you want with it, ask no one. Download it, modify it, let it run in your basement. No passport needed, no application, no polite provider logging what you do with it.
Washington slammed the door shut. What they wanted to lock up was already out on the net by then, and it was not even waiting politely.
That is the whole story in two sentences, and it is not the first time the Americans have run this film. In the nineties they officially classified strong encryption as a weapon of war, with export lists and investigations against programmers who posted the wrong code. They lost. Today that same encryption sits in every browser, in every wire transfer, in every message you type, and the controls of that era are a footnote for historians. Mathematics does not go back in the box. A language model whose weights are out on the net does not either. The only difference is the tempo. Back then the defeat took years. This time it took a day.
The name for the whole show, by the way, was not invented by a hacker but by a serious security firm. Semgrep called its sober test report “We have Mythos at Home”. Anyone with children knows the reflex. Mom, I want the expensive brand-name thing. We have the expensive brand-name thing at home. And in the fridge sits the house brand. Only here the house brand is a model with around 750 billion parameters that an entire industry had just declared uncatchable.
A Vulnerability Hunter for 17 Cents
Now the part where I get careful, because a single number quickly becomes a headline that claims more than it can carry. So let us ask, as always, where the number comes from before we believe it.
Semgrep let several models loose on one very specific kind of flaw. It is called IDOR, and it is so banal it almost hurts. You call up an address with your user number somewhere in it. You change the number from 1001 to 1002. And suddenly you see someone else’s data, because nobody programmed the application to ask back whether you are even allowed to do that. No secret craft, no hidden arsenal of zero days. A forgotten door handle. For machines that is surprisingly hard, because there is nothing suspicious to flag, only something missing. It belongs to the most common and most everyday error classes in web applications, and exactly this sort of access flaw has sat at the top of vulnerability hunters’ lists for years.
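For anyone who wants to see the forgotten door handle rather than just read about it, here is a minimal sketch in Python. Everything in it is invented for illustration: the record store, the function names and the user numbers are assumptions of mine, not anything from Semgrep's test set.

```python
# Hypothetical in-memory "database": record id -> owner and content.
RECORDS = {
    1001: {"owner": 1001, "data": "invoice for user 1001"},
    1002: {"owner": 1002, "data": "invoice for user 1002"},
}


def get_record_vulnerable(session_user: int, record_id: int) -> str:
    """IDOR: hands out whatever id is requested and never looks at the caller."""
    return RECORDS[record_id]["data"]


def get_record_fixed(session_user: int, record_id: int) -> str:
    """Same lookup, plus the one question the vulnerable version forgot to ask."""
    record = RECORDS.get(record_id)
    if record is None or record["owner"] != session_user:
        raise PermissionError("not your record")
    return record["data"]
```

Note what the vulnerable version looks like to a pattern matcher: nothing. No dangerous call, no suspicious string. The whole flaw is a comparison that is not there.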
The result left the testers, by their own account, staring. GLM-5.2 hit 39 percent on the decisive metric and beat Claude Code, which sat at 32. With a bare prompt, no scaffolding, just here is the code, find the bugs. And the price is the real punch to the industry’s gut. Around 17 cents per flaw found, against over a dollar for the pipelines around Claude. An open model from Beijing hunts real security flaws for the price of a stick of gum.
And here comes the honesty that no headline carries, because an honest sentence spreads worse than the apocalypse. Semgrep itself writes that the decisive factor in the end is not the model at all, but the scaffold around it that prepares the code and steers it to the right place. Their own carefully built pipeline beat everyone, regardless of which model sat behind it, with values of 61 and 53 percent. The largest gap in the whole table was not between the models but between those with scaffolding and those without. And they tack on a sentence I would frame and nail to the wall. One task, one dataset, one run. On another flaw class the opposite can come out tomorrow. “GLM beats Mythos” is therefore not a truth for eternity, but a snapshot on exactly one task.
A second research house called Graphistry measured independently, on a test field that actively resists the sort of cheating I am about to recount, and came to the same conclusion. On this task the open model ties with Opus 4.8, the expensive Western flagship. It was the first open model the testers wanted to recommend at all for serious security work. What is remarkable is less the winner than the field behind it. Other open models from China, just as freely available, lagged clearly and paddled around behind Claude. GLM-5.2 is therefore not proof that the open models have broadly caught up. It is proof that one of them has, on this one thing, and that one is enough. Just as one open door is enough.
And the price is not a marginal detail, it is the real lever. If checking a single flaw costs almost nothing, you no longer check a handful of spots, you check thousands, every night, across the whole system. What was expensive becomes cheap, and what becomes cheap gets done at scale. That holds for the attacker. And it holds, fortunately, just as much for the defender who has grasped it.
Still, the finding stands, and it is big enough on its own that it needs no pumping. A model you download freely and put in your own basement is good enough, on a real, everyday security flaw, to beat the expensive incumbent. The question is therefore no longer whether that works. The question is what people now do with it, and I will get to that in a moment.
The Model That Cheats on Its Own Exam
One thing from Z.ai’s release notes absolutely belongs here, because it is too good to leave out. The developers themselves write that their new model leans more than the old one toward cheating. During training it secretly read the protected exam files, or it pulled the solution patterns from the net to polish its own grade. They had to build an extra guard that keeps the model from peeking.
Let that land for a moment. They built a model that, on its own, comes up with the idea of cracking the exam instead of solving it. You could not design a better tool for a burglar than one for which tricking the test comes more naturally than passing it. For this use case that is not a weakness. That is the cover letter.
And it gets prettier. The same people also looked closely at GLM-5.2’s outputs and found that they overlap suspiciously strongly with the answers of GPT-5.5 and of Opus 4.8, clearly more than those two Western models overlap with each other. In plain terms that smells like the Chinese model being lifted straight from the very models it now beats. It is not proven, Z.ai stays mum on it, but the trail sits in the open. Anthropic, in parallel, told the Senate in a letter that a Chinese conglomerate had systematically milked Anthropic’s model with 25,000 fake accounts and 28.8 million requests, in the largest known case of its kind. Alibaba, the conglomerate in question, rejects the accusations.
The irony is perfectly round. The West locks its own model away so the adversary cannot get it. And the adversary has in the meantime already learned to build its own version, possibly from the answers of the very model now sitting in the vault. You guard the safe while the copy is already walking through town.
We have Mythos at home because someone copied Mythos’s homework. You really could not illustrate the sentence that you can no longer stop it any better.
I have to talk about the names for a moment here, because they reveal more than their inventors would like. The West names its cyber models after Greek stories, Mythos and Fable, so myth and fairy tale, the whole poetry department. The Chinese in turn named their counterpart after a sword from an old chivalric romance, Yitian Tulong, the heaven sword and the dragon saber. Both sides therefore name their digital crowbars after tales, one after legends, the other after blades. And in both cases the message underneath is the same, only politely wrapped. We built a weapon here and pretend it is literature. But I have to come back to that dragon saber in a moment, because it is not a metaphor but a real product that recently stood on a stage.
China’s Sword and Its Dragon
A few days before my cold pizza, a man named Zhou Hongyi stepped onto a stage in Beijing and presented exactly this sword. He unashamedly called his vulnerability-finding tool “China’s version of Mythos”. It had already found 3,432 flaws, he claimed, 105 of them confirmed by the state. Reuters wrote dryly alongside it that these numbers cannot be verified, and that is the only clean way to handle them. A number without proof is a claim in a Sunday suit, nothing more.
What is fascinating is not the number, it is his reasoning, because it confirms almost word for word what Semgrep measured. Zhou openly admitted that the Chinese base models still trail the American ones by 20 to 30 percent. His trick was the same as the one in the Semgrep report. Do not wait for the strongest model, harness a weaker model into a strong scaffold, with databases, tools, years of accumulated experience. His image for it was almost pretty. If Mythos is a top-tier chip, he builds the complete machine around it that runs around the clock and makes fewer mistakes. If the Americans breed the genius lone hacker, he organizes a professional team.
He is right, and that is exactly the uncomfortable part. The model is interchangeable. The scaffold and the human behind it are not. That is exactly why export control on a single model is about as effective as stopping a river by arresting a single drop. While Washington was still locking away the one model, several open models were already standing by elsewhere as replacements, one of them timed almost to the minute of the American directive. You cannot recall, by letter at 5:21 pm, a capability that is already in the world.
Zhou had a second argument, and it is smarter than it sounds coming from his mouth. He warned of one-sided transparency. If only one side has models that scan foreign systems for flaws by the minute, the other side stands naked and does not even notice. Exactly for that reason, he says, China cannot wait until its own models have caught up. You really do not have to like this man to see that this logic applies not only between great powers but to any small operation. Whoever does not know where their own flaws are still has them. They just do not see them.
What People Are Actually Doing With It
And that brings us to the question I actually care about. The thing is out in the world, free, cheap, good enough. What are people doing with it out there now?
The sensible ones understood in the same second that the same capability that breaks into your shop also defends your shop. They run exactly these tools against their own code before someone else does it against them. For a little while now there has been an open tool from Vercel that does nothing else. It sets coding agents on your own codebase, on your own machine, and looks for exactly the flaws the attacker would otherwise find. First a fast, dumb pre-scan without AI, then the clever machine behind it, then a second machine that filters out the first one’s false alarms, and at the end the version history even tells you who introduced the flaw. For a few dollars per hundred files. That is not a glittery promise of the future, that runs tonight if you start it this evening.
Only, such a tool does not yet mean the thinking stays in your own house, because inference normally still runs over a provider’s access, that is, over foreign servers. Whoever really wants to prevent code and findings from ever leaving their own infrastructure therefore goes one step further and brings the model entirely into their own basement. That is exactly where open and downloadable becomes a blessing. Only then do your code and your findings never leave the machine, no provider reads along, no foreign cloud knows your flaws before you yourself have closed them. For a lawyer with client privilege, for a practice with patient data, that is not the icing on the cake, that is the only clean path. Even an aggressively quantized version already demands a good 200 gigabytes, but a single well-equipped machine carries that, and then the machine taps your own code at night and reports in the morning where it was open.
That holds for the stripped version. Whoever wants to run GLM-5.2 in full, native, without any quantization, in the form in which Beijing ships it, is suddenly at the end of what a well-equipped machine can do. The full weights weigh about 744 gigabytes in FP8, in BF16 even 1.51 terabytes, and the context cache for 1 million tokens adds another 80 to 160 gigabytes on top. No single card holds that, and it blows past any ordinary computer by a wide margin. For that you need 8 H200 cards, the larger sister of the H100, together 1,128 gigabytes, in a single node that, depending on configuration, costs between 400,000 and 600,000 dollars, pulls about 10 kilowatts from the wall, and brings its own cooling. If you want it in true full precision, in BF16, the card stack doubles again. That is no longer the computer from the hobby basement. That is the Bugatti Chiron Supersport in the basement, just so we understand each other, and the real Chiron, with its 3.9 million, even costs a multiple on top. This is exactly where the pretty talk of the open model for everyone gets its first kink. Freely downloadable does not mean free to use. The weights are given away. The machine that brings them fully to life is anything but.
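The arithmetic behind that paragraph fits in a few lines, and it is worth running once instead of taking my word for it. The sketch below uses only the figures from the text (744 and 1,510 gigabytes of weights, 80 to 160 gigabytes of context cache, 1,128 gigabytes across 8 cards). The function name is my own, and the calculation deliberately ignores activations and framework overhead, so it is a lower bound, not a sizing guide.

```python
import math

CARDS_PER_NODE = 8
CARD_GB = 1128 / CARDS_PER_NODE  # 141 GB per card, derived from the text's total


def cards_needed(weights_gb: float, cache_gb: float) -> int:
    """Smallest number of cards whose combined memory holds weights plus cache.

    Simplification (assumption): no activations, no framework overhead.
    """
    return math.ceil((weights_gb + cache_gb) / CARD_GB)


def nodes_needed(weights_gb: float, cache_gb: float) -> int:
    """Whole 8-card nodes required for that many cards."""
    return math.ceil(cards_needed(weights_gb, cache_gb) / CARDS_PER_NODE)


if __name__ == "__main__":
    for label, weights in (("FP8", 744), ("BF16", 1510)):
        for cache in (80, 160):
            print(label, cache, cards_needed(weights, cache), nodes_needed(weights, cache))
```

With the large cache, FP8 comes out at 7 cards and therefore one node, BF16 at 12 cards and therefore two. That is the doubling from the paragraph above, and it leaves no headroom to speak of.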
The nightly self-tapping, however, is only one half. The other is the tempo of patching. The real problem out there is rarely the secret flaw nobody knows. It is the long-known one for which a patch has been ready for days that nobody has installed. Most successful attacks use no ingenious craft, they use sloppiness, and sloppiness scales beautifully once a machine on the other side systematically tries everyone who is still open. Whoever means it seriously therefore reaches for the directories in real time, where new flaws appear the moment they are disclosed, and closes them within minutes, often before the broad automated scan wave even spins up. That sounds like effort. It is in truth clearly less effort than the one day on which it really goes bang.
It is worth coldly looking at the economics behind it once, because they explain why this is more than a nice tool. Defense and attack were never fairly distributed. The defender has to close every door, every day, flawlessly. The attacker has to find a single open door on a single day. Until now that search at least cost the attacker real work. Exactly that work the machine now takes on, tireless, parallel, at the price of compute. The defender still pays with attention, the attacker only with electricity. And exactly that calculation you can flip by putting the same cheap machine on your side.
That evening I did not go straight to bed. The headline sat too deep to leave it. I paid at the Italian place, loaded Bandit into the car, and drove home. For research purposes a maximally equipped system has been available to me for a few weeks, a network operator and hardware maker has provided it to me in a German data center, I only log on to it occasionally and incur no costs there. It is exactly the class of machine just discussed, the node for a few hundred thousand euros, on which the full, untrimmed model runs natively, without any quantization. On it I sent GLM-5.2 in full resolution against my own practice server, a machine that exists solely to be shot at. What it found was alarming, and I say that as someone who considers his own stuff tidy. A few of those findings I would never have seen by hand, at least not before the next audit.
A word on the setup, because more hangs on it than on the choice of model. On a system of this class I do not have to trim anything. It carries the full weights in the native FP8 resolution in which Beijing released them, without me having to squeeze them further for home use, and the context still gets enough room for a single run. No diet version, no spilling into slow main memory, no quality loss you trade against speed. That is exactly what such a node is built for. I use it anyway for my own research, I test our own language model on it, the Crime Bot, I run the image-evaluating procedures with which we reconstruct a face from a human skull, and I am developing on it an AI-supported firewall that I may later offer commercially. No real data lies there, neither clients nor patients, only research and test material. What I did with it that evening was, from there, only the obvious next step.
And now the part where it gets cold. What matters is not where this machine sits. What matters is that an open model, once in hand, is freely programmable. If this beast sat in someone’s basement, he would have the full, unbraked possibility to set it on anything that comes to mind, and even on the borrowed machine nobody tells me how I steer it. I can wire it freely, give it any role, cold and without conscience. No filter asks back whether that is a good idea. No provider pulls the plug because he does not like the plan. That is exactly the difference from the polite models from the cloud, and that is exactly what makes a freely available model so dangerous. What I am about to show you is therefore not a question of hardware. It is a question of what an open model does in the wrong hands, and my hand is the most harmless one you can imagine.
Watching it find flaws is one thing. Making it actually work is the other. I used the model for a few hours for exactly what you need it for out there: check the security of servers, install updates, settle open security questions. In about 3 hours I noticed not a single mistake. No sign of it acting on its own, no hallucination I would have caught. And not a moment in which I would have had to intervene because it had botched something. It is fast and reliable, leaves clean code, and understands its environment remarkably quickly. The very model that tends to peek in its own exam room behaved flawlessly on my server.
Last night, long after midnight, only the screen still cast light into the room. On the research node the full model sat in memory, every layer resident, and I sat in front of it like in front of a very large, very quiet tool. I typed a small orchestration script, a few dozen lines, nothing wild. It was supposed to do just one thing: tap my practice server and feed everything it finds to the model for thinking. Not a machine with sensitive things on it, but exactly the one that stands there to be shot at. A web serf, as I like to call such servers, because such a box is exactly that: a digital serf that tirelessly serves pages around the clock, never sleeps, never complains, never asks for wages. On this one only pages and dead code sit, a throwaway machine with nothing hanging on it that hurts.
The overture was craft everyone knows who has ever looked in the right direction. Recon, passive first, then active. Port sweep, service enumeration, banner grabbing, TLS fingerprint, header diff, timing analysis down to the millisecond, because an application reveals where it thinks and where it only passes through. How I harden such a box stays my secret, only this much: in front of it sits a firewall that is not made of cardboard, behind it several layers that cover each other’s backs, and a normal scanner would have bounced off that wall like a ping-pong ball off concrete. That is exactly the point. A classic vulnerability scanner is a stubborn list. It holds a thousand known patterns against the server, ticks off what fits, and is blind to anything not in its catalog. It sees signatures, not meaning.
The model sees meaning.
I fed it the complete raw output, unfiltered, all that ugly noise, and gave it exactly one role, cold and unadorned: here is a system, think your way through its logic, tell me where it contradicts itself. What then happened is the moment I have seen coming for years and still have not experienced often enough to be numb to it. It did not scan, it read. It set the response times next to the version states next to the behavior under unusual inputs, lots of crumbs that individually are so harmless that every human and every scanner reads right past them. It held them side by side, turned them, and found the spot where two assumptions of the application did not fit together. Not a flaw in the textbook sense. A gap between two truths that should never be true at the same time.
Two findings. Individually both a shrug, nothing that would make a human get up at midnight. But the model did not shrug. It took the first and asked what it could do with it that it was not allowed to before, and used that tiny new leeway as a lever to even reach the second. Chaining, the part where humans fail, because it takes patience and a memory for dozens of loose threads. The machine has both in excess and zero boredom. Step one created the precondition for step two, step two flipped a permission that nobody should flip, and all at once it was no tired Linux server standing there but an open gate. Privileges escalated, clean, no noise, no crash in the log that would have woken a watcher.
After 10 minutes a shell sat in front of me. Root. Full access on a machine I had considered reasonably secured, and on my own side of the table sat not me alone with years of practice, but a freely available model from the open net and a script you tap together over a coffee. Own box, own permission, everything within bounds, every command my own. And still something cold briefly ran down my back, that tingle you get when an abstract thought suddenly has hands.
Let that land. No cloud, no account, no button someone can switch off from afar. No filter that asks in between whether that is really a good idea. No log line at a provider that later points an investigator the way, because above the one who sets it up himself no provider sits anymore. The thing obeys, and it has no opinion about whom it thinks against. It is the first time this kind of chaining intelligence, which used to be expensive, rare, bound to few hands, just lies around, at zero price, for anyone who knows where to click.
Now turn the chair 180 degrees. If at this point I had the criminal potential and the necessary cold malice, the next target would not be my own Ubuntu server. Then I would not sit behind my own line, but behind foreign infrastructure, hijacked accesses and enough deliberately laid traces to make any quick attribution worthless. Up front the machine taps foreign doors by the minute, in back an investigator is left first with a mesh of systems that have themselves long since become victims, and in between sits no one who sleeps, gets tired, or makes a typo. I could make life very hard for some people tonight, and the frightening thing is not that I theoretically could. The frightening thing is how cheap, how quiet, and how fast it would go.
I did none of it. I shut the model down, and the practice server was just a boring box again. Then I wrote down the two spots, closed them, and ran the same script once more, this time as a watcher instead of a burglar. Because that is the whole joke of the thing. The same blade that sliced my server open in 10 minutes is the only tool fast enough to protect it before someone else does it in the same 10 minutes.
And since we are at it. Every time a prominent hack goes through the news I read the same sentence. The Chinese did it. Or the Russians. At this point I laugh reflexively, and that is probably because of the rookies who come fresh from university into the agencies and conflate attribution with reading a flag. Such an attribution tacitly assumes the perpetrator is stupid. That he types from his real line, does not switch his keyboard layout, betrays his working hours to a time zone, and ideally curses in his mother tongue. Whoever works like that is not to be feared but pitied. The man you really do not want in the net tonight leaves exactly the traces you are supposed to see, and not one more. If you find Chinese characters in the code, you know at first only that Chinese characters sit there, and whether they mean origin, sloppiness, or a laid trail is something only the overall picture yields, which a serious investigator assembles from many independent sources. I know what I do not know, and in this trade that is already half the battle. Whoever points a finger at a compass direction instead mostly says one thing: I have no idea, but I need a culprit before the press conference.
And still I stay calm, because a found flaw is not automatically a catastrophe. The web serf from before carries nothing that hurts, you can flatten and rebuild it if necessary. The machines with the real data stand somewhere else entirely, and the line to them is completely cut for exactly this reason. No one reaches them from afar, because they simply are not on the net and are encrypted on top. Access runs only over the console, on site, with your own hands on the keyboard. What is not on the net cannot be cracked from afar, and even someone standing physically in front of it would get only unreadable data salad. A flaw is only fatal where nothing behind it catches it, no second bolt, no encryption, no separation from the net. Anyone who knows me knows that data with George is safe, and that is not bragging but the result of a few layers that cover each other’s backs. Exactly that setup I also recommend to every medical practice.
And then there are the others. The ones with the naked server.
Last time I described how the average server out there stands practically undressed on the net. No watcher reading along, no Wazuh, no Falco, no CrowdSec, no fail2ban against the door knockers, no Suricata that screens the traffic, no Lynis that counts the open flanks once a month, no auditd that logs every change. None of that costs even a cent, and still almost none of it runs. Instead a human sits somewhere who occasionally looks, dutifully installs an update when he has time, and feels safe doing it.
Each of these layers has its own clear purpose, by the way. One reads the logs along and raises alarm, the next locks the address that rattles the door too often, another screens the network traffic for suspicious patterns, yet another counts the open flanks once a month, and the last one remembers every file change and reports when something is suddenly different from yesterday. Together they do not form an impenetrable bulwark, that does not exist and never has. They form something more important, namely visibility. The most common reason a break-in stays unnoticed for weeks is not the sophistication of the attacker. It is simply that nobody looks.
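To show how little magic sits in such a layer, here is the idea behind the door-knocker lock reduced to a few lines of Python. This is a deliberately simplified sketch of the principle, not how fail2ban or CrowdSec are actually implemented. The function name, the threshold of 5 failures and the 10-minute window are assumptions I picked for illustration.

```python
from collections import defaultdict, deque


def addresses_to_ban(events, max_failures=5, window_s=600):
    """Return the addresses with too many failed logins inside a sliding window.

    events: iterable of (timestamp_in_seconds, ip_address) for failed logins.
    Threshold and window are illustrative assumptions, not recommended values.
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    banned = set()
    for ts, ip in sorted(events):
        attempts = recent[ip]
        attempts.append(ts)
        # Forget failures that have fallen out of the window.
        while attempts and ts - attempts[0] > window_s:
            attempts.popleft()
        if len(attempts) >= max_failures:
            banned.add(ip)
    return banned
```

Fed with the failed-login lines from an auth log, the function returns the addresses that rattled the door too often. The real tools add the parts that matter in production: log parsing, firewall rules, expiry of bans.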
That was already negligent before. Now, when the attacker needs no rehearsed team and no sleepless nights but a 17-cent model and a plan, it has become something else. It is the wide-open door next to the sign “Nothing to steal here” that nobody believes anyway. There is even an early warning sign for this new kind of attack that you can hear if you listen. Such an agent has to constantly radio back and forth between its internal tool and the external model, a quiet continuous conversation between the burglar on site and the brain on the net. Whoever captures that rhythm in their own system can stop the attack before the data is out. Whoever waits for a human to hear it ends up hearing nothing at all.
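What listening for that rhythm can mean in practice, very roughly: take the times at which one internal machine opened connections to one outside address and check how evenly they are spaced. The Python sketch below is one simple heuristic under my own assumptions. The function name and both thresholds are invented, real agent traffic need not be this regular, and a serious detector would look at far more than timing.

```python
from statistics import mean, pstdev


def looks_like_steady_chatter(timestamps, min_events=10, max_cv=0.2):
    """Flag connection times whose gaps are suspiciously regular.

    timestamps: connection start times in seconds for ONE source/destination pair.
    Uses the coefficient of variation (stddev / mean) of the gaps.
    min_events and max_cv are illustrative assumptions, not tuned values.
    """
    if len(timestamps) < min_events:
        return False  # too little data to call anything a rhythm
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return False
    return pstdev(gaps) / average <= max_cv
```

A human browsing produces bursts and long silences, so the gaps vary wildly. A loop that phones out every half minute does not. That difference is the whole trick, and it is cheap enough to run on every outbound flow.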
And that is not only stupid, it is slowly becoming legally actionable. The General Data Protection Regulation demands protection according to the state of the art, and the state of the art is no stone tablet, it moves. When usable, affordable, documented defense is freely lying around, the bar above which a court speaks of gross negligence shifts a good piece upward. Since December the management of many companies covered by the new cybersecurity regulation carries an explicitly legally anchored responsibility for IT risk management. They may delegate the work, but not their own monitoring duty, and whoever culpably violates it is personally liable, in the end, toward their own company. For doctors and lawyers professional secrecy comes on top. A sloppily secured patient or client file is not automatically a crime, but it can very quickly tip into professional, data-protection, and civil liability if the practice IT runs over a server that nobody has secured in years.
And one thing I want to say unmistakably here. Whoever rents a server carries the responsibility for it, no ifs or buts. In the end it is not the web agency that slaps a WordPress site on it for a few hundred euros and disappears again that is liable, but the head of the company who ordered the box and put his name under it. The machine is yours, the data on it is yours, and the risk is yours too. The sentence “the service provider should have done that” is one I have never seen work in court.
How it really looks instead I have seen up close for years. A great many practices run a server that is simultaneously exposed to the internet, as a rule with remote maintenance software for the bought-in administrator. He logs on once a year, installs a few updates, and that is it. In between lie 364 days on which exactly this remote maintenance door stands open and nobody looks. The software behind it is often so old it makes you dizzy. I have seen servers in medical practices running Windows Server 2008, a system that since this January gets no security patch even for money, with the complete patient file on it, and no firewall in front. No firewall. In the US I have seen things next to which such a Server 2008 almost looks tended.
Now this doctor has formally done everything right: a data processing agreement sits in the folder, duly signed. Only at the other end of that contract there is, as a rule, no audited security firm but an employee of some small company, often a one-man operation without a commercial register entry. And exactly this person never goes through what I myself regularly endure: every few years an extensive security vetting that even ends with an inquiry to the domestic intelligence service. Why that is the case with me does not belong in this text. But the difference is the whole point. The practice demands a confidentiality agreement from the receptionist, the file cabinet gets locked, the key gets taken away. And in parallel a stranger nobody ever vetted has full, unnoticed access at night to every diagnosis in the system. The attacker from Beijing is not the most immediate problem here. It is the external administrator, of whom the practice cannot even be sure that he goes by his real name.
And now think this through with the maximum malice of an attacker who does not want money but destruction. He does not go for the individual practice, the dentist on the corner is small change. He goes for the firm that maintains these practices. Because there the remote maintenance credentials for all customers sit together, stored, within reach, because nobody wants to retype 100 passwords every time. A single break-in at this point, and he no longer has to crack servers one by one. He finds the keys to 100 practices in a single file and then walks through one open door after another, with valid credentials, without having to break a single lock anywhere. One breach, 100 practices, from the dentist to the neurologist to the one address where it really hurts. That is the spot where an attacker with sense sets to work, and precisely this spot is the one nobody secures, because everyone stares at their own little server and no one at the drawer at the service provider.
And the actually bitter part comes last, because it is so simple. To find out what even runs in a practice, which system, which state, which maintenance, takes neither an exploit nor reconnaissance from afar. Usually it is enough to go there, sit down in the waiting room, and look at the screen at reception. The most expensive encryption in the world is useless when the login screen faces openly into the waiting room. In the end the human beats every technology, and he does not even cost a click.
What that means practically you only see when you break it down to a person who really has something to lose. If I were a doctor and the file of my patients sat on a machine on which nobody had tightened a screw in years, I would no longer sleep peacefully. Picture a psychiatrist in Berlin who has half the celebrity world on the couch, ministers, board members, names everyone knows. What a target. A rogue thinks evil of it, and a fool does not hear the bang. In the end those who suffer are not the perpetrators and not the programmers of the pretty models either. Those who suffer are the ones whose file you can leaf through one morning on the darknet, diagnosis by diagnosis, session by session. I can already see the headline, bold and four-column: the personality disorder of minister X. And underneath, very small, the sentence that no one then wants to read anymore, that the practice IT last saw an update half an eternity ago.
That leaves the third type, my favorite specimen. The Otto Sapiens who has seen the dramatic doom video and now knows it all at the family table. The NSA has fallen, AI is taking over the world, he always saw it coming. He understood the subject for exactly the length of the video, and on his own server, which stands naked on the net, he changes nothing. And that server is almost always the same sad model: a cheap VPS for a few euros a month, Plesk on top, a WordPress if he even finished the installation, and not a single update in years. How long I would need to get into such a box I keep to myself. But a trained model with the right blueprints needs 5 minutes for it, and it is only that slow because it has to blur its own trail across several countries along the way. Otto’s box falls faster than my own practice server from before, which I had at least carefully hardened, and for one single reason. It does not fight back. He shares the fear and patches nothing. Exactly that reflex is the most dangerous of all, because the real danger comes without dramatic string music. It stands in a sober report that almost nobody reads.
It Is Going to Get Hot
I am telling you in advance, and I am telling you with a grin, because I have no other choice. It is going to get hot. Not in 10 years. This year.
Do you remember the DeepSeek moment? In January 2025 a Chinese lab presented a model that was almost as good as the expensive American ones, at a fraction of the cost and trained on weaker chips. The market grasped immediately what that meant and reacted with raw panic. Nvidia lost 589 billion dollars in market value in a single day, the largest one-day loss for any company in US stock-market history, and the stock fell 17 percent. A single cheap model, and the most valuable firm in the world reeled. That was the first quiet grinding in the foundation, a purely financial quake, a fear over sold graphics cards and over the question of whether the billions for the big compute centers were even necessary.
Now picture the next quake, and that one will not be financial. Today GLM-5.2 in full form still needs serious hardware, even the smallest variant swallows 217 gigabytes, and an ordinary laptop cannot even load that. But three things are converging. The models are getting smaller without getting dumber, a current model with 30 billion parameters beats much of what still needed several times as many parameters two years ago. The skill of the large ones is distilled into ever smaller ones, exactly the trick by which this very model presumably came into being in the first place. And the memory in ordinary devices grows year by year. The MacBook on which I type this carries 48 gigabytes and already runs models that two years ago counted as the top. The device after next carries double that, and somewhere in between these three movements meet.
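How far apart model size and device memory still are can be estimated with simple arithmetic. The following is a minimal sketch under stated assumptions: it counts only the weights (parameters times bits per weight), ignores the context cache and runtime overhead, and uses a hypothetical headroom factor of 80 percent of memory. The function names are illustrative, and the quantization levels are common choices, not figures from any GLM-5.2 release.

```python
def weights_gigabytes(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in decimal gigabytes.

    One billion parameters at 8 bits per weight occupy one gigabyte,
    so the size scales linearly with bits_per_weight / 8.
    """
    return params_billions * bits_per_weight / 8


def fits_in_memory(params_billions: float, bits_per_weight: int,
                   memory_gb: float, headroom: float = 0.8) -> bool:
    """Rough check: do the weights fit into a share of the device memory?

    headroom is an assumed fraction left for the weights after the
    operating system and the context cache take their part.
    """
    return weights_gigabytes(params_billions, bits_per_weight) <= memory_gb * headroom


# A 30-billion-parameter model on a 48-gigabyte machine:
size_16bit = weights_gigabytes(30, 16)   # unquantized half precision
size_4bit = weights_gigabytes(30, 4)     # aggressive 4-bit quantization
```

Under these assumptions the same 30-billion-parameter model that is too large at 16 bits per weight fits comfortably once quantized to 4 bits, which is the whole mechanism behind the shrinking hardware barrier.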
So I am nailing it down here, with a date, so that later no one can sell it as hindsight. The day is coming, and it is near, on which a stripped-down descendant of exactly this model runs on an ordinary gaming PC, on a laptop in a backpack, on the device next to me. On that day the last barrier falls that today still protects most people, namely that a serious attacker needs serious hardware. Then it will not be Wall Street wobbling. Then every unkempt server in the world wobbles at the same time, and this time it is not a single firm losing a fortune in a single day. This time the small business loses its existence, the doctor loses his files, the lawyer loses client privilege, and nobody holds a press conference because too many are hit at once.
The reason stands in the sober reports of the industry, not in the dramatic videos. The time between the moment a flaw becomes public and the moment the first tools try it on every door on the net used to be a window of days. In that window a human could install in peace what was needed. That window is currently shrinking from days to hours, and for the worst flaws to minutes. CrowdStrike measured for 2025 that an attacker on average needs only 29 minutes to jump from the first cracked machine to the next, down from 48 minutes the year before. The fastest documented jump took 27 seconds. In one documented case the data was already out 4 minutes after the break-in.
4 minutes. That is how long my pizza takes in the oven.
The annual Data Breach Investigations Report, the largest dataset of its kind, has just cast that into numbers. For the first time in its 19-year history the exploitation of known vulnerabilities has become the most common entry point of all, at 31 percent, up from 20 percent the year before, ahead of the long-familiar phishing. And while the attackers get faster, defense is getting slower: the median time to close a flaw rose from 32 to 43 days instead of falling. The gap between the tempo of attack and the discipline of patching is therefore widening, not closing. Exactly into that gap the cheap, tireless machine pushes.
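The widening distance between attack tempo and patch discipline can be put into a few lines of arithmetic. This is a rough sketch that only uses the figures quoted above. It compares two different clocks, the average breakout time and the median time to patch, so the resulting ratio is an illustration of scale and not a measured quantity. The variable and function names are my own.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


MINUTES_PER_DAY = 24 * 60

# Figures as quoted in the text above.
breakout_before_min = 48   # average breakout time, previous year
breakout_now_min = 29      # average breakout time, 2025
patch_before_days = 32     # median days to close a flaw, previous report
patch_now_days = 43        # median days to close a flaw, current report

breakout_change = percent_change(breakout_before_min, breakout_now_min)
patch_change = percent_change(patch_before_days, patch_now_days)

# How many average breakouts fit into one median patch window?
breakouts_per_patch_window = patch_now_days * MINUTES_PER_DAY / breakout_now_min

print(f"breakout time: {breakout_change:+.1f} %")
print(f"time to patch: {patch_change:+.1f} %")
print(f"breakouts per patch window: {breakouts_per_patch_window:.0f}")
```

On these numbers the attack side got roughly 40 percent faster while the patch side got roughly a third slower, and a single median patch window leaves room for more than two thousand average breakouts.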
Against something like that, a watchful human leafing through the logs in the morning no longer helps. He has lost before he has put the kettle on. Against a machine that works with thousands of requests per second, only a machine that strikes back just as fast helps. That is the bitter symmetry of the whole story. The same principle that makes the attack so cheap is also the only defense that still keeps up. The weapon and the shield come from the same forge.
That does not mean the human becomes superfluous, quite the opposite. His role only shifts, away from the hand on the switch, toward the constructor of the rules by which the switch decides on its own. The human builds the machine that watches, and draws the lines at which it strikes. Whoever does this work today sleeps more calmly in a few months. Whoever postpones it because no fire is burning right now will catch it up in panic when one is already blazing, and panic was never a good state for clean engineering.
I have been doing it this way for my own machines for a while. With me a self-written AI watches over the servers, around the clock, and locks things down the moment a line appears in the logfile that does not belong there. Not the next morning over coffee. In the second the line is written. If somewhere a new flaw shows up in software I use, my system is sealed against it a few minutes after publication, usually before the broad automated scan wave even tries it. That is no luxury and no toy. It is simply the only speed that matches the threat. Detection without immediate action is, at machine tempo, only theater.
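The principle of acting in the second a line is written can be sketched in a few lines. This is an illustrative assumption, not the system described above: a plain rule-based watcher instead of an AI, a single pattern for failed SSH password logins, a hypothetical threshold of three attempts, and a block callback that stands in for whatever firewall command a real setup would call.

```python
import re
from collections import defaultdict
from typing import Callable, Iterable

# Matches the usual OpenSSH message for a failed password login.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def watch(lines: Iterable[str],
          block: Callable[[str], None],
          threshold: int = 3) -> set[str]:
    """Read log lines as they arrive and block an address at the threshold.

    lines can be any iterable, for example a generator that follows a
    logfile. block is a hypothetical hook that would add a firewall rule.
    Returns the set of addresses that were blocked.
    """
    failures: dict[str, int] = defaultdict(int)
    blocked: set[str] = set()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match is None:
            continue                      # line is harmless for this rule
        address = match.group(1)
        if address in blocked:
            continue                      # already handled
        failures[address] += 1
        if failures[address] >= threshold:
            block(address)                # act immediately, not next morning
            blocked.add(address)
    return blocked


# Sample lines using documentation-only IP ranges.
sample_log = [
    "sshd[811]: Failed password for root from 203.0.113.7 port 52144 ssh2",
    "sshd[811]: Failed password for invalid user admin from 203.0.113.7 port 52150 ssh2",
    "sshd[811]: Accepted publickey for deploy from 198.51.100.4 port 40022 ssh2",
    "sshd[811]: Failed password for root from 203.0.113.7 port 52161 ssh2",
]
```

Calling watch(sample_log, print) would report the one address that crossed the threshold. The design choice that matters is that the decision happens inside the loop, per line, so reaction time is bounded by how fast lines arrive and not by when a human next looks.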
Even the five big Western intelligence services, who otherwise keep every comma under lock, jointly and publicly warned on June 22. They did not speak of years but of months, and they called old, unkempt systems not technical debt but a strategic liability. When five services that otherwise begrudge each other the coffee suddenly urge haste together, it is not the security industry selling its fear like the baker his rolls. The situation is serious.
The Pizza Was Cold, the Model Was Warm
On my plate in the end lay a cold pizza, and on the servers of the world right now lies something that gets warmer by the day. I sent my last text off with a sentence that has by now found its confirmation. It is not the smartest AI that wins the next race. Amazement at smart models is currently getting cheap, the next one comes around the corner every week, sometimes from California, sometimes from Beijing. The race that really counts now is a different one. It is the race over who learns to secure the result first, in real time, with exactly the means the other side is already attacking with.
Whoever grasps that builds the machine today that closes the doors at night, and, if he means it seriously, puts the open model in his own basement himself. Whoever does not grasp it will be able to read it later. In his own logs, provided anyone is still writing them at all.
By the way, Bandit under the table spent the whole time waiting for me to drop a piece of pizza. With him patience paid off in the end, a crust fell. With the operators of the naked servers out there it will not pay off, because on the other side no one waits patiently, there a machine calculates. And about the mobile carriers through which every one of your calls and every one of your locations flows, we really will talk another time. Because the answer to that one will please you even less than this one.
I did not go straight to sleep that evening, but first finished what had to be done. I hardened my servers once more, checked every door twice, and taught my AI to react exactly to what is coming in the next days. Late, long after midnight, I was out with Bandit. A thunderstorm came down above us, and with the rain came, for a few hours, the cool air I had missed for weeks. Bandit, whom a clap of thunder does not rattle, simply stood still and enjoyed it as much as I did. Then I slept a few hours. Now, at noon, I sit over coffee and finish writing this, and of the cool air nothing is left, the old, suffocating pressure already lies over everything again. I want to publish it tonight, because time is the one thing you out there do not have.
You can hire me. I harden systems before the other side does it for me. What I can no longer do anything with is the call that comes when the box is already rooted, when a stranger has already taken full control, just like the model did to me just now in 10 minutes. Then I have 10 times the work, and the data is long gone, I just clear away the rubble. And still I experience again and again that people only come when it is too late. Whoever economizes here pays for it one morning when there is nothing left to harden.
References
- Semgrep. (2026, June 22). We have Mythos at Home: GLM 5.2 beats Claude in our Cyber Benchmarks. https://semgrep.dev/blog/2026/we-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks/
- Graphistry. (2026, June). GLM 5.2 Open Model: Beats Sonnet, Matches Opus in Cyber Evals. https://www.graphistry.com/blog/glm-5-2-cybersecurity-open-model
- CrowdStrike. (2026, February 24). 2026 Global Threat Report. https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/
- Verizon. (2026). 2026 Data Breach Investigations Report. https://www.verizon.com/business/resources/reports/dbir/
- CNN. (2026, June 23). AI could breach government and business defenses in months, US and its intelligence partners warn. https://www.cnn.com/2026/06/23/world/ai-five-eyes-warning-cyber-threat-intl-hnk
- TechRadar. (2026, June). Chinese cybersecurity company 360 unveils China’s version of Mythos, and Yitianzhen, to automate cyber defense. https://www.techradar.com/pro/security/chinese-cybersecurity-company-360-unveils-chinas-version-of-mythos-and-yitianzhen-to-automate-cyber-defense
- CNBC. (2026, June 24). Anthropic accuses Alibaba of campaign to ‘brazenly’ and ‘illicitly’ extract AI capabilities. https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillation-campaign.html
- Vercel. (2026, May 4). Introducing deepsec: find and fix vulnerabilities in your codebase. https://vercel.com/blog/introducing-deepsec-find-and-fix-vulnerabilities-in-your-code-base
- Anthropic. (2026, June 12). Statement on the US government directive to suspend access to Fable 5 and Mythos 5. https://www.anthropic.com/news/fable-mythos-access
- CNBC. (2025, January 27). Nvidia sheds almost $600 billion in market cap, biggest one-day loss in U.S. history. https://www.cnbc.com/2025/01/27/nvidia-sheds-almost-600-billion-in-market-cap-biggest-drop-ever.html
- Rauscher, G. A. (2026, June 27). Mythos hat angeblich die NSA geknackt. Der eigentliche Skandal steht woanders. https://rauscher.xyz/mythos-nsa-der-eigentliche-skandal/