<aside> ‼️

Disclaimer:

This piece is preparation for a wider-audience op-ed I hope to write and get published. I'm posting it here to stress-test the arguments before distilling them for a non-technical audience. I'm most interested in feedback on the core open-source collapse argument, the hardware choke point logic, and the China section - particularly whether the case for feasible cooperation holds up. The resumption criteria are deliberately sketched - I know they're underdeveloped.

</aside>

This post makes one central claim that I think is underappreciated: the entire current AI safety paradigm - alignment research, control, policy, red-teaming - is insufficient because it collapses against open source. Everything else follows from that. I derive a possible solution, sanity-check it against US-China race dynamics, and scrutinise the safety work done at the major labs.

The open source black pill

Open-source models consistently lag frontier models by somewhere between a few months and a year and a half, depending on how you measure. Epoch AI's Capabilities Index puts the average at around three months; their earlier training-compute analysis estimated roughly fifteen months. The exact number matters less than the conclusion: whatever frontier labs can do at any given time, open-source models can do within a short window afterwards.
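
To make "depending on how you measure" concrete, here is a toy sketch of one way such a lag can be estimated: for each capability threshold, compare the date the first frontier model crossed it with the date the first open-weights model did, then average the differences. The thresholds and dates below are invented for illustration - this is not Epoch AI's actual methodology or data.

```python
from datetime import date

# Toy lag estimate: for each capability threshold, record when the first
# frontier model crossed it and when the first open-weights model did,
# then average the gaps. All thresholds and dates are hypothetical.
crossings = [
    # (threshold label, first frontier crossing, first open-weights crossing)
    ("benchmark score >= 60", date(2023, 3, 1), date(2023, 12, 1)),
    ("benchmark score >= 70", date(2024, 5, 1), date(2024, 12, 1)),
    ("benchmark score >= 80", date(2025, 2, 1), date(2025, 8, 1)),
]

lags_in_months = [
    (open_dt - frontier_dt).days / 30.44  # mean month length in days
    for _, frontier_dt, open_dt in crossings
]

average_lag = sum(lags_in_months) / len(lags_in_months)
print(f"Average open-weights lag: {average_lag:.1f} months")
```

Different choices of threshold and metric (benchmark scores, training compute, qualitative capability milestones) give different numbers, which is why published estimates range from a few months to over a year.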

This means that even if we grant the most optimistic assumptions about safety work at frontier labs - perfect alignment techniques, robust control mechanisms, effective misuse prevention, airtight KYC, and regulation that sets the right incentives - the entire paradigm collapses once an open-source model reaches the same capability level.

Here is why:

  1. Stripping safety training is trivially cheap. Researchers have removed safety fine-tuning from Llama 3 70B in under an hour on a single GPU, at a cost below $2.50 - while fully preserving the model's general capabilities. The same vulnerability applies to every fine-tunable model tested, including DeepSeek R1 and, through their fine-tuning APIs, closed models from OpenAI, Anthropic, and Google. Once an attacker has open weights, the cost of undoing safety training is negligible - a rounding error against the tens of millions spent on training. There are no known techniques, and no promising research directions, for making released model weights resistant to this kind of modification.
  2. The safety gap is structural, not fixable by iteration. This isn’t a problem that frontier labs can engineer around. Once weights are public, the model belongs to everyone - including every bad actor, every rogue state, and every non-state group with a few dollars of cloud compute and a motive. No amount of RLHF, Constitutional AI, or inference-time monitoring research will address it.
  3. The lag is not long enough to constitute a safety strategy. Whether the gap is three months or eighteen, once a frontier model achieves genuinely dangerous capabilities, the clock starts. Even in the most optimistic scenario - one where the gap widens as compute requirements grow - a fuse measured in months or single-digit years is not a solution. It is a countdown.

The current safety paradigm, at its absolute best, buys somewhere between a few months and a couple of years of lead time before open-source models reach the same capability level. That is the actual output of billions of dollars of safety investment. It is not enough.

The window for action is closing

Stopping the AI race in 2023 would have cost the global economy tens of billions of dollars - mostly VC money - with little systemic disruption. As of early 2026, stopping it would mean writing off tens of trillions in direct and indirect investments. The public is massively exposed through the stock market, 401(k)s, and pension funds. By 2028, economic growth and equity markets will likely be so AI-leveraged that halting development would trigger a global recession of a severity not seen in nearly a century.

The incentives to continue the race - economic, geopolitical, career - are growing, not shrinking. Stopping is close to political suicide for whoever pushes for it and has to absorb the fallout. I would argue that not stopping is also suicide - and not just politically.

We should stop now because waiting will only make it harder, and at some point it becomes impossible.

Why existing safety work is insufficient even without open source

Even setting the open-source problem aside, the current safety landscape is inadequate.

A significant portion of safety spending at major labs is oriented toward steering AI to be controllable and useful - goals that conveniently align with commercial interests, enabling safety-washing of R&D budgets. The remainder goes to red-teaming in simulated scenarios and to mechanistic interpretability; the latter is roughly analogous to fMRI research on human brains: genuinely interesting, but nowhere near sufficient to make guarantees about the behavior of systems we do not fundamentally understand.

The theoretical frameworks of AI safety lag far behind empirical progress. We are building systems whose capabilities outstrip our ability to reason about them. This gap is widening, not closing.

Closed weights are only as safe as the weakest link in their security