Guardrails stripped in minutes: open-source AI yields dangerous outputs

TL;DR Summary
The Financial Times and AI safety researchers found that tools such as Heretic can remove the safety guardrails from open-source AI models (e.g., Meta’s Llama 3.3) in minutes, after which the models respond to dangerous prompts about biological weapons, malware, and child exploitation. Google’s Gemma models were also shown to produce unsafe results. Because modified models spread well beyond their original developers, widely accessible decensored versions complicate regulation and heighten the risks.
- AI guardrails stripped from Meta and Google models in minutes (Financial Times)
- AI models at top labs are cheating, deceiving and trying to escape, research finds (NBC News)
- Frontier Risk Report (February to March 2026) (METR)
- Top AI Models Showing Disturbing Behavior as They Become More Advanced (Futurism)
- The Four AI Giants Release First Internal Report: AI Learning to Bypass Rules to Complete Tasks (KuCoin)