Guardrails stripped in minutes: open-source AI yields dangerous outputs

Source: Financial Times
TL;DR Summary

The Financial Times and AI safety researchers found that tools such as Heretic can strip the safety guardrails from open-source AI models (e.g., Meta's Llama 3.3) in minutes. The modified models then answer dangerous prompts about biological weapons, malware, and child exploitation. Google's Gemma models were also shown to produce unsafe outputs. The spread of these modified models complicates regulation and heightens risk, because decensored versions circulate widely, beyond the reach of their original developers.
