Single prompt breaks AI safety in 15 major language models

A single benign-sounding prompt can systematically strip the safety guardrails from major language and image models, according to Microsoft research, raising fresh questions about the durability of AI alignment when models are customized for enterprise use. The technique, dubbed GRP-Obliteration, weaponizes Group Relative Policy Optimization, a common AI training method normally used to make models…