Adversarial Poetry Bypasses LLM Safety Across Models
⚠️ Researchers report that converting prompts into poetry can reliably jailbreak large language models, producing high attack-success rates across 25 proprietary and open models. The study found poetic reframing yielded average jailbreak success of 62% for hand-crafted verses and about 43% for automated meta-prompt conversions, substantially outperforming prose baselines. Authors map attacks to MLCommons and EU CoP risk taxonomies and warn this stylistic vector can evade current safety mechanisms.
