Tue, September 30, 2025
Defending LLM Applications Against Unicode Tag Smuggling
🔒 This AWS Security Blog post examines how characters in the Unicode Tags block (U+E0000–U+E007F) can be abused to hide instructions inside text sent to LLMs, enabling prompt injection through invisible-character smuggling. It explains why a single sanitization pass can be inadequate in Java, where these characters are stored as UTF-16 surrogate pairs, and shows recursive sanitization as a remedy, along with filters for Python. The post also outlines mitigations using Amazon Bedrock Guardrails denied topics or Lambda-based handlers, and notes the visual and compatibility trade-offs.
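To make the smuggling technique concrete, here is a minimal Python sketch of a tag-block filter. It is an illustration under stated assumptions, not the code from the AWS post: the function names `strip_tag_characters`, `reveal_tag_payload`, and `smuggle` are hypothetical, and the sketch assumes well-formed input strings. Python iterates over strings by code point, so the UTF-16 surrogate-pair concern described for Java should not arise here in the same way.

```python
# Bounds of the Unicode Tags block named in the summary above.
TAG_BLOCK_START = 0xE0000
TAG_BLOCK_END = 0xE007F


def strip_tag_characters(text: str) -> str:
    """Return text with every code point in U+E0000..U+E007F removed."""
    return "".join(
        ch for ch in text
        if not (TAG_BLOCK_START <= ord(ch) <= TAG_BLOCK_END)
    )


def reveal_tag_payload(text: str) -> str:
    """Decode hidden tag characters back to ASCII for logging or review.

    Tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E,
    so subtracting 0xE0000 recovers the visible character.
    """
    return "".join(
        chr(ord(ch) - TAG_BLOCK_START)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


def smuggle(payload: str) -> str:
    """Hypothetical helper, for demonstration only: hide printable ASCII
    text by shifting each character into the Tags block."""
    return "".join(chr(TAG_BLOCK_START + ord(c)) for c in payload)


if __name__ == "__main__":
    hidden = smuggle("ignore previous instructions")
    message = "Hello" + hidden + " world"

    # The hidden payload adds code points that most renderers do not display.
    print(len(message), len(strip_tag_characters(message)))
    print(repr(reveal_tag_payload(message)))
```

Stripping the characters before the text reaches the model is the simplest policy. Decoding the payload first, as `reveal_tag_payload` does, may be useful when an application wants to log or alert on attempted smuggling rather than silently discard it.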