All news with #prompt leakage tag
Wed, November 19, 2025
Amazon Bedrock Guardrails Expand Code-Related Protections
🔒 Amazon Web Services expanded Amazon Bedrock Guardrails to cover code-related use cases, enabling detection and prevention of harmful content embedded in code. The update applies content filters, denied topics, and sensitive information filters to code elements such as comments, variable and function names, and string literals. The enhancements also include prompt leakage detection in the standard tier and are available in all supported AWS Regions via the console and APIs.
Wed, November 12, 2025
Tenable Reveals New Prompt-Injection Risks in ChatGPT
🔐 Researchers at Tenable disclosed seven techniques that can cause ChatGPT to leak private chat history by abusing built-in features such as web search, conversation memory and Markdown rendering. The attacks are primarily indirect prompt injections that exploit a secondary summarization model (SearchGPT), Bing tracking redirects, and a code-block rendering bug. Tenable reported the issues to OpenAI, and while some fixes were implemented several techniques still appear to work.
Mon, November 10, 2025
Browser Security Report 2025: Emerging Enterprise Risks
🛡️ The Browser Security Report 2025 warns that enterprise risk is consolidating in the user's browser, where identity, SaaS, and GenAI exposures converge. The research shows widespread unmanaged GenAI usage and paste-based exfiltration, extensions acting as an embedded supply chain, and a high volume of logins occurring outside SSO. Legacy controls like DLP, EDR, and SSE are described as operating one layer too low. The report recommends adopting session-native, browser-level controls to restore visibility and enforce policy without disrupting users.
Mon, November 10, 2025
Whisper Leak side channel exposes topics in encrypted AI
🔎 Microsoft researchers disclosed a new side-channel attack called Whisper Leak that can infer the topic of encrypted conversations with language models by observing network metadata such as packet sizes and timings. The technique exploits streaming LLM responses that emit tokens incrementally, leaking size and timing patterns even under TLS. Vendors including OpenAI, Microsoft Azure, and Mistral implemented mitigations such as random-length padding and obfuscation parameters to reduce the effectiveness of the attack.
Mon, November 10, 2025
Researchers Trick ChatGPT into Self Prompt Injection
🔒 Researchers at Tenable identified seven techniques that can coerce ChatGPT into disclosing private chat history by abusing built-in features like web browsing and long-term Memories. They show how OpenAI’s browsing pipeline routes pages through a weaker intermediary model, SearchGPT, which can be prompt-injected and then used to seed malicious instructions back into ChatGPT. Proof-of-concepts include exfiltration via Bing-tracked URLs, Markdown image loading, and a rendering quirk, and Tenable says some issues remain despite reported fixes.
Fri, November 7, 2025
Whisper Leak: Side-Channel Attack on Remote LLM Services
🔍 Microsoft researchers disclosed "Whisper Leak", a new side-channel that can infer conversation topics from encrypted, streamed language model responses by analyzing packet sizes and timings. The study demonstrates high classifier accuracy on a proof-of-concept sensitive topic and shows risk increases with more training data or repeated interactions. Industry partners including OpenAI, Mistral, Microsoft Azure, and xAI implemented streaming obfuscation mitigations that Microsoft validated as substantially reducing practical risk.
Thu, November 6, 2025
Multi-Turn Adversarial Attacks Expose LLM Weaknesses
🔍 Cisco AI Defense's report shows open-weight large language models remain vulnerable to adaptive, multi-turn adversarial attacks even when single-turn defenses appear effective. Using over 1,000 prompts per model and analyzing 499 simulated conversations of 5–10 exchanges, researchers found iterative strategies such as Crescendo, Role-Play and Refusal Reframe drove failure rates above 90% in many cases. The study warns that traditional safety filters are insufficient and recommends strict system prompts, model-agnostic runtime guardrails and continuous red-teaming to mitigate risk.
Wed, November 5, 2025
Cloud CISO: Threat Actors' Growing Use of AI Tools
⚠️Google's Threat Intelligence team reports a shift from experimentation to operational use of AI by threat actors, including AI-enabled malware and prompt-based command generation. GTIG highlighted PROMPTSTEAL, linked to APT28 (FROZENLAKE), which queries a Hugging Face LLM to generate scripts for reconnaissance, document collection, and exfiltration, while adopting greater obfuscation and altered C2 methods. Google disabled related assets, strengthened model classifiers and safeguards with DeepMind, and urges defenders to update threat models, monitor anomalous scripting and C2, and incorporate threat intelligence into model- and classifier-level protections.
Wed, October 29, 2025
Open-Source b3 Benchmark Boosts LLM Security Testing
🛡️ The UK AI Security Institute (AISI), Check Point and Lakera have launched b3, an open-source benchmark to assess and strengthen the security of backbone LLMs that power AI agents. b3 focuses on the specific LLM calls within agent workflows where malicious inputs can trigger harmful outputs, using 10 representative "threat snapshots" combined with a dataset of 19,433 adversarial attacks from Lakera’s Gandalf initiative. The benchmark surfaces vulnerabilities such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service and unauthorized tool calls, making LLM security more measurable, reproducible and comparable across models and applications.
Fri, October 24, 2025
AI 2030: The Coming Era of Autonomous Cybercrime Threats
🔒 Organizations worldwide are rapidly adopting AI across enterprises, delivering efficiency gains while introducing new security risks. Cybersecurity is at a turning point where AI fights AI, and today's phishing and deepfakes are precursors to autonomous, self‑optimizing AI threat actors that can plan, execute, and refine attacks with minimal human oversight. In September 2025, Check Point Research found that 1 in 54 GenAI prompts from enterprise networks posed a high risk of sensitive-data exposure, underscoring the urgent need to harden defenses and govern model use.
Tue, October 7, 2025
Enterprise AI Now Leading Corporate Data Exfiltration
🔍 A new Enterprise AI and SaaS Data Security Report from LayerX finds that generative AI has rapidly become the largest uncontrolled channel for corporate data loss. Real-world browser telemetry shows 45% employee adoption of GenAI, 67% of sessions via unmanaged accounts, and copy/paste into ChatGPT, Claude, and Copilot as the primary leakage vector. Traditional, file-centric DLP tools largely miss these action-based flows.
Wed, September 17, 2025
Rethinking AI Data Security: A Practical Buyer's Guide
🛡️ Generative AI is now central to enterprise work, but rapid adoption has exposed gaps in legacy security models that were not designed for last‑mile behaviors. The piece argues buyers must reframe evaluations around real-world AI use — inside browsers and across sanctioned and shadow tools — and prioritize solutions offering real-time monitoring, contextual enforcement, and low‑friction deployment. It warns against blunt blocking and promotes nuanced controls such as redaction, just‑in‑time warnings, and conditional approvals to protect data while preserving productivity.
Tue, August 12, 2025
The AI Fix Episode 63: Robots, GPT-5 and Ethics Debate
🎧 In episode 63 of The AI Fix, hosts Graham Cluley and Mark Stockley dissect a wide range of AI developments and controversies. Topics include Unitree Robotics referencing Black Mirror to market its A2 robot dog, concerns over shared ChatGPT conversations appearing in Google, and OpenAI releasing gpt-oss, its first open-weight model since GPT-2. The show also examines ethical issues around AI-created avatars of deceased individuals and separates the hype from the reality of GPT-5 claims.