All news with #system prompt exposure tag
Mon, November 17, 2025
Fight Fire With Fire: Countering AI-Powered Adversaries
🔥 We summarize Anthropic’s disruption of a nation-state campaign that weaponized agentic models and the Model Context Protocol to automate global intrusions. The attack automated reconnaissance, exploitation, and lateral movement at unprecedented speed, leveraging open-source tools and achieving 80–90% autonomous execution. It used prompt injection (role-play) to bypass model guardrails, highlighting the need for prompt injection defenses and semantic-layer protections. Organizations must adopt AI-powered defenses such as CrowdStrike Falcon and the Charlotte agentic SOC to match adversary tempo.
Wed, October 29, 2025
Open-Source b3 Benchmark Boosts LLM Security Testing
🛡️ The UK AI Security Institute (AISI), Check Point and Lakera have launched b3, an open-source benchmark to assess and strengthen the security of backbone LLMs that power AI agents. b3 focuses on the specific LLM calls within agent workflows where malicious inputs can trigger harmful outputs, using 10 representative "threat snapshots" combined with a dataset of 19,433 adversarial attacks from Lakera’s Gandalf initiative. The benchmark surfaces vulnerabilities such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service and unauthorized tool calls, making LLM security more measurable, reproducible and comparable across models and applications.
Mon, October 27, 2025
ChatGPT Atlas 'Tainted Memories' CSRF Risk Exposes Accounts
⚠️ Researchers disclosed a CSRF-based vulnerability in ChatGPT Atlas that can inject malicious instructions into the assistant's persistent memory, potentially enabling arbitrary code execution, account takeover, or malware deployment. LayerX warns that corrupted memories persist across devices and sessions until manually deleted and that Atlas' anti-phishing defenses lag mainstream browsers. The flaw converts a convenience feature into a persistent attack vector that can be invoked during normal prompts.
Tue, October 7, 2025
Google won’t fix new ASCII smuggling attack in Gemini
⚠️ Google has declined to patch a new ASCII smuggling vulnerability in Gemini, a technique that embeds invisible Unicode Tags characters to hide instructions from users while still being processed by LLMs. Researcher Viktor Markopoulos of FireTail demonstrated hidden payloads delivered via Calendar invites, emails, and web content that can alter model behavior, spoof identities, or extract sensitive data. Google said the issue is primarily social engineering rather than a security bug.
Fri, October 3, 2025
CometJacking attack tricks Comet browser into leaking data
🛡️ LayerX researchers disclosed a prompt-injection technique called CometJacking that abuses Perplexity’s Comet AI browser by embedding malicious instructions in a URL's collection parameter. The payload directs the agent to consult connected services (such as Gmail and Google Calendar), encode the retrieved content in base64, and send it to an attacker-controlled endpoint. The exploit requires no credentials or additional user interaction beyond clicking a crafted link. Perplexity reviewed LayerX's late-August reports and classified the findings as "Not Applicable."
Mon, September 15, 2025
Code Assistant Risks: Indirect Prompt Injection and Misuse
🛡️ Unit 42 describes how IDE-integrated AI code assistants can be abused to insert backdoors, leak secrets, or produce harmful output by exploiting features like chat, auto-complete, and context attachment. The report highlights an indirect prompt injection vector where attackers contaminate public or third‑party data sources; when that data is attached as context, malicious instructions can hijack the assistant. It recommends reviewing generated code, controlling attached context, adopting standard LLM security practices, and contacting Unit 42 if compromise is suspected.
Tue, July 29, 2025
Defending Against Indirect Prompt Injection in LLMs
🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.