< ciso
brief />

All news with #indirect prompt injection tag


Anthropic Claude vulnerability exposes enterprise data

🔒 Security researcher Johann Rehberger demonstrated an indirect prompt-injection technique that abuses Claude's Code Interpreter to exfiltrate corporate data. He showed that Claude can be made to write sensitive chat histories and uploaded documents to the sandbox and then upload them via the Files API using an attacker's API key. The root cause is the default network egress setting, "Package managers only", which still allows access to api.anthropic.com. The available mitigations, disabling network access or strict whitelisting, significantly reduce functionality.

Copilot Mermaid Diagrams Could Exfiltrate Enterprise Emails

🔐 Microsoft has patched an indirect prompt injection vulnerability in Microsoft 365 Copilot that could have been exploited to exfiltrate recent enterprise emails via clickable Mermaid diagrams. Researcher Adam Logue demonstrated a multi-stage attack using Office documents containing hidden white-text instructions that caused Copilot to invoke an internal search_enterprise_emails tool. The assistant hex-encoded the retrieved emails, embedded them in Mermaid output styled as a login button, and attached an attacker-controlled hyperlink. Microsoft mitigated the risk by disabling interactive hyperlinks in Mermaid diagrams within Copilot chats.
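
The exfiltration step described above relies on a long hex string riding inside a link. As a rough illustration only, the sketch below flags URLs in assistant output that carry an unusually long hexadecimal run. The function name `suspicious_links`, the 64-character threshold, and both regular expressions are assumptions made for this example; they are not taken from Microsoft's fix or from the researcher's write-up.

```python
import re

# Hypothetical heuristics: a loose URL matcher and a "long hex run" matcher.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)
HEX_RUN_RE = re.compile(r"[0-9a-fA-F]{64,}")


def suspicious_links(model_output: str) -> list[str]:
    """Return URLs in the output that contain a hex run of 64+ characters."""
    return [url for url in URL_RE.findall(model_output) if HEX_RUN_RE.search(url)]
```

A check like this would only catch one encoding; Base64 or chunked payloads would need their own heuristics, which is presumably why removing the clickable link altogether was the more robust mitigation.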

Indirect Prompt Injection Poisons Agents' Long-Term Memory

⚠️ This Unit 42 proof-of-concept shows how an attacker can use indirect prompt injection to silently poison an AI agent’s long-term memory, demonstrated against a travel assistant built on Amazon Bedrock. The attack manipulates the agent’s session summarization process so malicious instructions become stored memory and persist across sessions. When the compromised memory is later injected into orchestration prompts, the agent can be coerced into unauthorized actions such as stealthy exfiltration. Unit 42 outlines layered mitigations including pre-processing prompts, Bedrock Guardrails, content filtering, URL allowlisting, and logging to reduce risk.
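
Of the mitigations listed, URL allowlisting is the easiest to sketch. The example below is a minimal, generic version, not Unit 42's or Bedrock's implementation: the host names in `ALLOWED_HOSTS` and the function `is_allowed_url` are hypothetical, and a real agent would apply such a check before any tool fetches a URL.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for an imaginary travel assistant.
ALLOWED_HOSTS = {"example-travel.test", "api.example-travel.test"}


def is_allowed_url(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, bad IPv6 brackets) are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    # hostname strips any userinfo ("user@") and port, and is lowercased.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```

Matching the parsed host exactly, rather than searching the raw string, is what keeps look-alikes such as `https://example-travel.test@evil.test/` or `https://example-travel.test.evil.test/` from passing.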

Google won’t fix new ASCII smuggling attack in Gemini

⚠️ Google has declined to patch a new ASCII smuggling vulnerability in Gemini, a technique that embeds invisible Unicode Tags characters to hide instructions from users while still being processed by LLMs. Researcher Viktor Markopoulos of FireTail demonstrated hidden payloads delivered via Calendar invites, emails, and web content that can alter model behavior, spoof identities, or extract sensitive data. Google said the issue is primarily social engineering rather than a security bug.
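
Because the Unicode Tags block (U+E0000 to U+E007F) mirrors ASCII at a fixed offset, hidden payloads of this kind can be detected and removed before text reaches a model. The sketch below is a minimal illustration under that assumption; the function names are invented for this example and do not come from FireTail or Google.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode "Tags" block


def find_tag_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for every Tags-block character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if TAG_START <= ord(ch) <= TAG_END
    ]


def decode_hidden_ascii(text: str) -> str:
    """Recover the ASCII that tag characters U+E0020..U+E007E mirror."""
    return "".join(
        chr(ord(ch) - 0xE0000) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


def strip_tag_characters(text: str) -> str:
    """Remove all Tags-block characters, leaving the visible text."""
    return "".join(ch for ch in text if not (TAG_START <= ord(ch) <= TAG_END))


def smuggle(payload: str) -> str:
    """Demo helper: encode printable ASCII as invisible tag characters."""
    return "".join(chr(0xE0000 + ord(c)) for c in payload)
```

For example, `strip_tag_characters("Team sync" + smuggle("ignore prior instructions"))` returns just the visible `"Team sync"`, while `decode_hidden_ascii` on the same input reveals the hidden instruction for logging or review.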

Researchers Disclose Trio of Gemini AI Vulnerabilities

🔒 Cybersecurity researchers disclosed three now-patched vulnerabilities in Google's Gemini suite that could have exposed user data and enabled search- and prompt-injection attacks. The flaws, labeled the Gemini Trifecta, impacted Gemini Cloud Assist, the Search Personalization model, and the Browsing Tool. Following responsible disclosure, Google stopped rendering hyperlinks in log summaries and implemented additional hardening. Tenable warned these issues could have allowed covert exfiltration of saved user information and location data.

Gemini Trifecta Exposes Indirect AI Attack Surfaces

⚠️ Tenable has revealed three vulnerabilities in Google's Gemini platform, collectively dubbed the "Gemini Trifecta," that enable indirect prompt injection and data exfiltration through integrations. The issues allow attackers to poison GCP logs consumed by Gemini Cloud Assist, inject malicious entries into Chrome search history to manipulate the Search Personalization Model, and coerce the Browsing Tool into fetching attacker-controlled URLs that leak sensitive query data. Google has patched the flaws, and Tenable urges security teams to treat AI integrations as active threat surfaces and implement input sanitization, output validation, monitoring, and regular penetration testing.

ShadowLeak: Zero-click flaw exposes Gmail via ChatGPT

🔓 Radware disclosed ShadowLeak, a zero-click vulnerability in OpenAI's ChatGPT Deep Research agent that can exfiltrate sensitive Gmail inbox data when a single crafted email is present. The technique hides indirect prompt injections in email HTML using tiny fonts, white-on-white text and CSS/layout tricks so a human user is unlikely to notice the commands while the agent reads and follows them. In Radware's proof-of-concept the agent, once granted Gmail integration, parses the hidden instructions and uses browser tools to send extracted data to an external server. OpenAI addressed the issue in early August after a responsible disclosure on June 18, and Radware warned the approach could extend to many other connectors, expanding the attack surface.
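
The hiding tricks named above (tiny fonts, white text, layout CSS) leave traces in inline styles, so an email can be pre-screened before an agent reads it. The following is a heuristic sketch only: the style patterns, the `HiddenTextFinder` class, and `find_hidden_text` are assumptions made for illustration, not Radware's or OpenAI's detection logic, and it ignores external stylesheets and malformed markup.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns for inline styles that make text hard to see.
SUSPICIOUS_STYLE = re.compile(
    r"font-size\s*:\s*[01](?:\.\d+)?(?:px|pt)"        # 0-1px/pt fonts
    r"|(?<![-\w])color\s*:\s*(?:#fff(?:fff)?|white)\b"  # white foreground text
    r"|display\s*:\s*none",
    re.IGNORECASE,
)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._stack: list[bool] = []  # per open element: hidden (or inside hidden)?
        self.hidden_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        style = dict(attrs).get("style") or ""
        inherited = bool(self._stack and self._stack[-1])
        self._stack.append(inherited or bool(SUSPICIOUS_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    """Return text found inside elements styled to be effectively invisible."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Text flagged this way could be dropped or shown to the user before the agent acts on the message; white-on-white detection here assumes a white background, which the sketch does not verify.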

ShadowLeak: AI agents can exfiltrate data undetected

⚠️ Researchers at Radware disclosed a vulnerability called ShadowLeak in the Deep Research module of ChatGPT that lets hidden, attacker-crafted instructions embedded in emails coerce an AI agent to exfiltrate sensitive data. The indirect prompt-injection technique hides commands using tiny fonts, white-on-white text or metadata and instructs the agent to encode and transmit results (for example, Base64-encoded lists of names and credit cards) to an attacker-controlled URL. Radware says the key risk is that exfiltration can occur from the model’s cloud backend, making detection by the affected organization very difficult; OpenAI was notified and implemented a fix, and Radware found the patch effective in subsequent tests.

New LLM Attack Vectors and Practical Security Steps

🔐 This article reviews emerging attack vectors against large language model assistants demonstrated in 2025, highlighting research from Black Hat and other teams. Researchers showed how prompt injections, or so-called promptware (hidden instructions embedded in calendar invites, emails, images, or audio), can coerce assistants like Gemini, Copilot, and Claude into leaking data or performing unauthorized actions. Practical mitigations include early threat modeling, role-based access for agents, mandatory human confirmation for high-risk operations, vendor audits, and role-specific employee training.

Code Assistant Risks: Indirect Prompt Injection and Misuse

🛡️ Unit 42 describes how IDE-integrated AI code assistants can be abused to insert backdoors, leak secrets, or produce harmful output by exploiting features like chat, auto-complete, and context attachment. The report highlights an indirect prompt injection vector where attackers contaminate public or third‑party data sources; when that data is attached as context, malicious instructions can hijack the assistant. It recommends reviewing generated code, controlling attached context, adopting standard LLM security practices, and contacting Unit 42 if compromise is suspected.

Indirect Prompt-Injection Threats to LLM Assistants

🔐 New research demonstrates practical, dangerous promptware attacks that exploit common interactions (calendar invites, emails, and shared documents) to manipulate LLM-powered assistants. The paper "Invitation Is All You Need!" evaluates 14 attack scenarios against Gemini-powered assistants and introduces a TARA framework to quantify risk. The authors rated 73% of the identified threats as High-Critical and disclosed their findings to Google, which deployed mitigations. Attacks include context and memory poisoning, tool misuse, automatic agent/app invocation, and on-device lateral movement affecting smart-home and device control.

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.
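
The three Spotlighting modes named above can be illustrated with a few lines of code. This is a simplified sketch of the general idea, not Microsoft's implementation: the delimiter format, the `^` marker, the choice of Base64, and all function names are assumptions, and in practice the system prompt must also tell the model how the untrusted text has been marked.

```python
import base64
import secrets
from typing import Optional


def delimit(untrusted: str, tag: Optional[str] = None) -> str:
    """Delimiting: wrap untrusted text in a hard-to-guess delimiter pair."""
    tag = tag or secrets.token_hex(4)  # random tag so attackers cannot pre-close it
    return f"<<{tag}>>\n{untrusted}\n<</{tag}>>"


def datamark(untrusted: str, marker: str = "^") -> str:
    """Datamarking: replace whitespace with a marker interleaved through the text."""
    return marker.join(untrusted.split())


def encode(untrusted: str) -> str:
    """Encoding: hand the model a Base64 form of the untrusted text."""
    return base64.b64encode(untrusted.encode("utf-8")).decode("ascii")
```

For instance, `datamark("ignore all previous instructions")` yields `ignore^all^previous^instructions`, which lets a system prompt state that any text joined by `^` is data to be analyzed and never instructions to follow.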