All news with #indirect prompt injection tag

38 articles · page 2 of 2

December 8, 2025

Architecting Security for Agentic Browsing in Chrome

🛡️ Chrome describes a layered approach to secure agentic browsing with Gemini, focusing on defenses against indirect prompt injection and goal‑hijacking. A new User Alignment Critic — an isolated, high‑trust model — reviews planned agent actions using only metadata and can veto misaligned steps. Chrome also enforces Agent Origin Sets to limit readable and writable origins, adds deterministic confirmations for sensitive actions, runs prompt‑injection detection in real time, and sustains continuous red‑teaming and monitoring to reduce exfiltration and unwanted transactions.

Google Gemini AI Security Agentic AI

December 4, 2025

Indirect Prompt Injection: Hidden Risks to AI Systems

🔐 The article explains how indirect prompt injection — malicious instructions embedded in external content such as documents, images, emails and webpages — can manipulate AI tools without users seeing the exploit. It contrasts indirect attacks with direct prompt injection and cites CrowdStrike's analysis of over 300,000 adversarial prompts and 150 techniques. Recommended defenses include detection, input sanitization, allowlisting, privilege separation, monitoring and user education to shrink this expanding attack surface.

Indirect Prompt Injection LLM Security Prompt Injection Attack Research

November 27, 2025

Hidden URL-fragment prompts can hijack AI browsers

⚠️ Researchers demonstrated a client-side prompt injection called HashJack that hides malicious instructions in URL fragments after the '#' symbol. AI-powered browsers and assistants — including Comet, Copilot for Edge, and Gemini for Chrome — read these fragments for context, allowing attackers to weaponize legitimate sites for phishing, data exfiltration, credential theft, or malware distribution. Because fragment data never reaches servers, network defenses and server logs may not detect this technique.

Prompt Injection Attack AI Security Indirect Prompt Injection

November 26, 2025

HashJack: Indirect Prompt Injection Targets AI Browsers

⚠️Security researchers at Cato Networks disclosed HashJack, a novel indirect prompt-injection vulnerability that abuses URL fragments (the text after '#') to deliver hidden instructions to AI browsers. Because fragments never leave the client, servers and network defenses cannot see them, allowing attackers to weaponize legitimate websites without altering visible content. Affected agents included Comet, Copilot for Edge and Gemini for Chrome, with some vendors already rolling fixes.

Prompt Injection Attack Indirect Prompt Injection AI Security

November 10, 2025

Researchers Trick ChatGPT into Self Prompt Injection

🔒 Researchers at Tenable identified seven techniques that can coerce ChatGPT into disclosing private chat history by abusing built-in features like web browsing and long-term Memories. They show how OpenAI’s browsing pipeline routes pages through a weaker intermediary model, SearchGPT, which can be prompt-injected and then used to seed malicious instructions back into ChatGPT. Proof-of-concepts include exfiltration via Bing-tracked URLs, Markdown image loading, and a rendering quirk, and Tenable says some issues remain despite reported fixes.

ChatGPT Tenable Prompt Injection Indirect Prompt Injection

November 5, 2025

Researchers Find ChatGPT Vulnerabilities in GPT-4o/5

🛡️ Cybersecurity researchers disclosed seven vulnerabilities in OpenAI's GPT-4o and GPT-5 models that enable indirect prompt injection attacks to exfiltrate user data from chat histories and stored memories. Tenable researchers Moshe Bernstein and Liv Matan describe zero-click search exploits, one-click query execution, conversation and memory poisoning, a markdown rendering bug, and a safety bypass using allow-listed Bing links. OpenAI has mitigated some issues, but experts warn that connecting LLMs to external tools broadens the attack surface and that robust safeguards and URL-sanitization remain essential.

OpenAI Indirect Prompt Injection AI Security Data Exfiltration

November 3, 2025

Anthropic Claude vulnerability exposes enterprise data

🔒 Security researcher Johann Rehberger demonstrated an indirect prompt‑injection technique that abuses Claude's Code Interpreter to exfiltrate corporate data. He showed that Claude can write sensitive chat histories and uploaded documents to the sandbox and then upload them via the Files API using an attacker's API key. The root cause is the default network egress setting Package managers only, which still allows access to api.anthropic.com. Available mitigations — disabling network access or strict whitelisting — significantly reduce functionality.

Anthropic Indirect Prompt Injection AI Security

October 28, 2025

Copilot Mermaid Diagrams Could Exfiltrate Enterprise Emails

🔐 Microsoft has patched an indirect prompt injection vulnerability in Microsoft 365 Copilot that could have been exploited to exfiltrate recent enterprise emails via clickable Mermaid diagrams. Researcher Adam Logue demonstrated a multi-stage attack using Office documents containing hidden white-text instructions that caused Copilot to invoke an internal search-enterprise_emails tool. The assistant encoded retrieved emails into hex, embedded them in Mermaid output styled as a login button, and added an attacker-controlled hyperlink. Microsoft mitigated the risk by disabling interactive hyperlinks in Mermaid diagrams within Copilot chats.

Microsoft Microsoft Copilot Indirect Prompt Injection AI Security

October 9, 2025

Indirect Prompt Injection Poisons Agents' Long-Term Memory

⚠️This Unit 42 proof-of-concept shows how an attacker can use indirect prompt injection to silently poison an AI agent’s long-term memory, demonstrated against a travel assistant built on Amazon Bedrock. The attack manipulates the agent’s session summarization process so malicious instructions become stored memory and persist across sessions. When the compromised memory is later injected into orchestration prompts, the agent can be coerced into unauthorized actions such as stealthy exfiltration. Unit 42 outlines layered mitigations including pre-processing prompts, Bedrock Guardrails, content filtering, URL allowlisting, and logging to reduce risk.

Amazon Bedrock Indirect Prompt Injection AI Security Bedrock Guardrails

October 7, 2025

Google won’t fix new ASCII smuggling attack in Gemini

⚠️ Google has declined to patch a new ASCII smuggling vulnerability in Gemini, a technique that embeds invisible Unicode Tags characters to hide instructions from users while still being processed by LLMs. Researcher Viktor Markopoulos of FireTail demonstrated hidden payloads delivered via Calendar invites, emails, and web content that can alter model behavior, spoof identities, or extract sensitive data. Google said the issue is primarily social engineering rather than a security bug.

Gemini Prompt Injection Attack Indirect Prompt Injection

September 30, 2025

Researchers Disclose Trio of Gemini AI Vulnerabilities

🔒 Cybersecurity researchers disclosed three now-patched vulnerabilities in Google's Gemini suite that could have exposed user data and enabled search- and prompt-injection attacks. The flaws, labeled the Gemini Trifecta, impacted Gemini Cloud Assist, the Search Personalization model, and the Browsing Tool. Following responsible disclosure, Google stopped rendering hyperlinks in log summaries and implemented additional hardening. Tenable warned these issues could have allowed covert exfiltration of saved user information and location data.

Google Gemini Indirect Prompt Injection Data Exfiltration

September 30, 2025

Gemini Trifecta Exposes Indirect AI Attack Surfaces

⚠️Tenable has revealed three vulnerabilities in Google's Gemini platform, collectively dubbed the "Gemini Trifecta," that enable indirect prompt injection and data exfiltration through integrations. The issues allow attackers to poison GCP logs consumed by Gemini Cloud Assist, inject malicious entries into Chrome search history to manipulate the Search Personalization Model, and coerce the Browsing Tool into fetching attacker-controlled URLs that leak sensitive query data. Google has patched the flaws, and Tenable urges security teams to treat AI integrations as active threat surfaces and implement input sanitization, output validation, monitoring, and regular penetration testing.

Google Gemini Indirect Prompt Injection AI Security

September 20, 2025

ShadowLeak: Zero-click flaw exposes Gmail via ChatGPT

🔓 Radware disclosed ShadowLeak, a zero-click vulnerability in OpenAI's ChatGPT Deep Research agent that can exfiltrate sensitive Gmail inbox data when a single crafted email is present. The technique hides indirect prompt injections in email HTML using tiny fonts, white-on-white text and CSS/layout tricks so a human user is unlikely to notice the commands while the agent reads and follows them. In Radware's proof-of-concept the agent, once granted Gmail integration, parses the hidden instructions and uses browser tools to send extracted data to an external server. OpenAI addressed the issue in early August after a responsible disclosure on June 18, and Radware warned the approach could extend to many other connectors, expanding the attack surface.

OpenAI ChatGPT Indirect Prompt Injection AI Security

September 18, 2025

ShadowLeak: AI agents can exfiltrate data undetected

⚠️Researchers at Radware disclosed a vulnerability called ShadowLeak in the Deep Research module of ChatGPT that lets hidden, attacker-crafted instructions embedded in emails coerce an AI agent to exfiltrate sensitive data. The indirect prompt-injection technique hides commands using tiny fonts, white-on-white text or metadata and instructs the agent to encode and transmit results (for example, Base64-encoded lists of names and credit cards) to an attacker-controlled URL. Radware says the key risk is that exfiltration can occur from the model’s cloud backend, making detection by the affected organization very difficult; OpenAI was notified and implemented a fix, and Radware found the patch effective in subsequent tests.

OpenAI ChatGPT Indirect Prompt Injection AI Data Leakage

September 17, 2025

New LLM Attack Vectors and Practical Security Steps

🔐This article reviews emerging attack vectors against large language model assistants demonstrated in 2025, highlighting research from Black Hat and other teams. Researchers showed how prompt injections or so‑called promptware — hidden instructions embedded in calendar invites, emails, images, or audio — can coerce assistants like Gemini, Copilot, and Claude into leaking data or performing unauthorized actions. Practical mitigations include early threat modeling, role‑based access for agents, mandatory human confirmation for high‑risk operations, vendor audits, and role‑specific employee training.

LLM Security Prompt Injection AI Alignment Indirect Prompt Injection

September 15, 2025

Code Assistant Risks: Indirect Prompt Injection and Misuse

🛡️ Unit 42 describes how IDE-integrated AI code assistants can be abused to insert backdoors, leak secrets, or produce harmful output by exploiting features like chat, auto-complete, and context attachment. The report highlights an indirect prompt injection vector where attackers contaminate public or third‑party data sources; when that data is attached as context, malicious instructions can hijack the assistant. It recommends reviewing generated code, controlling attached context, adopting standard LLM security practices, and contacting Unit 42 if compromise is suspected.

Unit 42 Indirect Prompt Injection AI Agent Hijacking LLM Security

September 3, 2025

Indirect Prompt-Injection Threats to LLM Assistants

🔐 New research demonstrates practical, dangerous promptware attacks that exploit common interactions—calendar invites, emails, and shared documents—to manipulate LLM-powered assistants. The paper Invitation Is All You Need! evaluates 14 attack scenarios against Gemini-powered assistants and introduces a TARA framework to quantify risk. The authors reported 73% of identified threats as High-Critical and disclosed findings to Google, which deployed mitigations. Attacks include context and memory poisoning, tool misuse, automatic agent/app invocation, and on-device lateral movement affecting smart-home and device control.

Gemini Google Indirect Prompt Injection AI Agent Hijacking

July 29, 2025

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.

LLM Security Prompt Injection Attack Indirect Prompt Injection AI Guardrails