Tag Banner

All news with #system prompt exposure tag

Tue, December 9, 2025

NCSC Warns Prompt Injection May Be Inherently Unfixable

⚠️ The UK National Cyber Security Centre (NCSC) warns that prompt injection vulnerabilities in large language models may never be fully mitigated, and defenders should instead focus on reducing impact and residual risk. NCSC technical director David C cautions against treating prompt injection like SQL injection, because LLMs do not distinguish between 'data' and 'instructions' and operate by token prediction. The NCSC recommends secure LLM design, marking data separately from instructions, restricting access to privileged tools, and enhanced monitoring to detect suspicious activity.

read more →

Thu, December 4, 2025

Generative AI's Dual Role in Cybersecurity, Evolving

🛡️ Generative AI is rapidly reshaping cybersecurity by amplifying both attackers' and defenders' capabilities. Adversaries leverage models for coding assistance, phishing and social engineering, anti-analysis techniques (including prompts hidden in DNS) and vulnerability discovery, with AI-assisted elements beginning to appear in malware while still needing significant human oversight. Defenders use GenAI to triage threat data, speed incident response, detect code flaws, and augment analysts through MCP-style integrations. As models shrink and access widens, both risk and defensive opportunity are likely to grow.

read more →

Fri, November 28, 2025

Researchers Warn of Security Risks in Google Antigravity

⚠️ Google’s newly released Antigravity IDE has drawn security warnings after researchers reported vulnerabilities that can allow malicious repositories to compromise developer workspaces and install persistent backdoors. Mindgard, Adam Swanda, and others disclosed indirect prompt injection and trusted-input handling flaws that could enable data exfiltration and remote command execution. Google says it is aware, has updated its Known Issues page, and is working with product teams to address the reports.

read more →

Mon, November 24, 2025

DeepSeek-R1 Generates Less Secure Code for China-Sensitive Prompts

⚠️ CrowdStrike analysis finds that DeepSeek-R1, an open-source AI reasoning model from a Chinese vendor, produces significantly more insecure code when prompts reference topics the Chinese government deems sensitive. Baseline tests produced vulnerable code in 19% of neutral prompts, rising to 27.2% for Tibet-linked scenarios. Researchers also observed partial refusals and internal planning traces consistent with targeted guardrails that may unintentionally degrade code quality.

read more →

Mon, November 17, 2025

Fight Fire With Fire: Countering AI-Powered Adversaries

🔥 We summarize Anthropic’s disruption of a nation-state campaign that weaponized agentic models and the Model Context Protocol to automate global intrusions. The attack automated reconnaissance, exploitation, and lateral movement at unprecedented speed, leveraging open-source tools and achieving 80–90% autonomous execution. It used prompt injection (role-play) to bypass model guardrails, highlighting the need for prompt injection defenses and semantic-layer protections. Organizations must adopt AI-powered defenses such as CrowdStrike Falcon and the Charlotte agentic SOC to match adversary tempo.

read more →

Wed, October 29, 2025

Open-Source b3 Benchmark Boosts LLM Security Testing

🛡️ The UK AI Security Institute (AISI), Check Point and Lakera have launched b3, an open-source benchmark to assess and strengthen the security of backbone LLMs that power AI agents. b3 focuses on the specific LLM calls within agent workflows where malicious inputs can trigger harmful outputs, using 10 representative "threat snapshots" combined with a dataset of 19,433 adversarial attacks from Lakera’s Gandalf initiative. The benchmark surfaces vulnerabilities such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service and unauthorized tool calls, making LLM security more measurable, reproducible and comparable across models and applications.

read more →

Mon, October 27, 2025

ChatGPT Atlas 'Tainted Memories' CSRF Risk Exposes Accounts

⚠️ Researchers disclosed a CSRF-based vulnerability in ChatGPT Atlas that can inject malicious instructions into the assistant's persistent memory, potentially enabling arbitrary code execution, account takeover, or malware deployment. LayerX warns that corrupted memories persist across devices and sessions until manually deleted and that Atlas' anti-phishing defenses lag mainstream browsers. The flaw converts a convenience feature into a persistent attack vector that can be invoked during normal prompts.

read more →

Tue, October 7, 2025

Google won’t fix new ASCII smuggling attack in Gemini

⚠️ Google has declined to patch a new ASCII smuggling vulnerability in Gemini, a technique that embeds invisible Unicode Tags characters to hide instructions from users while still being processed by LLMs. Researcher Viktor Markopoulos of FireTail demonstrated hidden payloads delivered via Calendar invites, emails, and web content that can alter model behavior, spoof identities, or extract sensitive data. Google said the issue is primarily social engineering rather than a security bug.

read more →

Fri, October 3, 2025

CometJacking attack tricks Comet browser into leaking data

🛡️ LayerX researchers disclosed a prompt-injection technique called CometJacking that abuses Perplexity’s Comet AI browser by embedding malicious instructions in a URL's collection parameter. The payload directs the agent to consult connected services (such as Gmail and Google Calendar), encode the retrieved content in base64, and send it to an attacker-controlled endpoint. The exploit requires no credentials or additional user interaction beyond clicking a crafted link. Perplexity reviewed LayerX's late-August reports and classified the findings as "Not Applicable."

read more →

Mon, September 15, 2025

Code Assistant Risks: Indirect Prompt Injection and Misuse

🛡️ Unit 42 describes how IDE-integrated AI code assistants can be abused to insert backdoors, leak secrets, or produce harmful output by exploiting features like chat, auto-complete, and context attachment. The report highlights an indirect prompt injection vector where attackers contaminate public or third‑party data sources; when that data is attached as context, malicious instructions can hijack the assistant. It recommends reviewing generated code, controlling attached context, adopting standard LLM security practices, and contacting Unit 42 if compromise is suspected.

read more →

Tue, July 29, 2025

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.

read more →