All news with #prompt injection tag
Mon, November 17, 2025
A Methodical Approach to Agent Evaluation: Quality Gate
🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.
Mon, November 17, 2025
Best-in-Class GenAI Security: CloudGuard WAF Meets Lakera
🔒 The rise of generative AI introduces new attack surfaces that conventional security stacks were never designed to address. This post outlines how pairing CloudGuard WAF with Lakera's AI-risk controls creates layered protection by inspecting prompts, model interactions, and data flows at the application edge. The integrated solution aims to prevent prompt injection, sensitive-data leakage, and harmful content generation while maintaining application availability and performance.
Mon, November 17, 2025
Fight Fire With Fire: Countering AI-Powered Adversaries
🔥 We summarize Anthropic’s disruption of a nation-state campaign that weaponized agentic models and the Model Context Protocol to automate global intrusions. The attack automated reconnaissance, exploitation, and lateral movement at unprecedented speed, leveraging open-source tools and achieving 80–90% autonomous execution. It used prompt injection (role-play) to bypass model guardrails, highlighting the need for prompt injection defenses and semantic-layer protections. Organizations must adopt AI-powered defenses such as CrowdStrike Falcon and the Charlotte agentic SOC to match adversary tempo.
Thu, November 13, 2025
What CISOs Should Know About Securing MCP Servers Now
🔒 The Model Context Protocol (MCP) enables AI agents to connect to data sources, but early specifications lacked robust protections, leaving deployments exposed to prompt injection, token theft, and tool poisoning. Recent protocol updates — including OAuth, third‑party identity provider support, and an official MCP registry — plus vendor tooling from hyperscalers and startups have improved defenses. Still, authentication remains optional and gaps persist, so organizations should apply zero trust and least‑privilege controls, enforce strong secrets management and logging, and consider specialist MCP security solutions before production rollout.
Wed, November 12, 2025
Tenable Reveals New Prompt-Injection Risks in ChatGPT
🔐 Researchers at Tenable disclosed seven techniques that can cause ChatGPT to leak private chat history by abusing built-in features such as web search, conversation memory and Markdown rendering. The attacks are primarily indirect prompt injections that exploit a secondary summarization model (SearchGPT), Bing tracking redirects, and a code-block rendering bug. Tenable reported the issues to OpenAI, and while some fixes were implemented several techniques still appear to work.
Wed, November 12, 2025
Extending Zero Trust to Autonomous AI Agents in Enterprises
🔐 As enterprises deploy AI assistants and autonomous agents, existing security frameworks must evolve to treat these agents as first-class identities rather than afterthoughts. The piece advocates applying Zero Trust principles—identity-first access, least-privilege, dynamic contextual enforcement, and continuous monitoring—to agentic identities to prevent misuse and reduce attack surface. Practical controls include scoped, short-lived tokens, tiered trust models, strict access boundaries, and assigning clear human ownership to each agent.
Wed, November 12, 2025
Secure AI by Design: A Policy Roadmap for Organizations
🛡️ In just a few years, AI has shifted from futuristic innovation to core business infrastructure, yet security practices have not kept pace. Palo Alto Networks presents a Secure AI by Design Policy Roadmap that defines the AI attack surface and prescribes actionable measures across external tools, agents, applications, and infrastructure. The Roadmap aligns with recent U.S. policy moves — including the June 2025 Executive Order and the July 2025 White House AI Action Plan — and calls for purpose-built defenses rather than retrofitting legacy controls.
Tue, November 11, 2025
CometJacking: Prompt-Injection Risk in AI Browsers
🔒 Researchers disclosed a prompt-injection technique dubbed CometJacking that abuses URL parameters to deliver hidden instructions to Perplexity’s Comet AI browser. By embedding malicious directives in the 'collection' parameter an attacker can cause the agent to consult connected services and memory instead of searching the web. LayerX demonstrated exfiltration of Gmail messages and Google Calendar invites by encoding data in base64 and sending it to an external endpoint. According to the report, Comet followed the malicious prompt and bypassed Perplexity’s safeguards, illustrating broader limits of current LLM-based assistants.
Mon, November 10, 2025
Researchers Trick ChatGPT into Self Prompt Injection
🔒 Researchers at Tenable identified seven techniques that can coerce ChatGPT into disclosing private chat history by abusing built-in features like web browsing and long-term Memories. They show how OpenAI’s browsing pipeline routes pages through a weaker intermediary model, SearchGPT, which can be prompt-injected and then used to seed malicious instructions back into ChatGPT. Proof-of-concepts include exfiltration via Bing-tracked URLs, Markdown image loading, and a rendering quirk, and Tenable says some issues remain despite reported fixes.
Thu, November 6, 2025
CIO’s First Principles: A Reference Guide to Securing AI
🔐 Enterprises must redesign security as AI moves from experimentation to production, and CIOs need a prevention-first, unified approach. This guide reframes Confidentiality, Integrity and Availability for AI, stressing rigorous access controls, end-to-end data lineage, adversarial testing and a defensible supply chain to prevent poisoning, prompt injection and model hijacking. Palo Alto Networks advocates embedding security across MLOps, real-time visibility of models and agents, and executive accountability to eliminate shadow AI and ensure resilient, auditable AI deployments.
Thu, November 6, 2025
Multi-Turn Adversarial Attacks Expose LLM Weaknesses
🔍 Cisco AI Defense's report shows open-weight large language models remain vulnerable to adaptive, multi-turn adversarial attacks even when single-turn defenses appear effective. Using over 1,000 prompts per model and analyzing 499 simulated conversations of 5–10 exchanges, researchers found iterative strategies such as Crescendo, Role-Play and Refusal Reframe drove failure rates above 90% in many cases. The study warns that traditional safety filters are insufficient and recommends strict system prompts, model-agnostic runtime guardrails and continuous red-teaming to mitigate risk.
Thu, November 6, 2025
AI-Powered Malware Emerges: Google Details New Threats
🛡️ Google Threat Intelligence Group (GTIG) reports that cybercriminals are actively integrating large language models into malware campaigns, moving beyond mere tooling to generate, obfuscate, and adapt malicious code. GTIG documents new families — including PROMPTSTEAL, PROMPTFLUX, FRUITSHELL, and PROMPTLOCK — that query commercial APIs to produce or rewrite payloads and evade detection. Researchers also note attackers use social‑engineering prompts to trick LLMs into revealing sensitive guidance and that underground marketplaces increasingly offer AI-enabled “malware-as-a-service,” lowering the bar for less skilled threat actors.
Thu, November 6, 2025
Google Warns: AI-Enabled Malware Actively Deployed
⚠️ Google’s Threat Intelligence Group has identified a new class of AI-enabled malware that leverages large language models at runtime to generate and obfuscate malicious code. Notable families include PromptFlux, which uses the Gemini API to rewrite its VBScript dropper for persistence and lateral spread, and PromptSteal, a Python data miner that queries Qwen2.5-Coder-32B-Instruct to create on-demand Windows commands. GTIG observed PromptSteal used by APT28 in Ukraine, while other examples such as PromptLock, FruitShell and QuietVault demonstrate varied AI-driven capabilities. Google warns this "just-in-time AI" approach could accelerate malware sophistication and democratize cybercrime.
Thu, November 6, 2025
Google: LLMs Employed Operationally in Malware Attacks
🤖 Google’s Threat Intelligence Group (GTIG) reports attackers are using “just‑in‑time” AI—LLMs queried during execution—to generate and obfuscate malicious code. Researchers identified two families, PROMPTSTEAL and PROMPTFLUX, which query Hugging Face and Gemini APIs to craft commands, rewrite source code, and evade detection. GTIG also documents social‑engineering prompts that trick models into revealing red‑teaming or exploit details, and warns the underground market for AI‑enabled crime is maturing. Google says it has disabled related accounts and applied protections.
Wed, November 5, 2025
Google: PROMPTFLUX malware uses Gemini to self-write
🤖 Google researchers disclosed a VBScript threat named PROMPTFLUX that queries Gemini via a hard-coded API key to request obfuscated VBScript designed to evade static detection. A 'Thinking Robot' component logs AI responses to %TEMP% and writes updated scripts to the Windows Startup folder to maintain persistence. Samples include propagation attempts to removable drives and mapped network shares, and variants that rewrite their source on an hourly cadence. Google assesses the malware as experimental and currently lacking known exploit capabilities.
Wed, November 5, 2025
GTIG Report: AI-Enabled Threats Transform Cybersecurity
🔒 The Google Threat Intelligence Group (GTIG) released a report documenting a clear shift: adversaries are moving beyond benign productivity uses of AI and are experimenting with AI-enabled operations. GTIG observed state-sponsored actors from North Korea, Iran and the People's Republic of China using AI for reconnaissance, tailored phishing lure creation and data exfiltration. Threats described include AI-powered, self-modifying malware, prompt-engineering to bypass safety guardrails, and underground markets selling advanced AI attack capabilities. Google says it has disrupted malicious assets and applied that intelligence to strengthen classifiers and its AI models.
Wed, November 5, 2025
Researchers Find ChatGPT Vulnerabilities in GPT-4o/5
🛡️ Cybersecurity researchers disclosed seven vulnerabilities in OpenAI's GPT-4o and GPT-5 models that enable indirect prompt injection attacks to exfiltrate user data from chat histories and stored memories. Tenable researchers Moshe Bernstein and Liv Matan describe zero-click search exploits, one-click query execution, conversation and memory poisoning, a markdown rendering bug, and a safety bypass using allow-listed Bing links. OpenAI has mitigated some issues, but experts warn that connecting LLMs to external tools broadens the attack surface and that robust safeguards and URL-sanitization remain essential.
Wed, November 5, 2025
Cloud CISO: Threat Actors' Growing Use of AI Tools
⚠️Google's Threat Intelligence team reports a shift from experimentation to operational use of AI by threat actors, including AI-enabled malware and prompt-based command generation. GTIG highlighted PROMPTSTEAL, linked to APT28 (FROZENLAKE), which queries a Hugging Face LLM to generate scripts for reconnaissance, document collection, and exfiltration, while adopting greater obfuscation and altered C2 methods. Google disabled related assets, strengthened model classifiers and safeguards with DeepMind, and urges defenders to update threat models, monitor anomalous scripting and C2, and incorporate threat intelligence into model- and classifier-level protections.
Wed, November 5, 2025
GTIG: Threat Actors Shift to AI-Enabled Runtime Malware
🔍 Google Threat Intelligence Group (GTIG) reports an operational shift from adversaries using AI for productivity to embedding generative models inside malware to generate or alter code at runtime. GTIG details “just-in-time” LLM calls in families like PROMPTFLUX and PROMPTSTEAL, which query external models such as Gemini to obfuscate, regenerate, or produce one‑time functions during execution. Google says it disabled abusive assets, strengthened classifiers and model protections, and recommends monitoring LLM API usage, protecting credentials, and treating runtime model calls as potential live command channels.
Wed, November 5, 2025
Prompt Injection Flaw in Anthropic Claude Desktop Exts
🔒Anthropic's official Claude Desktop extensions for Chrome, iMessage and Apple Notes were found vulnerable to web-based prompt injection that could enable remote code execution. Koi Security reported unsanitized command injection in the packaged Model Context Protocol (MCP) servers, which run unsandboxed on users' devices with full system permissions. Unlike browser extensions, these connectors can read files, execute commands and access credentials. Anthropic released a fix in v0.1.9, verified by Koi Security on September 19.