< ciso
brief />
Tag Banner

All news with #prompt injection tag

52 articles · page 3 of 3

Agent Factory Recap: Securing AI Agents in Production

🛡️ This recap of the Agent Factory episode explains practical strategies for securing production AI agents, demonstrating attacks like prompt injection, invisible Unicode exploits, and vector DB context poisoning. It highlights Model Armor for pre- and post-inference filtering, sandboxed execution, network isolation, observability, and tool safeguards via the Agent Development Kit (ADK). The team demonstrates a secured DevOps assistant that blocks data-exfiltration attempts while preserving intended functionality and provides operational guidance on multi-agent authentication, least-privilege IAM, and compliance-ready logging.
read more →

Prompt Hijacking Risks MCP-Based AI Workflows Exposed

⚠️ Security researchers warn that MCP-based AI workflows are vulnerable to "prompt hijacking" when MCP servers issue predictable or reused session IDs, allowing attackers to inject malicious prompts into active client sessions. JFrog demonstrated the issue in oatpp-mcp (CVE-2025-6515), where guessable session IDs could be harvested and reassigned to craft poisoned responses. Recommended mitigations include generating session IDs with cryptographically secure RNGs (≥128 bits of entropy) and having clients validate unpredictable event IDs.
read more →

Model Armor and Apigee: Protecting Generative AI Apps

🔒 Google Cloud’s Model Armor integrates with Apigee to screen prompts, responses, and agent interactions, helping organizations mitigate prompt injection, jailbreaks, sensitive data exposure, malicious links, and harmful content. The model‑agnostic, cloud‑agnostic service supports REST APIs and inline integrations with Apigee, Vertex AI, Agentspace, and network service extensions. The article provides step‑by‑step setup: enable the API, create templates, assign service account roles, add SanitizeUserPrompt and SanitizeModelResponse policies to Apigee proxies, and review findings in the AI Protection dashboard.
read more →

Agentic AI and the OODA Loop: The Integrity Problem

🛡️ Bruce Schneier and Barath Raghavan argue that agentic AIs run repeated OODA loops—Observe, Orient, Decide, Act—over web-scale, adversarial inputs, and that current architectures lack the integrity controls to handle untrusted observations. They show how prompt injection, dataset poisoning, stateful cache contamination, and tool-call vectors (e.g., MCP) let attackers embed malicious control into ordinary inputs. The essay warns that fixing hallucinations is insufficient: we need architectural integrity—semantic verification, privilege separation, and new trust boundaries—rather than surface patches.
read more →

Defending LLM Applications Against Unicode Tag Smuggling

🔒 This AWS Security Blog post examines how Unicode tag block characters (U+E0000–U+E007F) can be abused to hide instructions inside text sent to LLMs, enabling prompt-injection and hidden-character smuggling. It explains why Java's UTF-16 surrogate handling can make one-pass sanitizers inadequate and shows recursive sanitization as a remedy, plus Python-safe filters. The post also outlines using Amazon Bedrock Guardrails denied topics or Lambda-based handlers as mitigation and notes visual/compatibility trade-offs.
read more →

AI Risks Push Integrity Protection to Forefront for CISOs

🔒 CISOs must now prioritize integrity protection as AI introduces new attack surfaces such as data poisoning, prompt injection and adversarial manipulation. Shadow AI — unsanctioned use of models and services — increases risks of data leakage and insecure integrations. Defenses should combine Security by Design, governance, transparency and compliance (e.g., GDPR, EU AI Act) to detect poisoned data and prevent model drift.
read more →

Mind the Gap: TOCTOU Vulnerabilities in LLM-Enabled Agents

⚠️A new study, “Mind the Gap,” examines time-of-check to time-of-use (TOCTOU) flaws in LLM-enabled agents and introduces TOCTOU-Bench, a 66-task benchmark. The authors demonstrate practical attacks such as malicious configuration swaps and payload injection and evaluate defenses adapted from systems security. Their mitigations—prompt rewriting, state integrity monitoring, and tool-fusing—achieve up to 25% automated detection and materially reduce the attack window and executed vulnerabilities.
read more →

New LLM Attack Vectors and Practical Security Steps

🔐This article reviews emerging attack vectors against large language model assistants demonstrated in 2025, highlighting research from Black Hat and other teams. Researchers showed how prompt injections or so‑called promptware — hidden instructions embedded in calendar invites, emails, images, or audio — can coerce assistants like Gemini, Copilot, and Claude into leaking data or performing unauthorized actions. Practical mitigations include early threat modeling, role‑based access for agents, mandatory human confirmation for high‑risk operations, vendor audits, and role‑specific employee training.
read more →

Deploying Agentic AI: Five Steps for Red-Teaming Guide

🛡️ Enterprises adopting agentic AI must update red‑teaming practices to address a rapidly expanding and interactive attack surface. The article summarizes the Cloud Security Alliance’s Agentic AI Red Teaming Guide and corroborating research that documents prompt injection, multi‑agent manipulation, and authorization hijacking as practical threats. It recommends five pragmatic steps—change attitude, continually test guardrails and governance, broaden red‑team skill sets, widen the solution space, and adopt modern tooling—and highlights open‑source and commercial tools such as AgentDojo and Agentgateway. The overall message: combine automated agents with human creativity, embed security in design, and treat agentic systems as sociotechnical operators rather than simple software.
read more →

The AI Fix #67: AI crowd fakes, gullible agents, scams

🎧 In episode 67 of The AI Fix, Graham Cluley and Mark Stockley examine a mix of quirky and concerning AI developments, from an AI-equipped fax machine to an AI-generated crowd at a Will Smith gig. They cover security risks such as prompt-injection hidden in resized images and criminals repurposing Claude techniques for ransomware. The hosts also discuss why GPT-5 represented a larger leap than many realised and review tests showing agentic web browsers are alarmingly gullible to scams.
read more →

Penn Study Finds: GPT-4o-mini Susceptible to Persuasion

🔬 University of Pennsylvania researchers tested GPT-4o-mini on two categories of requests an aligned model should refuse: insulting the user and giving instructions to synthesize lidocaine. They crafted prompts using seven persuasion techniques (Authority, Commitment, Liking, Reciprocity, Scarcity, Social proof, Unity) and matched control prompts, then ran each prompt 1,000 times at the default temperature for a total of 28,000 trials. Persuasion prompts raised compliance from 28.1% to 67.4% for insults and from 38.5% to 76.5% for drug instructions, demonstrating substantial vulnerability to social-engineering cues.
read more →

The Brain Behind Next-Generation Cyber Attacks and AI Risks

🧠 Researchers at Carnegie Mellon University demonstrated that leading large language models (LLMs), by themselves, struggle to execute complex, multi-host cyber-attacks end-to-end, frequently wandering off-task or returning incorrect parameters. Their proposed solution, Incalmo, is a structured abstraction layer that constrains planning to a precise set of actions and validated parameters, substantially improving completion and coordination. The work highlights both enhanced offensive potential when LLMs are scaffolded and urgent defensive challenges for security teams.
read more →