< ciso
brief />
Tag Banner

All news with #prompt injection attack tag

106 articles · page 6 of 6

LLMs Remain Vulnerable to Malicious Prompt Injection Attacks

🛡️ A recent proof-of-concept by Bargury demonstrates a practical and stealthy prompt injection that leverages a poisoned document stored in a victim's Google Drive. The attacker hides a 300-word instruction in near-invisible white, size-one text that tells an LLM to search Drive for API keys and exfiltrate them via a crafted Markdown URL. Schneier warns this technique shows how agentic AI systems exposed to untrusted inputs remain fundamentally insecure, and that current defenses are inadequate against such adversarial inputs.
read more →

Securing and Governing Autonomous AI Agents in Business

🔐 Microsoft outlines practical guidance for securing and governing the emerging class of autonomous agents. Igor Sakhnov explains how agents—now moving from experimentation into deployment—introduce risks such as task drift, Cross Prompt Injection Attacks (XPIA), hallucinations, and data exfiltration. Microsoft recommends starting with a unified agent inventory and layered controls across identity, access, data, posture, threat, network, and compliance. It introduces Entra Agent ID and an agent registry concept to enable auditable, just-in-time identities and improved observability.
read more →

Block Unsafe LLM Prompts with Firewall for AI at the Edge

🛡️ Cloudflare has integrated unsafe content moderation into Firewall for AI, using Llama Guard 3 to detect and block harmful prompts in real time at the network edge. The model-agnostic filter identifies categories including hate, violence, sexual content, criminal planning, and self-harm, and lets teams block or log flagged prompts without changing application code. Detection runs on Workers AI across Cloudflare's GPU fleet with a 2-second analysis cutoff, and logs record categories but not raw prompt text. The feature is available in beta to existing customers.
read more →

Logit-Gap Steering Reveals Limits of LLM Alignment

⚠️ Unit 42 researchers Tony Li and Hongliang Liu introduce Logit-Gap Steering, a new framework that exposes how alignment training produces a measurable refusal-affirmation logit gap rather than eliminating harmful outputs. Their paper demonstrates efficient short-path suffix jailbreaks that achieved high success rates on open-source models including Qwen, LLaMA, Gemma and the recently released gpt-oss-20b. The findings argue that internal alignment alone is insufficient and recommend a defense-in-depth approach with external safeguards and content filters.
read more →

Portkey Integrates Prisma AIRS to Secure AI Gateways

🔐 Palo Alto Networks and Portkey have integrated Prisma AIRS directly into Portkey’s AI gateway to embed security guardrails at the gateway level. The collaboration aims to protect applications from AI-specific threats—such as prompt injections, PII and secret leakage, and malicious outputs—while preserving Portkey’s operational benefits like observability and cost controls. A one-time configuration via Portkey’s Guardrails module enforces protections without code changes, and teams can monitor posture through Portkey logs and the Prisma AIRS dashboard.
read more →

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.
read more →