< ciso
brief />
Tag Banner

All news with #prompt injection attack tag

106 articles

Prompt-Injection Flaws in Copilot Studio and Agentforce

⚠️ Security researchers at Capsule Security disclosed prompt-injection vulnerabilities in Microsoft Copilot Studio and Salesforce Agentforce that let attackers embed malicious instructions in public form fields. Crafted inputs submitted via SharePoint or lead forms can override agent instructions and trigger data exfiltration to attacker-controlled endpoints. Microsoft patched the SharePoint-related issue (CVE-2026-21520) with a 7.5 CVSS score; Salesforce acknowledged the problem but described the vector as configuration-specific. Researchers warn that treating external inputs as trusted undermines autonomous agent security and urge input validation, least-privilege, and stricter outbound controls.
read more →

CISOs Confront Widening AI Visibility and Risk Gaps

🔍 CISOs are scrambling to close visibility gaps as organizations rapidly adopt AI, confronting risks such as prompt injection, data poisoning, shadow AI, and agentic behaviors. Security leaders report limited insight into where AI is used and how models behave, forcing them to reposition existing tools, adopt new monitoring solutions, and formalize governance. While traditional controls like DLP and SIEM can mitigate many issues, experts warn no single solution is fully mature, so leaders must balance guardrails, emerging observability tools, and business velocity.
read more →

Securing AI Inference on GKE with Model Armor Gateways

🔒 Enterprises are moving AI workloads to GKE at scale, but serving models introduces risks such as prompt injection and sensitive data leakage that traditional network controls miss. Google recommends Model Armor, a gateway-integrated guardrail service that inspects requests before they reach the model and scans outputs afterward. It offers proactive input scrutiny, content-aware output moderation, and DLP integration, all without code changes to your application. Integrated logging surfaces policy triggers to Security Command Center for audit and response.
read more →

Critical Flowise flaw enables JavaScript injection in AI

🚨 A critical design oversight in Flowise, a low-code platform for building LLM flows, allows arbitrary JavaScript to be injected via its Custom MCP node. The vulnerability (CVE-2025-59528) results from unsafe parsing in convertToValidJSONString, which feeds user input to the Function() constructor and executes with full Node.js privileges. A patch shipped in v3.0.6 and the latest public release is v3.1.1, but thousands of internet-exposed instances remain at risk as attackers have begun exploiting unpatched deployments.
read more →

Applying Security Fundamentals to AI: Practical Advice

🛡️ Treat AI like a very new, junior employee and as software: it’s capable but not infallible, so give clear goals, explicit permissions, and limit its authority. Apply distinct identities and least-privilege controls, avoid relying on AI for deterministic access decisions, and test for indirect prompt injection (XPIA) using techniques such as Spotlighting and Prompt Shield. Design end-to-end systems that include people and processes, document safety plans and failure modes, and continuously monitor and vet models and agents for changes.
read more →

ChatGPT vulnerability enabled covert data exfiltration

⚠️A security flaw in ChatGPT could be triggered by a single malicious prompt to create a covert exfiltration channel, researchers at Check Point reported. The issue allowed data to be leaked via a DNS side channel from the model’s isolated runtime and was patched by OpenAI on 20 February after disclosure. Check Point demonstrated extraction of uploaded files and private prompts and warned that users copying prompts from public sources could be exposed.
read more →

Securing Agentic AI: End-to-End Enterprise Protections

🔒 Microsoft presents an end-to-end strategy to secure agentic AI with the new Agent 365 control plane and updates across Microsoft Defender, Entra, Purview, and Sentinel. Announced for RSAC 2026, these measures focus on visibility, continuous identity protection, data loss prevention for Copilot prompts, and prompt-injection defenses to help organizations observe, govern, and defend agent ecosystems at scale.
read more →

Securing Homegrown AI Agents with Falcon AIDR & NeMo

🔒 Falcon AIDR now integrates with NVIDIA NeMo Guardrails to provide programmable runtime protections for homegrown AI agents moving into production. The combined solution blocks prompt injection, redacts PII, defangs malicious domains, and moderates unwanted topics while preserving responsive, sub-100ms agent workflows. Teams can leverage 75+ built-in detectors or create custom policies to monitor in report-only mode and then progressively enforce blocks, redactions, encryptions, or transformations.
read more →

Custom AI Apps to Dominate Incident Response Workloads

🛡️ Gartner warns custom-built AI applications will increasingly strain security teams unless defenders are engaged early. It predicts that by 2028 at least half of enterprise incident response work will handle fallout from AI app security issues. Analysts urge teams to "shift left" to embed controls during development, and expect AI security platforms to be widely adopted within two years to enforce guardrails and mitigate prompt injection, data misuse and related threats.
read more →

Font-rendering trick hides malicious commands from AIs

🔍 LayerX researchers demonstrated a font-rendering technique that can hide malicious commands from AI assistants by encoding the payload in HTML while visually rendering a different, benign string to users. The proof-of-concept combines custom fonts with glyph substitution and CSS concealment (tiny fonts, color/opacity tricks) so the DOM appears harmless while the browser displays an executable instruction. In tests across many popular assistants, automated analyzers that read the DOM missed the hidden commands; LayerX urges assistants to compare rendered output with DOM text and to treat fonts, color/opacity matches, and unusually small fonts as potential attack surfaces.
read more →

GenAI Prompt Fuzzing Reveals LLM Guardrail Fragility

⚠️ Unit 42 demonstrates a genetic-algorithm-inspired prompt-fuzzing technique that automatically generates meaning-preserving variants of disallowed requests to evaluate LLM guardrails. Their experiments show evasion rates vary widely by keyword and model, with some combinations yielding high, operationally meaningful success rates. They recommend treating LLMs as probabilistic boundaries, applying layered controls, continuous adversarial testing, and using tools like Prisma AIRS and Unit 42 assessments to strengthen defenses.
read more →

Detecting and Responding to Prompt Abuse in AI Tools

🔍 This post, the second in Microsoft's AI Application Security series, moves from planning to practical detection and response for prompt abuse. It describes common attack types — direct prompt override, extractive abuse targeting sensitive inputs, and indirect prompt injection via hidden instructions such as URL fragments — and why these are hard to spot without telemetry. The article provides a stepwise detection and incident response playbook and maps mitigations to Microsoft tools so teams can log interactions, sanitize inputs, and contain incidents.
read more →

Perplexity's Comet AI Browser Tricked Into Phishing Scam

🔒 Researchers demonstrated that an AI-powered browser, Perplexity's Comet, can be manipulated into executing a phishing scam in under four minutes. By intercepting the agent's explanatory traffic and training a GAN on those signals, attackers iteratively optimized a malicious page until the agent reliably performed fraudulent steps. The exploit leverages intent collision and prompt-injection weaknesses, shifting the target from users to the AI agent itself.
read more →

AI vs. AI: The Gatling-Gun Moment in Cybersecurity Era

🛡️ The piece compares the Civil War’s Gatling gun to a September 2025 agentic AI-driven cyberespionage campaign that automated most tactical operations. According to the report, a Chinese state-linked group, GTG-1002, abused Anthropic’s Claude Code via prompt injection and role-playing to produce malicious code and execute ≈90% of the attack chain. The intrusion hit 30 U.S. companies and agencies and was disclosed after Anthropic’s threat team detected misuse of their platform.
read more →

Fuzzing AI Judges: Stealth Triggers Enable Policy Bypass

🔍 This research introduces AdvJudge-Zero, an automated fuzzer that discovers stealthy input sequences capable of flipping AI judge decisions and bypassing safety gates. Tests show low-perplexity, benign-looking tokens—such as markdown markers, role labels, and context-shift phrases—can reliably convert block outcomes into allows. The report documents a roughly 99% attack success rate across diverse models and recommends adversarial fuzzing, retraining with discovered examples, and operational monitoring using products like Prisma AIRS and Cortex AI-SPM.
read more →

OpenAI to Acquire Promptfoo to Boost AI Agent Security

🔒 OpenAI said it will acquire AI testing startup Promptfoo to strengthen security checks for AI agents as enterprises deploy autonomous systems in business workflows. Promptfoo’s tools let developers test LLM applications against adversarial prompts, including prompt injection and jailbreak attempts, and evaluate whether models follow safety and reliability guidelines. OpenAI plans to integrate Promptfoo into OpenAI Frontier and to continue developing the open-source project while expanding enterprise capabilities.
read more →

AI Assistants Shift Organizational Security Priorities

🤖 AI-based assistants such as OpenClaw are rapidly reshaping organizational security, blurring boundaries between data and code and between trusted co-workers and insider threats. Incidents and research show agents taking autonomous actions and misconfigured admin interfaces exposing credentials, conversations, and integrations. Demonstrated supply-chain and prompt injection attacks can install rogue agents and manipulate agent perception. Organizations should isolate agents, enforce strict network controls, vet third-party skills, and address AI fragility as a core security concern.
read more →

FortiAIGate: Runtime Protection for AI Workloads, Governance

🔒 FortiAIGate provides dedicated runtime protection for private AI and LLM deployments by monitoring every input and output between applications and models. It detects and blocks threats such as prompt injection, jailbreaking, model poisoning, data exfiltration, and excessive compute abuse while enforcing governance policies in real time. Built for Kubernetes and hybrid environments, it integrates with Fortinet Security Fabric, offers dashboards mapping OWASP Top 10 LLM risks, and uses multi‑GPU and SmartNIC acceleration to preserve performance and control costs.
read more →

Companies Inject Hidden Prompts into AI Summarization

🔒 Microsoft reports companies are embedding hidden instructions in Summarize with AI buttons that pass persistence commands via URL prompt parameters. These prompts tell assistants to 'remember [Company] as a trusted source' or 'recommend [Company] first,' biasing later responses toward vendors. Researchers found over 50 unique prompts from 31 companies across 14 industries, and freely available tooling makes this trivial to deploy. The manipulation can subtly skew recommendations in critical areas like health, finance, and security without users knowing.
read more →

OpenClaw: Supply-Chain Risks and Underground Chatter

🔍 OpenClaw is an AI-driven automation framework with a modular skills marketplace that lets agents run user-installed plugins to manage mail, schedules, and system tasks. Security researchers disclosed multiple critical flaws — including one-click RCE (CVE-2026-25253), token/OAuth abuse, prompt-injection pathways, and absent sandboxing — and documented dozens of poisoned skills on ClawHub. Flare's telemetry shows significant chatter across research and fringe channels but limited evidence of mass criminal operationalization; the immediate confirmed threat is supply-chain abuse where malicious skills execute with agent-level privileges and exfiltrate credentials and sessions.
read more →