All news with #prompt injection attack tag

157 articles · page 2 of 8

June 25, 2026

Gaslight macOS implant uses AI prompt injection

🛡️ A new Rust-based macOS implant named Gaslight embeds a prompt-injection payload aimed at misleading AI-assisted analysis tools into aborting or refusing to analyze the sample. SentinelOne attributes the tool with high confidence to North Korea–aligned actors and notes that its Telegram-based C2 implements an interactive shell with commands like shell, upload, and kill. The implant uses a LaunchAgent for persistence and includes a Base64-encoded Python stealer that harvests browser data, Terminal histories, Keychain contents, and system profiles, then compresses the data and exfiltrates it via Telegram.

SentinelOne Prompt Injection Attack North Korea-nexus Malware

June 24, 2026

AI browsers tricked into leaking credentials in demo

🔒 Researchers at LayerX demonstrated a technique called BioShocking that convinces AI-powered web browsers they are playing a game, causing them to abandon safety guardrails and exfiltrate user data. The team tested six agentic browsers and plugins, including ChatGPT Atlas, Perplexity's Comet and Anthropic's Claude extension, and in a proof-of-concept had each copy login credentials and send them to an attacker. LayerX recommended requiring user confirmation for account reads and adding context-aware flags to limit what agents can access.

ChatGPT Perplexity Anthropic Agent Security

June 24, 2026

macOS Gaslight backdoor uses prompt injection tactics

🛡️ SentinelLabs uncovered a North Korea-linked macOS backdoor, tracked as macOS.Gaslight, that embeds 38 fabricated system messages to manipulate AI-assisted malware triage. The Rust implant carries an infostealer and interactive backdoor that exfiltrates browser data, terminal histories and the macOS login keychain, using Telegram Bot API with certificate pinning for command and control. Researchers noted novel tradecraft including runtime staging of a standalone Python interpreter and self-scrubbing of the Telegram bot token from logs. SentinelLabs warned analysts to treat sample contents as adversarial input and to isolate hostile content from LLM-based tools.

SentinelOne North Korea-nexus Prompt Injection Attack Malware
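
The SentinelLabs advice in the item above (treat sample contents as adversarial input and keep them away from LLM-based tools) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the pattern list, the `flag_instruction_like` and `quarantine` helpers, and the record layout are hypothetical and not taken from the report. The idea is to keep extracted strings as opaque data and to surface instruction-like text for a human instead of pasting it into a prompt.

```python
import base64
import re

# Hypothetical heuristics: phrases that read like instructions aimed at an AI tool.
INSTRUCTION_PATTERNS = [
    re.compile(r"\bignore (all|any|previous|prior) (instructions|prompts)\b", re.I),
    re.compile(r"\b(system|assistant)\s*(message|prompt)\b", re.I),
    re.compile(r"\b(do not|don't|refuse to|stop) (analy[sz]e|analy[sz]ing)\b", re.I),
]


def flag_instruction_like(strings):
    """Return the extracted strings that match any heuristic pattern."""
    return [s for s in strings if any(p.search(s) for p in INSTRUCTION_PATTERNS)]


def quarantine(strings):
    """Package untrusted strings as opaque data plus a list of flagged entries."""
    blob = "\n".join(strings).encode("utf-8")
    return {
        "encoding": "base64",
        "payload": base64.b64encode(blob).decode("ascii"),
        "flagged": flag_instruction_like(strings),
    }
```

A keyword list like this is easy to evade, so it would only complement, not replace, keeping hostile content out of the model's instruction context.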

June 24, 2026

Spyware embeds forbidden text to disrupt AI analysis

🛡️ A malware developer has begun embedding provocative text about nuclear and biological weapons inside large JavaScript block comments in spyware payloads to confuse AI-based scanners. The commented header is ignored at runtime but aims to trigger refusals or misclassification in naive LLM-powered triage systems that ingest the start of a file without isolating untrusted content. Traditional detection methods (YARA, entropy checks, AST parsing, and behavioral analysis) remain effective, but the technique is a practical anti-analysis tactic against weak AI-first pipelines.

Malware AI Security LLM Security Defense Evasion

June 22, 2026

Defending AI Memory: Microsoft’s Multi‑Layer Strategy

🔒 Microsoft outlines a defense-in-depth approach to protect AI memory across storage, retrieval, model interaction, and user control. The post explains how memory transforms AI from stateless tool to learning collaborator, increasing attack surface and enabling staged attacks that persist beyond initial prompts. It summarizes protections in M365 Copilot including prompt-injection classifiers, Task Adherence checks, tenant policy controls, unified compliance, and audit logging integrated with Defender and Sentinel.

Microsoft Prompt Injection Attack Microsoft Copilot Microsoft Sentinel

June 22, 2026

Microsoft fixes AutoGen Studio flaw enabling code execution

🛡️ Microsoft patched a vulnerability chain named AutoJack in AutoGen Studio that could allow a visiting webpage to coerce a developer’s AI agent into executing arbitrary commands on the host. AutoGen Studio is the graphical interface for Microsoft’s open-source AutoGen framework for multi-agent AI systems; the flaw was fixed during development and never shipped in a PyPI release. The issue affected developers who built from the main GitHub branch in a limited window and allowed attacker-supplied commands to be launched with the developer’s account privileges. Microsoft urges running AutoGen Studio only as a developer prototype in isolated, low-privilege environments and avoiding exposure to untrusted content.

Microsoft AutoGen Prompt Injection Attack Remote Code Execution

June 19, 2026

SearchLeak shows broader AI prompt injection risk

🔒 A proof-of-concept called SearchLeak demonstrated a prompt injection attack against Microsoft M365 Copilot Enterprise that relies on users clicking crafted links, which then exfiltrate corporate data. Researchers combined three weaknesses in Copilot Search, including URL query parameters treated as natural language prompts, to leak sensitive content. Microsoft patched the server-side flaw, but the incident highlights the risks when AI services access broad corporate assets and the need for render-time sanitization and stricter Content Security Policies (CSPs).

Microsoft Copilot Prompt Injection Attack Data Exfiltration AI Security

June 18, 2026

Attackers exploit trusted AI platforms and ads

🔐 Threat actors abused trusted services — Google Ads, GitLab Pages, and Claude’s shared-chat feature — to trick developers into executing malicious PowerShell and terminal commands via ClickFix social engineering. Researchers at TrendAI observed a six-wave campaign that funnelled over 2,000 victims from sponsored search results to malicious pages and then to weaponized Claude shared chats. By impersonating popular developer tools and brands, the attackers leveraged reputation stacking to make their lures appear legitimate and evade detection.

Google GitLab Anthropic ClickFix

June 15, 2026

Runtime signals to detect compromised AI agents

🛡️ In response to widespread prompt-injection risks, the article outlines runtime signals to detect compromised AI agents that possess the so-called lethal trifecta: access to private data, ingestion of untrusted content, and external communication ability. It argues that this trifecta is now the default for useful agents, so defenses must shift from architecture rules to behavioral, runtime detection. Recommended signals include instruction-following anomalies, unexpected tool-call sequences, low-bandwidth exfiltration channels, out-of-scope credential access, and suspicious memory writes.

AI Runtime Security Agent Security Prompt Injection Attack Tool Abuse
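
One of the runtime signals named in the item above, an unexpected tool-call sequence, can be sketched as a per-session monitor. This is a minimal illustration under stated assumptions: the capability labels, the tool names in `TOOL_CAPABILITIES`, and the `SessionMonitor` class are hypothetical and do not come from the article. The monitor raises an alert when an outbound call happens after the session has already touched private data and ingested untrusted content.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels an operator attaches to each tool.
PRIVATE_DATA = "private_data"
UNTRUSTED_INPUT = "untrusted_input"
EXTERNAL_COMM = "external_comm"

# Hypothetical tool-to-capability map for an example agent.
TOOL_CAPABILITIES = {
    "read_crm_record": PRIVATE_DATA,
    "fetch_web_page": UNTRUSTED_INPUT,
    "read_inbox": UNTRUSTED_INPUT,
    "send_email": EXTERNAL_COMM,
    "http_post": EXTERNAL_COMM,
}


@dataclass
class SessionMonitor:
    seen: set = field(default_factory=set)
    alerts: list = field(default_factory=list)

    def observe(self, tool_name):
        """Record a tool call; alert when an outbound call follows both other legs."""
        capability = TOOL_CAPABILITIES.get(tool_name)
        if capability is None:
            self.alerts.append(f"unknown tool: {tool_name}")
            return
        if capability == EXTERNAL_COMM and {PRIVATE_DATA, UNTRUSTED_INPUT} <= self.seen:
            self.alerts.append(f"review outbound call: {tool_name}")
        self.seen.add(capability)
```

An alert here only marks the call for review; whether to block, delay, or ask a human is a separate policy choice.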

June 12, 2026

Agentjacking: AI coding agents tricked into execution

🛡️ Cybersecurity researchers at Tenet Security disclosed a new attack class called Agentjacking that tricks AI coding agents into executing arbitrary code. The exploit leverages Sentry's public DSN and its MCP interaction to inject crafted error events, which agents like Claude Code and Cursor interpret as trusted resolution steps. Successful exploitation can expose sensitive data and run code with developers' privileges.

Claude AI Agent Hijacking Agent Security Prompt Injection Attack

June 12, 2026

Study: Prompt Injection Undermines AI Web Agents

🔍 New research finds current AI web agents largely fail to defend against prompt injection attacks. The StakeBench benchmark tested GPT‑5 and Gemini‑powered agents across realistic web scenarios, revealing high success rates for both direct and indirect injections and exposing failure modes like stealthy parasitism and misaligned disruption. Results show vulnerabilities vary by stakeholder and agent architecture.

Prompt Injection Attack AI Agent Hijacking Gemini ChatGPT

June 11, 2026

OpenClaw AI Agent Vulnerabilities and Mitigations

🛡️ Two security teams demonstrated attacks against OpenClaw in which hidden instructions in shared contacts, vCards, and location pins, or in ordinary-looking emails, caused the agent to execute attacker-controlled code or exfiltrate sensitive data. Imperva found a message-object prompt-injection flaw that OpenClaw patched in version 2026.4.23, while Varonis showed social-engineering 'agent phishing' that requires architectural controls rather than a simple patch. Operators are urged to update, restrict outbound actions, and treat agents as junior employees needing human oversight.

Agentic AI Prompt Injection Attack Indirect Prompt Injection Agent Security

June 10, 2026

Securing AI Agents as Enterprise Workforce

🛡️ An enterprise sales team built an AI agent to manage renewals; the agent reads emails, queries CRM data, drafts responses, and updates records. This workflow combines private data, untrusted input, and external communication, changing the security model. Traditional controls like IAM and DLP still matter but are insufficient alone. Runtime, context-aware controls that inspect prompts, outputs, and tool calls are required to prevent prompt injection, data exfiltration, and unsafe actions.

Agent Security Agentic AI Prompt Injection Attack Tool Abuse

June 8, 2026

OpenAI adds Lockdown Mode and session auditing

🔒 OpenAI has rolled out two new security controls for ChatGPT: Lockdown Mode and Active Sessions. Lockdown Mode restricts outbound network access to prevent data exfiltration via prompt injection, at the cost of disabling live connectors and certain features. Active Sessions gives users visibility into and control over signed-in devices, with the ability to end single or all sessions. Both controls target account security and sensitive-data use cases, though SSO accounts and some logins remain unsupported.

OpenAI ChatGPT Prompt Injection Attack

June 8, 2026

Prompt injection remains an unsolved architectural problem

🛡️ Ariel Fogel warned at Infosecurity Europe 2026 that prompt injection is an unresolved architectural issue threatening AI development. He explained that LLMs treat inputs as a single token stream, preventing reliable privilege separation between system prompts, user inputs and agent-retrieved content. With agents gaining tool access, successful injections can escalate from bad outputs to real-world actions, outpacing traditional governance and controls.

Prompt Injection Attack LLM Security Agentic AI

June 6, 2026

OpenAI introduces Lockdown Mode to limit ChatGPT tools

🔒 OpenAI has started rolling out a new Lockdown Mode for eligible ChatGPT personal accounts to reduce the risk of data exfiltration from prompt injection attacks. The optional security setting restricts capabilities that can connect to the web or external services, including live web browsing, image support, agent mode, deep research, Canvas networking, and file downloads. Lockdown Mode is available across Free, Go, Plus, Pro, and self-serve ChatGPT Business plans but cannot be used simultaneously with Developer Mode. OpenAI warns the feature reduces but does not eliminate exfiltration risk and also launched enhanced account session management to help detect and terminate unauthorized access.

ChatGPT Prompt Injection Attack AI Security

June 4, 2026

Anthropic Claude Code Action flaw put repos at risk

🔒 A researcher discovered a vulnerability in Anthropic's Claude Code GitHub Action that allowed takeover of public repositories via a single opened GitHub issue. Anthropic patched the core bypass in January and released fixes in claude-code-action v1.0.94, rating the issue 7.8 under CVSS v4.0 and issuing a bounty. The flaw arose from overly permissive triggers that trusted actors ending in "[bot]" and example workflows allowing non-write users, enabling indirect prompt injection to exfiltrate environment secrets and OIDC credentials. Administrators should update to v1.0.94, audit workflows for untrusted inputs, and remove unnecessary permissions and tools to prevent exfiltration.

Anthropic GitHub Actions Prompt Injection Attack Indirect Prompt Injection
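
The trigger weakness described in the item above, trusting any actor whose login ends in "[bot]", can be contrasted with an exact-match allowlist. This is only a sketch: the function names and the `TRUSTED_ACTORS` set are hypothetical, it is not the logic of claude-code-action, and it assumes an attacker can act under some login carrying that suffix.

```python
def is_trusted_by_suffix(actor: str) -> bool:
    """The weak pattern: trust any login that merely looks like a bot account."""
    return actor.endswith("[bot]")


def is_trusted_by_allowlist(actor: str, allowlist: frozenset) -> bool:
    """A stricter alternative: trust only exact, pre-approved logins."""
    return actor in allowlist


# Hypothetical allowlist; a real one would name the automation a repo actually uses.
TRUSTED_ACTORS = frozenset({"dependabot[bot]"})
```

With these definitions, a login such as "evil-app[bot]" passes the suffix check but fails the allowlist check.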

June 3, 2026

Gemini notification injection risk on Android devices

🔔 A SafeBreach researcher demonstrated that a single malicious notification from apps like WhatsApp, Slack, SMS, Signal, Instagram, or Messenger could hijack Google Gemini's voice assistant on Android. The technique, called Fake Context Alignment, let notifications be treated as executable context, enabling fake replies, app launches, smart-home control, and even persistent memory poisoning. Google patched the issue server-side after being notified; Android users can disable Gemini's notification reading to mitigate exposure.

Google Prompt Injection Attack

June 2, 2026

When AI Support Workflows Become an Authorization Risk

🔒 Reporting suggests attackers used Meta’s AI support chatbot to change recovery emails on high-profile Instagram accounts, leading to notable takeovers. The core issue isn’t just prompt injection or a model jailbreak but that the AI operated within a sensitive account recovery workflow with insufficient independent verification. Organizations must treat AI-driven support actions as part of the security boundary and constrain authority, permissions, and verification around such agents.

Meta Account Takeover AI Security Authentication Bypass

May 29, 2026

ChatGPhish vulnerability turns ChatGPT into phishing surface

🛡️ Cybersecurity researchers disclosed a vulnerability dubbed ChatGPhish that exploits ChatGPT's trust in Markdown links and images to perform prompt injections and enable phishing. The flaw causes the assistant to auto-fetch attacker-hosted images and render malicious links and QR codes inside the trusted UI, potentially leaking client metadata like IP and User-Agent. The technique highlights summarization as an adversarial surface that can convert benign web pages into phishing vectors.

ChatGPT Prompt Injection Attack Phishing
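
The auto-fetch behaviour described in the item above suggests one generic mitigation: filter image references out of model output before it is rendered. The sketch below is an assumption-laden illustration, not the fix any vendor shipped. The `ALLOWED_IMAGE_HOSTS` set and the `strip_untrusted_images` helper are hypothetical, and the regular expression only covers simple inline Markdown images, not reference-style images or URLs containing parentheses.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the UI may load images from.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}

# Matches inline Markdown images of the form ![alt](url) or ![alt](url "title").
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)')


def strip_untrusted_images(markdown: str) -> str:
    """Replace images from non-allowlisted hosts with a text placeholder."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        parsed = urlparse(url)
        if parsed.scheme == "https" and parsed.hostname in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep allowlisted images unchanged
        return f"[image removed: {alt or 'no description'}]"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Because the image is dropped before rendering, the client never issues the request that would reveal its IP address and User-Agent to the attacker's host.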