All news with the #model backdooring tag
Mon, November 17, 2025
AI-Driven Espionage Campaign Allegedly Targets Firms
🤖 Anthropic reported that roughly 30 organizations—including major technology firms, financial institutions, chemical companies, and government agencies—were targeted in what it describes as an AI-powered espionage campaign. The company attributes the activity to an actor it calls GTG-1002, links the group to the Chinese state, and says the attackers manipulated its Claude Code developer tool into launching infiltration attempts largely autonomously. Several security researchers have publicly questioned the asserted level of autonomy and criticized Anthropic for not publishing indicators of compromise or detailed forensic evidence.
Wed, November 5, 2025
Researchers Find ChatGPT Vulnerabilities in GPT-4o/5
🛡️ Cybersecurity researchers disclosed seven vulnerabilities in OpenAI's GPT-4o and GPT-5 models that enable indirect prompt injection attacks to exfiltrate user data from chat histories and stored memories. Tenable researchers Moshe Bernstein and Liv Matan describe zero-click search exploits, one-click query execution, conversation and memory poisoning, a markdown rendering bug, and a safety bypass using allow-listed Bing links. OpenAI has mitigated some of the issues, but experts warn that connecting LLMs to external tools broadens the attack surface and that robust safeguards and URL sanitization remain essential.
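As a rough illustration of the URL sanitization the researchers call for, the sketch below strips markdown links and images whose host is not on an allow-list before model output is rendered; the allow-listed domains and function names are assumptions for the example, not details from the Tenable report.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list; the report does not enumerate which hosts are trusted.
ALLOWED_HOSTS = {"www.bing.com", "openai.com"}

# Matches markdown links [label](url) and images ![label](url).
MARKDOWN_LINK = re.compile(r"!?\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def sanitize_markdown(text: str) -> str:
    """Drop markdown URLs pointing at non-allow-listed hosts so rendered
    output cannot smuggle chat data to attacker-controlled servers."""
    def _check(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        host = (urlparse(url).hostname or "").lower()
        if host in ALLOWED_HOSTS:
            return match.group(0)   # keep trusted links untouched
        return label                # keep the visible text, remove the URL
    return MARKDOWN_LINK.sub(_check, text)

if __name__ == "__main__":
    poisoned = "See [results](https://attacker.example/x?d=SECRET) and ![chart](https://www.bing.com/th?id=1)"
    print(sanitize_markdown(poisoned))
    # -> "See results and ![chart](https://www.bing.com/th?id=1)"
```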
Tue, October 28, 2025
Prisma AIRS 2.0: Unified Platform for Secure AI Agents
🔒 Prisma AIRS 2.0 is a unified AI security platform that delivers end-to-end visibility, risk assessment, and automated defenses across agents, models, and development pipelines. It consolidates Protect AI capabilities to provide posture and runtime protection for AI agents, model scanning, and API-first controls for MLOps. The platform also offers continuous, autonomous red teaming and a managed MCP Server to embed threat detection into workflows.
Tue, October 28, 2025
Atlas Browser Flaw Lets Attackers Poison ChatGPT Memory
⚠️ Researchers at LayerX Security disclosed a vulnerability in OpenAI’s Atlas browser that allows attackers to inject hidden instructions into a user’s ChatGPT memory via a CSRF-style flow. An attacker lures a logged-in user to a malicious page, leverages existing authentication, and taints the account-level memory so subsequent prompts can trigger malicious behavior. LayerX reported the issue to OpenAI and advised enterprises to restrict Atlas use and monitor AI-driven anomalies. Detection relies on behavioral indicators rather than traditional malware artifacts.
Mon, October 20, 2025
Agentic AI and the OODA Loop: The Integrity Problem
🛡️ Bruce Schneier and Barath Raghavan argue that agentic AIs run repeated OODA loops—Observe, Orient, Decide, Act—over web-scale, adversarial inputs, and that current architectures lack the integrity controls to handle untrusted observations. They show how prompt injection, dataset poisoning, stateful cache contamination, and tool-call vectors (e.g., MCP) let attackers embed malicious control into ordinary inputs. The essay warns that fixing hallucinations is insufficient: we need architectural integrity—semantic verification, privilege separation, and new trust boundaries—rather than surface patches.
Thu, September 25, 2025
Malicious MCP Server Update Exfiltrated Emails to Developer
⚠️ Koi Security has reported that a widely used Model Context Protocol (MCP) implementation, Postmark MCP Server by @phanpak, introduced a malicious change in version 1.0.16 that silently copied emails to an external server. The package, distributed via npm and embedded into hundreds of developer workflows, had more than 1,500 weekly downloads. Users who installed v1.0.16 or later are advised to remove the package immediately and rotate any potentially exposed credentials.
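A quick way to check whether a project pulled in the affected release is to scan the npm lockfile. A minimal sketch follows, assuming the package is published as "postmark-mcp" and uses plain three-part versions; both are assumptions, since the report names only "Postmark MCP Server by @phanpak" and v1.0.16 as the first malicious release.

```python
import json
from pathlib import Path

SUSPECT_PACKAGE = "postmark-mcp"      # assumed npm name for the reported package
FIRST_BAD_VERSION = (1, 0, 16)        # v1.0.16 and later are affected per the report

def parse_version(version: str) -> tuple:
    # Assumes plain "major.minor.patch" versions with no pre-release suffix.
    return tuple(int(part) for part in version.split(".")[:3])

def check_lockfile(path: str = "package-lock.json") -> None:
    lock = json.loads(Path(path).read_text())
    # npm v7+ lockfiles list every installed dependency under "packages".
    for name, meta in lock.get("packages", {}).items():
        if name.endswith(f"node_modules/{SUSPECT_PACKAGE}"):
            version = meta.get("version", "0.0.0")
            if parse_version(version) >= FIRST_BAD_VERSION:
                print(f"WARNING: {SUSPECT_PACKAGE}@{version} is in the affected range; "
                      "remove it and rotate any credentials it could have seen.")
            else:
                print(f"{SUSPECT_PACKAGE}@{version} predates the malicious release.")

if __name__ == "__main__":
    check_lockfile()
```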
Thu, August 28, 2025
Securing AI Before Times: Preparing for AI-driven Threats
🔐 At the Aspen US Cybersecurity Group Summer 2025 meeting, Wendi Whitmore urged urgent action to secure AI while defenders still retain a temporary advantage. Drawing on Unit 42 simulations that executed a full attack chain in as little as 25 minutes, she warned adversaries are evolving from automating old tactics to attacking the foundations of AI — targeting internal LLMs, training data and autonomous agents. Whitmore recommended adoption of a five-layer AI tech stack — Governance, Application, Infrastructure, Model and Data — combined with secure-by-design practices, strengthened identity and zero-trust controls, and investment in post-quantum cryptography to protect long-lived secrets and preserve resilience.
Mon, August 25, 2025
What 17,845 GitHub MCP Servers Reveal About Risk and Abuse
🛡️ VirusTotal ran a large-scale audit of 17,845 GitHub projects implementing the MCP (Model Context Protocol) using Code Insight powered by Gemini 2.5 Flash. The automated review initially surfaced an overwhelming number of issues, and a refined prompt focused on intentional malice marked 1,408 repos as likely malicious. Manual checks showed many flagged projects were demos or PoCs, but the analysis still exposed numerous real attack vectors—credential harvesting, remote code execution via exec/subprocess, supply-chain tricks—and recurring insecure practices. The post recommends treating MCP servers like browser extensions: sign and pin versions, sandbox or WASM-isolate them, enforce strict permissions and filter model outputs to remove invisible or malicious content.
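One of those recommendations, filtering invisible content out of model inputs and outputs, can be approximated with a small character filter. The sketch below removes zero-width, bidirectional-control, and Unicode tag characters commonly used to hide instructions; the exact character set worth stripping is an assumption rather than something specified in the post.

```python
import re

# Codepoints that render as nothing (or reorder text) but can carry hidden instructions.
INVISIBLE = re.compile(
    "["
    "\u200b-\u200f"            # zero-width space/joiners, LRM/RLM marks
    "\u202a-\u202e"            # bidirectional embedding and override controls
    "\u2060-\u2064"            # word joiner and invisible operators
    "\ufeff"                   # zero-width no-break space / BOM
    "\U000e0000-\U000e007f"    # Unicode "tag" characters (a hidden ASCII mirror)
    "]"
)

def strip_invisible(text: str) -> str:
    """Remove characters that are invisible when rendered but readable to a model."""
    return INVISIBLE.sub("", text)

if __name__ == "__main__":
    # Hide a payload by mirroring it into the tag-character block.
    payload = "".join(chr(0xE0000 + ord(c)) for c in "ignore previous rules")
    print(repr(strip_invisible("Normal answer" + payload)))  # -> 'Normal answer'
```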