<ciso brief />

All news with the #model poisoning tag

12 articles

Democratization of AI and the Rising Data Poisoning Threat

⚠️ Recent research shows that as few as 250 fabricated documents or images can measurably alter large language model behavior, making data poisoning accessible to non-experts. Online communities and influencers are already seeding false content that may be ingested during public-model training or fine-tuning. Organizations should maintain a clean 'gold' model, monitor input streams for anomalous patterns, and perform regular adversarial testing to detect drift and backdoors before deployment.
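
The "clean gold model" advice can be pictured as a regression gate. You run the same fixed probe prompts through the trusted baseline and through a candidate model, then block the candidate if too many answers change. The following is a minimal Python sketch under stated assumptions: the model callables, the probe set, and the 5% threshold are all hypothetical and do not come from the research cited above.

```python
from typing import Callable, List

# Hypothetical model interface assumed for this sketch: prompt -> answer text.
Model = Callable[[str], str]


def disagreement_rate(gold: Model, candidate: Model, probes: List[str]) -> float:
    """Fraction of probe prompts where the candidate's answer differs from the gold model's."""
    if not probes:
        raise ValueError("probe set must not be empty")
    changed = sum(1 for p in probes if gold(p).strip() != candidate(p).strip())
    return changed / len(probes)


def passes_gold_check(gold: Model, candidate: Model, probes: List[str],
                      max_rate: float = 0.05) -> bool:
    """Return False (block deployment) if the candidate drifts on too many probes."""
    return disagreement_rate(gold, candidate, probes) <= max_rate
```

Exact string comparison is deliberately crude. A real gate would more likely compare scores or use semantic similarity, and it would include probes designed to surface suspected triggers.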

Microsoft Builds Scanner to Detect Backdoors in LLMs

🔍 Microsoft has developed a lightweight scanner to detect backdoors in open-weight large language models (LLMs) by evaluating three observable signals tied to internal model behavior. The tool extracts memorized content, isolates suspect substrings, and scores candidates with loss functions that formalize attention and output anomalies. The approach requires no additional training and runs across common GPT‑style models, but it needs access to model files and is best suited for trigger-based, deterministic backdoors.
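
The summary does not spell out the scanner's loss functions, so what follows is only a toy illustration of the general idea of scoring candidate triggers. It is not Microsoft's method. The sketch prepends each suspect substring to clean prompts and measures how far the model's next-token distribution shifts, using KL divergence. The `next_token_probs`-style interface, the prompts, and the candidate strings are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface assumed for this sketch: prompt -> next-token probabilities.
NextTokenProbs = Callable[[str], Dict[str, float]]


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) over tokens that p assigns mass to; eps avoids division by zero."""
    return sum(pt * math.log((pt + eps) / (q.get(tok, 0.0) + eps))
               for tok, pt in p.items() if pt > 0.0)


def score_candidates(model: NextTokenProbs, prompts: List[str],
                     candidates: List[str]) -> List[Tuple[str, float]]:
    """Rank candidate triggers by the mean output shift they cause when prepended."""
    if not prompts:
        raise ValueError("need at least one clean prompt")
    scores = []
    for cand in candidates:
        # Compare the output with the candidate prepended against the clean output.
        shifts = [kl_divergence(model(f"{cand} {p}"), model(p)) for p in prompts]
        scores.append((cand, sum(shifts) / len(shifts)))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A substring that barely changes outputs scores near zero. A backdoor trigger that flips behavior stands out at the top of the ranking. Note that the epsilon smoothing inflates scores whenever the triggered output uses tokens the clean output never produces.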

Detecting Backdoored Language Models at Scale — Practical Scanner

🔍 Microsoft researchers released new findings and a practical scanner for detecting backdoors in open-weight language models. The study identifies three signatures: a distinctive “double triangle” attention pattern, leakage of poisoned training data through memorization, and trigger “fuzziness”. The scanner uses them to reconstruct likely triggers without retraining. It requires only forward passes and works on GPT-like models. It was validated on models ranging from 270M to 14B parameters and across common fine-tuning regimes. The team also notes its limits: it needs access to model files, favors deterministic backdoors, and should be used as part of layered defenses.

The AI Fix #84: Hungry ghost, data poisoning, Grok

🤖 In episode 84 of The AI Fix, hosts Graham Cluley and Mark Stockley survey a series of recent AI developments that raise practical and philosophical questions. They discuss reports that Grok will be integrated into Pentagon networks, a campaign by insiders to poison training data, and research showing that small amounts of tainted data can sway model behavior. The episode also covers Google removing AI health overviews after they produced risky outputs, findings that asking a model the same question twice can improve its answers, and surprising advances in automated theorem proving.

Weird Generalizations and Inductive Backdoors in LLMs

⚠️ Recent research demonstrates that narrow fine-tuning on small amounts of data can produce broad, unexpected shifts in LLM behavior. The authors show weird generalization, in which a model fine-tuned on outdated bird names adopts an outdated worldview more broadly. They also introduce inductive backdoors, in which models learn both triggers and behaviors through generalization. These effects enable persona hijacking and hard-to-detect misalignment.

Top Cyber Threats Targeting AI Systems and Infrastructure

🔒 AI systems face a growing range of attacks—from data poisoning and model poisoning during training to adversarial inputs, prompt injection, and model theft during deployment. These threats exploit weak data governance, supply chain dependencies, and inadequate monitoring. Security leaders should adopt proactive controls including provenance tracking, adversarial testing, rate limits, and routine red teaming. Frameworks like MITRE ATLAS can help map attacker techniques and prioritize defenses.
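
Rate limits appear among the listed controls. Per-client limits on an inference endpoint slow down the high-volume querying that model theft through extraction typically depends on. Below is a minimal token-bucket sketch in Python. The capacity and refill rate are illustrative assumptions, and a production service would more often enforce limits at its API gateway.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per API key, so no single client can bulk-query the model."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client_id].allow()
```

The clock is injectable so that tests and simulations can control time. Keying buckets per client, rather than using one global bucket, keeps a single abusive caller from exhausting capacity for everyone.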

Picklescan Flaws Enable Malicious PyTorch Model Execution

⚠️ Picklescan, a Python pickle scanner, had three critical flaws that could let malicious PyTorch models pass its checks and then execute arbitrary code when loaded. JFrog researchers discovered the issues: a file-extension bypass (CVE-2025-10155), a ZIP CRC bypass (CVE-2025-10156) and an unsafe-globals bypass (CVE-2025-10157). Each one lets attackers present malicious models as safe. The vulnerabilities were responsibly disclosed on June 29, 2025, and fixed in Picklescan 0.0.31 on September 9. Users should upgrade and review their model-loading practices, along with any downstream automation that accepts third-party models.
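
A pickle can import arbitrary callables through its GLOBAL and STACK_GLOBAL opcodes. Scanners such as Picklescan inspect those imports statically, without executing the file. The naive sketch below uses only the standard library's `pickletools` and shows why this check is fragile: the unsafe-globals CVE is a bypass of this kind of logic. The allowlist is hypothetical. The sketch also ignores the ZIP container in which PyTorch checkpoints store their pickle. Where possible, prefer loaders that refuse arbitrary globals, such as `torch.load(..., weights_only=True)`, or non-pickle weight formats such as safetensors.

```python
import pickletools
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical allowlist, for illustration only.
SAFE_GLOBALS: Set[Tuple[str, str]] = {("collections", "OrderedDict")}

# Opcodes that push a string onto the unpickler's stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
              "SHORT_BINSTRING", "BINSTRING", "STRING"}
MEMO_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
MEMO_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def find_unsafe_globals(data: bytes) -> List[Tuple[str, str]]:
    """Statically list (module, name) imports in a pickle that are not allowlisted.

    Naive by design: it tracks pushed strings rather than the full unpickler
    stack, so a crafted file can evade it. The pickle is never executed.
    """
    strings: List[str] = []                # string values pushed so far
    memo: Dict[int, Optional[str]] = {}    # memo slot -> string (None if not a string)
    top: Optional[str] = None              # last pushed value, if it was a string
    imports: List[Tuple[str, str]] = []
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in STRING_OPS:
            top = arg
            strings.append(arg)
        elif op == "MEMOIZE":
            memo[len(memo)] = top           # memoizes the top of stack without popping
        elif op in MEMO_PUT_OPS:
            memo[arg] = top
        elif op in MEMO_GET_OPS:
            top = memo.get(arg)
            if top is not None:
                strings.append(top)
        elif op == "GLOBAL":
            module, _, name = arg.partition(" ")  # pickletools joins the pair with a space
            imports.append((module, name))
            top = None
        elif op == "STACK_GLOBAL":
            if len(strings) >= 2:           # assumes module and name were the last two strings
                imports.append((strings[-2], strings[-1]))
            top = None
        else:
            top = None
    return [imp for imp in imports if imp not in SAFE_GLOBALS]
```

Run against a pickle whose `__reduce__` returns `os.system`, this reports an import of `system`. A plain dict of floats yields an empty list. The STACK_GLOBAL heuristic is exactly the kind of shortcut an attacker can defeat, which is why patching the scanner and restricting the loader both matter.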

CIO’s First Principles: A Reference Guide to Securing AI

🔐 Enterprises must redesign security as AI moves from experimentation to production, and CIOs need a prevention-first, unified approach. This guide reframes Confidentiality, Integrity and Availability for AI, stressing rigorous access controls, end-to-end data lineage, adversarial testing and a defensible supply chain to prevent poisoning, prompt injection and model hijacking. Palo Alto Networks advocates embedding security across MLOps, real-time visibility of models and agents, and executive accountability to eliminate shadow AI and ensure resilient, auditable AI deployments.
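
One simple starting point for end-to-end data lineage is to record a cryptographic hash of every approved training file and to refuse to train if anything has changed since approval. This minimal Python sketch assumes a local directory of training files and a hypothetical manifest format that maps relative paths to SHA-256 digests. A real pipeline would also sign the manifest and record each file's source and approval time.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    return {str(p.relative_to(data_dir)): sha256_file(p)
            for p in sorted(data_dir.rglob("*")) if p.is_file()}


def verify_manifest(data_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files added, removed or modified since the manifest was built."""
    current = build_manifest(data_dir)
    changed_or_removed = [f for f in manifest if current.get(f) != manifest[f]]
    added = [f for f in current if f not in manifest]
    return sorted(changed_or_removed + added)
```

A non-empty result from `verify_manifest` should stop the training job until someone reviews the listed files.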

Digital Health Needs Security at Its Core to Scale AI

🔒 The article argues that AI-driven digital health initiatives proved essential during COVID-19 but simultaneously exposed critical cybersecurity gaps that threaten pandemic preparedness. It warns that expansive data ecosystems, IoT devices and cloud pipelines multiply attack surfaces and that subtle AI-specific threats — including data poisoning, model inversion and adversarial inputs — can undermine public-health decisions. The author urges security by design, including zero-trust architectures, data provenance, encryption, model governance and cross-disciplinary drills so AI can deliver trustworthy, resilient public health systems.

The AI Fix #73: Gemini gambling, poisoning LLMs and fallout

🧠 In episode 73 of The AI Fix, hosts Graham Cluley and Mark Stockley explore a sweep of recent AI developments, from the rise of AI-generated content to high-profile figures relying on chatbots. They discuss research suggesting Google Gemini exhibits behaviours resembling pathological gambling and report on a Gemma-style model uncovering a potential cancer therapy pathway. The show also highlights legal and security concerns, including a lawyer criticised for repeated AI use, generals consulting chatbots, and techniques for poisoning LLMs with only a few malicious samples.

Agentic AI and the OODA Loop: The Integrity Problem

🛡️ Bruce Schneier and Barath Raghavan argue that agentic AIs run repeated OODA loops—Observe, Orient, Decide, Act—over web-scale, adversarial inputs, and that current architectures lack the integrity controls to handle untrusted observations. They show how prompt injection, dataset poisoning, stateful cache contamination, and tool-call vectors (e.g., MCP) let attackers embed malicious control into ordinary inputs. The essay warns that fixing hallucinations is insufficient: we need architectural integrity—semantic verification, privilege separation, and new trust boundaries—rather than surface patches.
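
A toy policy can illustrate the privilege-separation idea. Each observation is labeled with a trust level, and once untrusted content has entered the agent's context, only read-only tools remain available. This Python sketch is a hypothetical illustration, not a design taken from the essay. The tool names and the two-level trust model are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set, Tuple


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. the operator's own instructions
    UNTRUSTED = "untrusted"  # e.g. web pages, emails, third-party tool output


@dataclass
class AgentContext:
    observations: List[Tuple[str, Trust]] = field(default_factory=list)

    def observe(self, text: str, trust: Trust) -> None:
        self.observations.append((text, trust))

    @property
    def tainted(self) -> bool:
        """True once any untrusted content has been observed."""
        return any(t is Trust.UNTRUSTED for _, t in self.observations)


# Hypothetical tool catalogue.
READ_ONLY_TOOLS: Set[str] = {"search", "read_file"}
PRIVILEGED_TOOLS: Set[str] = {"send_email", "write_file", "run_shell"}


def authorize_tool_call(ctx: AgentContext, tool: str) -> bool:
    """Deny privileged tools when untrusted input may have steered the decision."""
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in PRIVILEGED_TOOLS:
        return not ctx.tainted
    return False  # unknown tools are denied by default
```

Whole-context tainting is coarse and makes agents less useful once they read any web content. The essay's argument is that finer-grained trust boundaries and semantic verification need to be designed into the architecture, not bolted on like this.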

AI Risks Push Integrity Protection to Forefront for CISOs

🔒 CISOs must now prioritize integrity protection as AI introduces new attack surfaces such as data poisoning, prompt injection and adversarial manipulation. Shadow AI — unsanctioned use of models and services — increases risks of data leakage and insecure integrations. Defenses should combine Security by Design, governance, transparency and compliance (e.g., GDPR, EU AI Act) to detect poisoned data and prevent model drift.