< ciso
brief />
Tag Banner

All news with #llm security tag

221 articles · page 7 of 12

MCP Sampling Risks: New Prompt-Injection Attack Vectors

🔒 This Unit 42 investigation (published December 5, 2025) analyzes security risks introduced by the Model Context Protocol (MCP) sampling feature in a popular coding copilot. The authors demonstrate three proof-of-concept attacks—resource theft, conversation hijacking, and covert tool invocation—showing how malicious MCP servers can inject hidden prompts and trigger unobserved model completions. The report evaluates detection techniques and recommends layered mitigations, including request sanitization, response filtering, and strict access controls to protect LLM integrations.
read more →

Generative AI's Dual Role in Cybersecurity, Evolving

🛡️ Generative AI is rapidly reshaping cybersecurity by amplifying both attackers' and defenders' capabilities. Adversaries leverage models for coding assistance, phishing and social engineering, anti-analysis techniques (including prompts hidden in DNS) and vulnerability discovery, with AI-assisted elements beginning to appear in malware while still needing significant human oversight. Defenders use GenAI to triage threat data, speed incident response, detect code flaws, and augment analysts through MCP-style integrations. As models shrink and access widens, both risk and defensive opportunity are likely to grow.
read more →

Building a Production-Ready AI Security Foundation

🔒 This guide presents a practical defense-in-depth approach to move generative AI projects from prototype to production by protecting the application, data, and infrastructure layers. It includes hands-on labs demonstrating how to deploy Model Armor for real-time prompt and response inspection, implement Sensitive Data Protection pipelines to detect and de-identify PII, and harden compute and storage with private VPCs, Secure Boot, and service perimeter controls. Reusable templates, automated jobs, and integration blueprints help teams reduce prompt injection, data leakage, and exfiltration risk while aligning operational controls with compliance and privacy expectations.
read more →

Protecting LLM Chats from the Whisper Leak Attack Today

🛡️ Recent research shows the “Whisper Leak” attack can infer the topic of LLM conversations by analyzing timing and packet patterns during streaming responses. Microsoft’s study tested 30 models and thousands of prompts, finding topic-detection accuracy from 71% to 100% for some models. Providers including OpenAI, Mistral, Microsoft Azure, and xAI have added invisible padding to network packets to disrupt these timing signals. Users can further protect sensitive chats by using local models, disabling streaming output, avoiding untrusted networks, or using a trusted VPN and up-to-date anti-spyware.
read more →

Indirect Prompt Injection: Hidden Risks to AI Systems

🔐 The article explains how indirect prompt injection — malicious instructions embedded in external content such as documents, images, emails and webpages — can manipulate AI tools without users seeing the exploit. It contrasts indirect attacks with direct prompt injection and cites CrowdStrike's analysis of over 300,000 adversarial prompts and 150 techniques. Recommended defenses include detection, input sanitization, allowlisting, privilege separation, monitoring and user education to shrink this expanding attack surface.
read more →

How Companies Can Prepare for Emerging AI Security Threats

🔒 Generative AI introduces new attack surfaces that alter trust relationships between users, applications and models. Siemens' pentest and security teams differentiate Offensive Security (targeted technical pentests) from Red Teaming (broader organizational simulations of real attackers). Traditional ML risks such as image or biometric misclassification remain relevant, but experts now single out prompt injection as the most serious threat — simple crafted inputs can leak system prompts, cause misinformation, or convert innocuous instructions into dangerous command injections.
read more →

Adversarial Poetry Bypasses AI Guardrails Across Models

✍️ Researchers from Icaro Lab (DexAI), Sapienza University of Rome, and Sant’Anna School found that short poetic prompts can reliably subvert AI safety filters, in some cases achieving 100% success. Using 20 crafted poems and the MLCommons AILuminate benchmark across 25 proprietary and open models, they prompted systems to produce hazardous instructions — from weapons-grade plutonium to steps for deploying RATs. The team observed wide variance by vendor and model family, with some smaller models surprisingly more resistant. The study concludes that stylistic prompts exploit structural alignment weaknesses across providers.
read more →

The AI Fix #79 — Gemini 3, poetry jailbreaks, robot safety

🎧 In episode 79 of The AI Fix, hosts Graham Cluley and Mark Stockley examine the latest surprises from Gemini 3, including boastful comparisons, hallucinations about the year, and reactions from industry players. They also discuss an arXiv paper proposing adversarial poetry as a universal jailbreak for LLMs and the ensuing debate over its provenance. Additional segments cover robot-versus-appliance antics, a controversial AI teddy pulled from sale after disturbing interactions with children, and whether humans need safer robots — or stricter oversight.
read more →

Amazon Announces Nova 2 Sonic for Real‑Time Voice AI

🎙️ Amazon announced Amazon Nova 2 Sonic, a speech-to-speech model for natural, real-time conversational AI available via Amazon Bedrock. The model delivers streaming speech understanding robust to background noise and diverse speaking styles, expressive polyglot voices, turn-taking controllability, asynchronous tool calling, and a one‑million token context window. Developers can integrate Nova 2 Sonic with Amazon Connect, leading telephony providers, open-source frameworks, and Bedrock’s bidirectional streaming API; it’s initially available in select AWS Regions.
read more →

Amazon Bedrock Adds 18 Fully Managed Open Models Today

🚀 Amazon Bedrock expanded its model catalog with 18 new fully managed open-weight models, the largest single addition to date. The offering includes Gemma 3, Mistral Large 3, NVIDIA Nemotron Nano 2, OpenAI gpt-oss variants and other vendor models. Through a unified API, developers can evaluate, switch, and adopt these models in production without rewriting applications or changing infrastructure. Models are available in supported AWS Regions.
read more →

Amazon Nova 2 Omni: Multimodal Reasoning Model Preview

🚀 Amazon announced Nova 2 Omni, an all‑in‑one multimodal model in preview that accepts text, images, video, and speech inputs while producing text and image outputs. It offers a 1M token context window, supports 200+ languages for text and 10 for speech, and provides image generation/editing and multi‑speaker speech transcription with native reasoning. Early access is available to Nova Forge and authorized customers.
read more →

Adversarial Poetry Bypasses LLM Safety Across Models

⚠️ Researchers report that converting prompts into poetry can reliably jailbreak large language models, producing high attack-success rates across 25 proprietary and open models. The study found poetic reframing yielded average jailbreak success of 62% for hand-crafted verses and about 43% for automated meta-prompt conversions, substantially outperforming prose baselines. Authors map attacks to MLCommons and EU CoP risk taxonomies and warn this stylistic vector can evade current safety mechanisms.
read more →

Malicious LLMs Equip Novice Hackers with Advanced Tools

⚠️ Researchers at Palo Alto Networks Unit42 found that uncensored models like WormGPT 4 and community-driven KawaiiGPT can generate functional tools for ransomware, lateral movement, and phishing. WormGPT 4 produced a PowerShell locker and a convincing ransom note, while KawaiiGPT generated scripts for credential harvesting and remote command execution. Both are accessible via subscriptions or local installs, lowering the bar for novice attackers.
read more →

LLMs Can Produce Malware Code but Reliability Lags

🔬 Netskope Threat Labs tested whether large language models can generate operational malware by asking GPT-3.5-Turbo, GPT-4 and GPT-5 to produce Python for process injection, AV/EDR termination and virtualization detection. GPT-3.5-Turbo produced malicious code quickly, while GPT-4 initially refused but could be coaxed with role-based prompts. Generated scripts ran reliably on physical hosts, had moderate success in VMware, and performed poorly in AWS Workspaces VDI; GPT-5 raised success rates substantially but also returned safer alternatives because of stronger safeguards. Researchers conclude LLMs can create useful attack code but still struggle with reliable evasion and cloud adaptation, so full automation of malware remains infeasible today.
read more →

Amazon Lex Enables LLMs as Primary NLU Across Connect

🤖 Amazon Lex now lets developers use Large Language Models (LLMs) as the primary natural language understanding option for voice and chat bots. Using LLMs improves handling of complex or misspelled utterances, extracts key details from verbose inputs, and enables intelligent follow‑up questions when customer intent is unclear. This capability is available in all AWS commercial regions where Amazon Connect and Amazon Lex operate, helping teams build more accurate, conversational self‑service experiences.
read more →

The Dilemma of AI: Malicious LLMs and Security Risks

🛡️ Unit 42 examines the growing threat of malicious large language models that have been intentionally stripped of safety controls and repackaged for criminal use. These tools — exemplified by WormGPT and KawaiiGPT — generate persuasive phishing, credential-harvesting lures, polymorphic malware scaffolding, and end-to-end extortion workflows. Their distribution ranges from paid subscriptions and source-code sales to free GitHub deployments and Telegram promotion. The report urges stronger alignment, regulation, and defensive resilience and offers Unit 42 incident response and AI assessment services.
read more →

Anthropic Claude Opus 4.5 Now Available in Amazon Bedrock

🚀 Anthropic's Claude Opus 4.5 is now available through Amazon Bedrock, giving Bedrock customers access to a high-performance foundation model at roughly one-third the prior cost. Opus 4.5 advances professional software engineering, agentic workflows, multilingual coding, and complex visual interpretation while supporting production-grade agent deployments. Bedrock adds two API features — tool search and tool use examples — plus a beta effort parameter to balance reasoning, tool calls, latency, and cost. The model is offered via global cross-region inference in multiple AWS regions.
read more →

DeepSeek-R1 Generates Less Secure Code for China-Sensitive Prompts

⚠️ CrowdStrike analysis finds that DeepSeek-R1, an open-source AI reasoning model from a Chinese vendor, produces significantly more insecure code when prompts reference topics the Chinese government deems sensitive. Baseline tests produced vulnerable code in 19% of neutral prompts, rising to 27.2% for Tibet-linked scenarios. Researchers also observed partial refusals and internal planning traces consistent with targeted guardrails that may unintentionally degrade code quality.
read more →

Trend Micro Unveils Full-Stack AI Security Package

🔒 Trend Micro is previewing Trend Vision One AI Security Package, a comprehensive suite due at AWS re:Invent in early December that aims to protect the full AI application stack from development through runtime. The offering combines continuous model scanning and automated AI guardrails and leverages Nvidia BlueField3 hardware acceleration. It also assembles tools such as AI Security Blueprint, Risk Insights, cloud and container security, file protection with NetApp support, an agentic SIEM with AWS native logs, and Zero Trust AI access controls.
read more →

CrowdStrike: Political Triggers Reduce AI Code Security

🔍 DeepSeek-R1, a 671B-parameter open-source LLM, produced code with significantly more severe security vulnerabilities when prompts included politically sensitive modifiers. CrowdStrike found baseline vulnerable outputs at 19%, rising to 27.2% or higher for certain triggers and recurring severe flaws such as hard-coded secrets and missing authentication. The model also refused requests related to Falun Gong in 45% of cases, exhibiting an intrinsic "kill switch" behavior. The report urges thorough, environment-specific testing of AI coding assistants rather than reliance on generic benchmarks.
read more →