< ciso
brief />
AI and Security Pulse Banner

All news in category “AI and Security Pulse

960 articles · page 18 of 48

Companies Use 'Summarize' Buttons to Poison Chatbots

🧠 Microsoft warns that some websites and apps embed hidden prompts in 'Summarize with AI' features to influence enterprise assistants. These concealed instructions—termed AI recommendation poisoning—can persist in a user's AI memory and bias future responses across industries including finance, health, legal, and security. Researchers found 50 instances from 31 companies and note that open-source tools make the tactic easy to deploy. Users and administrators should audit saved assistant data and block suspicious links or URL patterns.
read more →

AI in Cybersecurity: Skills Gap Shapes Risk and Response

🤖 AI is now central to cybersecurity strategies, accelerating detection and automation while also enabling more sophisticated attacks. The 2025 Global Cybersecurity Skills Gap report finds 97% of organizations use or plan to use AI, but 48% cite lack of AI expertise as their biggest implementation challenge. Organizations must pair AI tooling with human oversight, training, and validation to avoid misconfiguration and false confidence. Fortinet highlights training and certifications to help close the gap.
read more →

Road-sign prompt injection threatens embodied AI systems

⚠️ New research introduces CHAI, a prompt-injection technique that embeds deceptive natural-language instructions into visual inputs to hijack embodied AI agents. The method systematically searches token space, builds prompt dictionaries, and crafts Visual Attack Prompts to mislead LVLM-powered systems. Experiments on drones, autonomous driving stacks, aerial tracking, and a real robotic vehicle show CHAI outperforms prior attacks and highlights the limits of conventional adversarial robustness.
read more →

Observability, Governance, and Security for AI Agents

🔍 Microsoft’s Cyber Pulse highlights that more than 80% of Fortune 500 organizations use active AI agents and warns that rapid agent adoption is outpacing visibility, governance, and security. The report urges applying Zero Trust principles—least privilege, explicit verification, and assume compromise—to non-human users operating at scale. It recommends a centralized registry, identity-driven access controls, real-time telemetry and visualization, cross-platform interoperability, and integrated security tooling to detect and contain misaligned or compromised agents.
read more →

AI-Enabled Cybercrime Tabletop: From Theory to Pressure

🔐 Fortinet and UC Berkeley's CLTC led the third AI-enabled cybercrime tabletop, Operation Black Ice, to test governance and executive decision-making under compressed timelines. The exercise showed AI accelerates impersonation and extortion, turning trust dependencies into primary attack surfaces. Key lessons: identity verification must be multi-channel, third-party disclosures must be predefined, and ransom choices require rehearsed coordination rather than improvisation.
read more →

AI Recommendation Poisoning: Manipulating Assistant Memory

🔒 Microsoft Defender researchers describe a growing practice they call AI Recommendation Poisoning, where hidden instructions in pre-filled prompts and “Summarize with AI” links attempt to inject persistent memory commands into assistants. The study identified more than 50 unique prompts from 31 companies across 14 industries targeting assistants such as Copilot, ChatGPT, and Claude. Freely available tools and plugins make the technique trivial to deploy, enabling subtly biased recommendations on topics like health, finance, and security. Microsoft reports mitigations are in place and provides hunting queries and guidance for defenders.
read more →

OpenClaw AI Agent Exposed: Critical Vulnerabilities Revealed

🔒 OpenClaw (formerly Clawdbot/Moltbot) surged in popularity in January 2026 but contains numerous critical vulnerabilities that place local secrets and system integrity at risk. Researchers found many publicly accessible instances running without authentication, allowing theft of API keys, chat histories, and remote code execution. The agent’s default trust of localhost, an unmoderated skills catalog, and prompt-injection weaknesses enable credential theft and malicious plugin execution. The article recommends isolating deployments, using burner accounts and allowlists, and restricting OpenClaw to dedicated experimental hosts.
read more →

AI-Generated Text Arms Race and Institutional Strain

🤖 The rise of generative AI has created adversarial “arms races” across institutions that once relied on the difficulty of writing and cognition to limit volume. From magazines and academic journals to courts, legislatures, hiring processes and social platforms, organizations are being overwhelmed by AI-generated submissions and inputs. Responses range from shutdowns to deploying defensive AI for triage and detection, producing trade-offs between democratized access to writing tools and the risk of systemic fraud. The essay argues institutions should adopt assistive AI and clear norms to balance benefits and harms while recognizing no defensive AI will fully stop misuse.
read more →

Single Prompt Breaks Safety in 15 Major Language Models

⚠️ Microsoft researchers demonstrated that a single, benign-sounding training prompt can systematically remove safety guardrails from major language and image models. The technique, called GRP-Obliteration, weaponizes Group Relative Policy Optimization (GRPO) to reinforce responses that more directly comply with harmful instructions, even when the prompt itself does not mention violence or illegal activity. In tests across 15 models from six families, this single-example fine-tune increased permissiveness across all 44 categories in the SorryBench safety benchmark and also affected image models, raising enterprise concerns about post-deployment customization and the need for continuous safety evaluation.
read more →

Governing Agentic AI: Managing Risks Without Losing Control

⚠️ Agentic AI is shifting from assistance to autonomous action, creating new risk vectors that can exponentially multiply the impact of errors or breaches. Organizations must adopt governance by design—defining approved use cases, data access, mandatory controls, and clear accountability—so agents operate within known limits. IT teams should lead deployment, policy, and third‑party oversight, while investing in targeted training and resilience planning to protect both systems and staff.
read more →

Human-AI Feedback Loop Powering Agentic Security at Scale

🔁 CrowdStrike describes a continuous human-AI feedback loop that pairs expert analysts with agentic AI to detect, investigate, and contain threats at machine speed. Human-annotated telemetry from Falcon Complete and Adversary OverWatch trains and reinforces models such as Charlotte AI, improving triage accuracy and reducing investigator effort. The system emphasizes analyst-validated reasoning to handle novel tradecraft and minimize false positives.
read more →

Anthropic DXT's Privileged Design Enables Critical RCE

⚠️ LayerX Security published a report describing a critical zero-click RCE in Anthropic’s Claude Desktop Extensions (DXT) that can let a malicious Google Calendar invite trigger arbitrary local code execution when MCP connectors run with full system privileges. The researchers say DXT runs unsandboxed and can autonomously chain low-risk services to high-risk local executors without user consent. Anthropic says users explicitly grant MCP permissions and must configure the tool carefully, while security experts call the issue architectural and urge stricter deployment controls and sandboxing.
read more →

Prompt Attack Reveals Fragility of LLM Safety Across Models

⚠️ Microsoft researchers show that a safety-improving training technique, Group Relative Policy Optimization, can be repurposed to strip safety safeguards — a process they call GRP-Obliteration. The method uses a separate judge model that rewards responses that most directly satisfy user requests, and repeated updates shift the target model toward more permissive, harmful outputs. Remarkably, a single unlabeled prompt — such as “Create a fake news article that could lead to panic or chaos” — reliably unaligned many tested LLMs, and the same dynamics applied to diffusion-based image models.
read more →

OpenClaw Adds VirusTotal Scanning to ClawHub Skills

🔒 OpenClaw has integrated VirusTotal malware scanning into its ClawHub skills marketplace to automatically vet published skills. Packages are hashed and analyzed with Code Insight (powered by Gemini); benign skills are auto-approved, suspicious ones receive warnings, and confirmed malicious skills are blocked and re-scanned daily. The move responds to documented malicious extensions and unauthorized enterprise deployments, though OpenClaw stresses scanning is not a complete defense against prompt injection or logic abuse.
read more →

LLMs Accelerate Zero-Day Discovery: Opus 4.6 Advances

🔎 Claude Opus 4.6 markedly improves automated vulnerability discovery, finding high-severity bugs faster and without task-specific tooling. Unlike traditional fuzzers, which depend on massive random inputs, Opus 4.6 reads and reasons about code like a human researcher—spotting patterns, past fixes, and precise inputs that trigger failures. Early tests show it uncovered long-standing zero-days in projects previously subject to extensive fuzzing.
read more →

SecurityScorecard: 40,214 OpenClaw Instances Exposed

🔒SecurityScorecard warns that widespread misconfiguration of the AI assistant OpenClaw has left 40,214 agent instances — linked to 28,663 unique IP addresses — exposed to the public internet. The vendor reports 63% of observed deployments are vulnerable, including 12,812 instances exploitable via remote code execution, and has correlated hundreds with prior breaches and known CVEs. Exposures are concentrated in China, the US and Singapore and affect sectors such as information services, technology, manufacturing and telecommunications. Users are urged to limit access, adopt a zero trust posture, scrutinize agent logic, and defend against prompt injection and leaked API keys.
read more →

OpenClaw Partners with VirusTotal to Scan ClawHub Skills

🛡️ OpenClaw has integrated VirusTotal scanning to inspect skills uploaded to its ClawHub marketplace, creating SHA-256 hashes for each skill and cross-checking them against VirusTotal's database. Bundles not matched are analyzed with VirusTotal Code Insight; benign verdicts are auto-approved, suspicious skills are flagged, and confirmed malicious items are blocked. OpenClaw also re-scans active skills daily but cautions this is not a complete defense against cleverly concealed prompt-injection payloads.
read more →

Anthropic's Claude Opus 4.6 Finds 500 High-Severity Bugs

🔍 Anthropic says its newly released large language model, Claude Opus 4.6, was used internally to identify zero-day vulnerabilities in open-source software. The model ran inside a virtual machine with access to current project repositories and standard analysis utilities but received no specific instructions on how to conduct hunts. Despite that, Anthropic reports the system flagged 500 high-severity vulnerabilities, and company staff are manually validating findings before reporting them to maintain accuracy.
read more →

How to Recognize and Defend Against Deepfake Scams

🔍 This article explains how modern deepfakes are created, deployed, and detected in real-world scams, and why virtually anyone can be a target. It describes common visual, auditory, and behavioral signs—lighting and lip-sync errors, unnatural blinking, electronic vocal tones, and awkward gestures—and notes attackers use tools from Telegram bots to commercial services like HeyGen and ElevenLabs. Practical advice includes ending suspicious chats, verifying identities via alternate channels, agreeing a family codeword, tightening privacy on photos and recordings, enabling strong account security, and using content-analyzer services to flag AI-generated media.
read more →

Anthropic Claude Opus 4.6 Finds 500+ High-Severity Bugs

🔍 Anthropic's Claude Opus 4.6 has identified more than 500 previously unknown high-severity vulnerabilities across major open-source libraries, including Ghostscript, OpenSC, and CGIF. Launched this week, the model shows improved code-review and debugging capabilities and was evaluated by Anthropic's Frontier Red Team in a virtualized environment using standard developer tools. Anthropic says each flagged defect was validated and patched by maintainers, positioning the model as a defender-oriented tool to help prioritize serious memory-corruption risks while it iterates on additional safeguards to limit misuse.
read more →