< ciso
brief />
Tag Banner

All news with #ai red teaming tag

70 articles

Microsoft Open-Sources Rampart and Clarity for AI Safety

🔒 Microsoft has open-sourced two tools, Rampart and Clarity, intended to embed safety engineering into the AI agent development lifecycle rather than leaving it as a periodic checkpoint. Rampart converts red-team findings into structured, repeatable tests that can be automated in CI/CD pipelines and is built on top of PyRIT for continuous adversarial and benign scenario execution. Clarity targets an earlier phase, guiding engineers through structured conversations to clarify assumptions, expected behaviors, permissions and trust boundaries, storing outcomes as markdown in a .clarity-protocol/ directory for review. Both projects join Microsoft’s broader open-source agent governance stack to address risks such as prompt injection, unsafe tool use, privilege escalation, and unintended autonomous actions.
read more →

Microsoft Open-Sources RAMPART and Clarity for AI

🛡️ Microsoft has released two open-source tools, RAMPART and Clarity, to help developers test and clarify AI agent safety early in the development lifecycle. RAMPART is a Pytest-native framework for writing and running adversarial and benign safety tests against agents, building on prior work such as PyRIT. It evaluates test outcomes via simple adapters that connect an agent to the suite, while Clarity acts as a structured thinking partner to surface assumptions, explore failure modes, and guide design decisions before coding begins.
read more →

RAMPART and Clarity: Open Tools for Agent Safety Workflow

🔒 Microsoft has open-sourced two engineering tools—RAMPART and Clarity—to make agent safety a continuous part of development. RAMPART provides a pytest-style framework that brings red-team and adversarial tests into CI, evaluating tools invoked and side effects. Clarity is a structured design companion that captures problem statements, failure analyses, and decisions in a .clarity-protocol directory. Both aim to create living safety artifacts integrated into normal workflows.
read more →

Agentic AI Drives Surge in Mobile App Cyberattacks

📈 Digital.ai's 2026 Application Security Threat Report found that 87% of monitored customer-facing apps were attacked in 2026, up sharply from 55% in 2022. The firm says agentic AI has lowered the skill and time required for threat actors to inspect code, generate exploits and adapt malware. Financial services, automotive and medical device apps were most targeted, and iOS attacks have nearly closed the gap with Android.
read more →

When China's AI Catches Up: Mythos and Global Risks

🔒 Anthropic's Mythos Preview, shared last month with a limited set of security partners, has demonstrated the ability to autonomously find zero-day vulnerabilities across major operating systems and browsers. Anthropic paired the release with Project Glasswing and $100 million in usage credits to help defenders, but reports of unauthorized access and denied requests from Chinese entities have already emerged. The development challenges the assumption of a durable US lead and has injected cybersecurity into high-level US–China summit talks, prompting urgent questions about access, regulation, and international cooperation.
read more →

Pen Tests Reveal AI Flaws More Severe Than Legacy Bugs

🔒 Penetration testing shows AI and LLM deployments contain a disproportionate share of severe vulnerabilities. Cobalt’s State of Pentesting Report finds 32% of LLM findings rated high risk versus 13% for legacy enterprise tests, and only 38% of those high-risk LLM issues are remediated. Experts point to emerging attack surfaces — notably prompt injection, now OWASP’s top LLM risk — broader blast radii from model integrations, and fragmented ownership for fixes. Recommended countermeasures include threat modeling, red teaming, least-privilege access, strict output validation, and human approval gates for high-consequence actions.
read more →

Nutanix and Palo Alto Networks: Integration for Model Trust

🔒 Nutanix and Palo Alto Networks have integrated Prisma AIRS into the Nutanix Enterprise AI platform to embed automated AI model scanning and continuous red teaming directly into the MLOps pipeline. The integrated solution scans models at check-in, analyzes dependencies for known vulnerabilities and license issues, and validates provenance and file formats to block backdoors or unsafe execution paths before deployment. It also provides API-driven red teaming with a context-aware agent and a large, continuously updated attack library so teams can test resilience and prioritize business-relevant risks without complex setup.
read more →

Defending Against Attacks from Frontier AI Models: Readiness

🔒 A new generation of frontier AI models is changing how cyberattacks are developed, enabling speed, scale, and accessibility previously unseen. Early testing of advanced models, including Claude’s Mythos, shows they can identify code vulnerabilities, map attack paths, and generate working exploits with minimal effort. Organizations must treat these as fully AI-powered attacks and prioritize proactive readiness, detection, and mitigation strategies.
read more →

AI-Driven Vulnerability Discovery and Defensive Response

🤖 In the latest Adversary Universe podcast, CrowdStrike leaders discuss how AI is accelerating vulnerability discovery and could produce a rapid surge of new flaws — a potential 'vuln-pocalypse'. They urge prioritizing remediation based on active exploitation and prevalence in environments. CrowdStrike recommends leveraging AI for agentic red teaming, vulnerability scanning, and crowdsourced telemetry to detect post-exploitation behaviors. They point to Project Glasswing and OpenAI's Trusted Access for Cyber as examples of defense-focused collaboration.
read more →

Unit 42 Expands Frontier AI Defense with Armadin Partnership

🔒 Palo Alto Networks' Unit 42 is expanding its Frontier AI Defense service through a new partnership with Armadin, the offensive security firm founded by Kevin Mandia. The collaboration introduces an autonomous External AI Hyperattack Assessment that passively discovers internet-facing assets, then deploys a coordinated swarm of AI attack agents to validate exposures and exploit vulnerabilities in parallel. Unit 42 says this pressure-tested, decision-grade evidence accelerates remediation and helps organizations reduce AI-enabled external attack risk across cloud and perimeter environments.
read more →

AI Audit Finds 271 Vulnerabilities in Firefox 150 Release

🔍 The Firefox team used frontier AI models in partnership with Anthropic to scan the browser and fix latent security flaws. After earlier work with Opus 4.6 that produced 22 fixes for Firefox 148, an early evaluation of Claude Mythos Preview uncovered 271 vulnerabilities now addressed in Firefox 150. The team worked around the clock to triage and remediate the findings, and observers note this technology favors defenders—provided patches reach users quickly.
read more →

Anthropic Urges EPSS to Triage AI-Driven Vulnerabilities

🔍 Anthropic warns that its AI vulnerability-discovery system Mythos will sharply increase the pace and volume of software flaws, forcing defenders to prioritize what to fix. The company recommended using the probabilistic EPSS model (developed by Empirical Security and published through FIRST) to triage vulnerabilities—patching CISA’s KEV list first, then addressing CVEs above a chosen EPSS threshold. Empirical Security leaders emphasize that EPSS is machine-driven and already integrated across many vendor products.
read more →

Claude Mythos scrutiny: Project Glasswing's true impact

🔍 Anthropic's Claude Mythos — developed under Project Glasswing and currently trialed by select organizations — faces scrutiny after VulnCheck's analysis found limited publicly attributable results. The team identified 75 CVE entries mentioning Anthropic, 40 credited to its researchers, but only one explicitly tied to Glasswing (CVE-2026-4747), with several additional findings embargoed. Anthropic has signaled more transparency in July 2026. Security experts caution that Mythos' reported exploit success rates could still accelerate attacker capabilities and outpace corporate change controls.
read more →

Commercial AI Models Make Rapid Gains in Vulnerability

🔍 Forescout’s Verde Labs reports rapid progress across commercial, open-source and underground AI models in vulnerability research and exploit generation. In 2026 the firm found all tested models could complete end-to-end vulnerability research and about half could autonomously produce working exploits; top performers included Claude Opus 4.6 and Kimi K2.5. Using single prompts, the RAPTOR agentic framework and Verde Labs’ extensions, researchers discovered four zero-days in OpenNDS, demonstrating a lower barrier to discovery and a growing risk for organizations.
read more →

Mythos and the Limits of Private AI Security Control

🔍 Anthropic announced a restricted release of Claude Mythos Preview, an AI claimed to find and weaponize software vulnerabilities at unprecedented scale, and limited access to roughly 50 organizations under Project Glasswing. The company highlighted thousands of flaws across major operating systems and browsers, including decades-old bugs and a set of 181 usable Firefox attacks, far beyond its prior model's performance. Yet the disclosure omits key metrics—false-positive rates, unfiltered outputs, and broad audit access—raising concerns that withholding a powerful tool is not a substitute for transparency, independent review, and funded access for domain experts.
read more →

ATHR: AI Voice Agents Enable Fully Automated Vishing

🔊 A new platform called ATHR automates telephone-oriented attacks by combining AI voice agents and optional human operators to carry out vishing campaigns and harvest credentials across services including Google, Microsoft, and major crypto platforms. Researchers at Abnormal say ATHR bundles email templates, spoofing, WebRTC/Asterisk routing, and per-target customization into a dashboard that controls distribution, calls, and logging. The service is marketed on underground forums for $4,000 plus a commission and greatly lowers the skill barrier for attackers.
read more →

OpenAI Releases GPT-5.4-Cyber for Defensive Teams Now

🛡️ OpenAI has unveiled GPT-5.4-Cyber, a variant of its flagship GPT‑5.4 tuned for defensive cybersecurity use cases, and expanded its Trusted Access for Cyber (TAC) program to include thousands of authenticated individual defenders and hundreds of security teams. The company says the model is intended to help teams find, validate, and fix vulnerabilities faster while it iteratively strengthens safeguards to reduce dual‑use risks and resist jailbreaks and adversarial prompt injection. OpenAI highlighted its Codex Security agent, which it credits with contributing to the remediation of over 3,000 critical and high vulnerabilities, and framed the release as part of a broader shift toward continuous, developer‑integrated security feedback.
read more →

AISI Urges Cybersecurity Basics After Mythos Test Guidance

🔐 The UK’s AI Security Institute (AISI) evaluated Anthropic’s Claude Mythos Preview and found it can autonomously discover and exploit vulnerabilities in controlled tests when given network access. In a 32‑step simulated corporate attack the model completed the full sequence in 3 of 10 runs and averaged 22 of 32 steps, though performance varied. AISI stresses these cyber ranges are easier than real environments and recommended organisations strengthen basics — timely patching, robust access controls, secure configuration and comprehensive logging — while also exploring AI to bolster defensive capabilities.
read more →

Anthropic's Mythos Spurs Structural Cybersecurity Shift

⚠️A new Cloud Security Alliance (CSA) briefing warns that Anthropic's Claude Mythos (Preview) marks a structural shift in cybersecurity. The model can autonomously discover and exploit thousands of vulnerabilities and orchestrate attacks at speeds that compress discovery-to-weaponization from weeks to hours. The paper — informed by leading security figures — says Mythos is not an outlier and urges CISOs to build Mythos-ready programs, harden fundamentals, and elevate the issue to the board.
read more →

Anthropic’s Mythos Preview and Project Glasswing Risks

🔍 Anthropic's new Claude Mythos Preview and its Project Glasswing effort have focused industry attention on AI-driven cyberattack capabilities. Anthropic says it will not release the model publicly, citing the risk that it can automatically generate operational exploits, and is running the model against public and proprietary code to find and patch vulnerabilities before they can be weaponized. The announcement produced substantial PR impact, prompting rival vendors to echo similar caution. Security observers note defenders still hold an advantage—finding flaws is easier than turning them into attacks—but that margin is shrinking as models improve.
read more →