All news with #ai red teaming tag

88 articles · page 2 of 5

May 20, 2026

RAMPART and Clarity: Open Tools for Agent Safety Workflow

🔒 Microsoft has open-sourced two engineering tools—RAMPART and Clarity—to make agent safety a continuous part of development. RAMPART provides a pytest-style framework that brings red-team and adversarial tests into CI, evaluating tools invoked and side effects. Clarity is a structured design companion that captures problem statements, failure analyses, and decisions in a .clarity-protocol directory. Both aim to create living safety artifacts integrated into normal workflows.

Microsoft Agent Security Agentic AI AI Red Teaming

May 19, 2026

Agentic AI Drives Surge in Mobile App Cyberattacks

📈 Digital.ai's 2026 Application Security Threat Report found that 87% of monitored customer-facing apps were attacked in 2026, up sharply from 55% in 2022. The firm says agentic AI has lowered the skill and time required for threat actors to inspect code, generate exploits and adapt malware. Financial services, automotive and medical device apps were most targeted, and iOS attacks have nearly closed the gap with Android.

Agentic AI AI Red Teaming Mobile Security Malware

May 13, 2026

When China's AI Catches Up: Mythos and Global Risks

🔒 Anthropic's Mythos Preview, shared last month with a limited set of security partners, has demonstrated the ability to autonomously find zero-day vulnerabilities across major operating systems and browsers. Anthropic paired the release with Project Glasswing and $100 million in usage credits to help defenders, but reports of unauthorized access and denied requests from Chinese entities have already emerged. The development challenges the assumption of a durable US lead and has injected cybersecurity into high-level US–China summit talks, prompting urgent questions about access, regulation, and international cooperation.

Anthropic Zero-Day Exploitation China-nexus AI Red Teaming

May 8, 2026

Pen Tests Reveal AI Flaws More Severe Than Legacy Bugs

🔒 Penetration testing shows AI and LLM deployments contain a disproportionate share of severe vulnerabilities. Cobalt’s State of Pentesting Report finds 32% of LLM findings rated high risk versus 13% for legacy enterprise tests, and only 38% of those high-risk LLM issues are remediated. Experts point to emerging attack surfaces — notably prompt injection, now OWASP’s top LLM risk — broader blast radii from model integrations, and fragmented ownership for fixes. Recommended countermeasures include threat modeling, red teaming, least-privilege access, strict output validation, and human approval gates for high-consequence actions.

LLM Security Prompt Injection Attack AI Red Teaming Vulnerability Management

May 7, 2026

Nutanix and Palo Alto Networks: Integration for Model Trust

🔒 Nutanix and Palo Alto Networks have integrated Prisma AIRS into the Nutanix Enterprise AI platform to embed automated AI model scanning and continuous red teaming directly into the MLOps pipeline. The integrated solution scans models at check-in, analyzes dependencies for known vulnerabilities and license issues, and validates provenance and file formats to block backdoors or unsafe execution paths before deployment. It also provides API-driven red teaming with a context-aware agent and a large, continuously updated attack library so teams can test resilience and prioritize business-relevant risks without complex setup.

Nutanix Palo Alto Networks AI Red Teaming Model Security

May 5, 2026

Defending Against Attacks from Frontier AI Models: Readiness

🔒 A new generation of frontier AI models is changing how cyberattacks are developed, enabling speed, scale, and accessibility previously unseen. Early testing of advanced models, including Claude’s Mythos, shows they can identify code vulnerabilities, map attack paths, and generate working exploits with minimal effort. Organizations must treat these as fully AI-powered attacks and prioritize proactive readiness, detection, and mitigation strategies.

Anthropic AI Red Teaming LLM Security

May 1, 2026

AI-Driven Vulnerability Discovery and Defensive Response

🤖 In the latest Adversary Universe podcast, CrowdStrike leaders discuss how AI is accelerating vulnerability discovery and could produce a rapid surge of new flaws — a potential 'vuln-pocalypse'. They urge prioritizing remediation based on active exploitation and prevalence in environments. CrowdStrike recommends leveraging AI for agentic red teaming, vulnerability scanning, and crowdsourced telemetry to detect post-exploitation behaviors. They point to Project Glasswing and OpenAI's Trusted Access for Cyber as examples of defense-focused collaboration.

CrowdStrike AI Red Teaming Vulnerability Management

April 30, 2026

Unit 42 Expands Frontier AI Defense with Armadin Partnership

🔒 Palo Alto Networks' Unit 42 is expanding its Frontier AI Defense service through a new partnership with Armadin, the offensive security firm founded by Kevin Mandia. The collaboration introduces an autonomous External AI Hyperattack Assessment that passively discovers internet-facing assets, then deploys a coordinated swarm of AI attack agents to validate exposures and exploit vulnerabilities in parallel. Unit 42 says this pressure-tested, decision-grade evidence accelerates remediation and helps organizations reduce AI-enabled external attack risk across cloud and perimeter environments.

Palo Alto Networks AI Red Teaming Exposure Management

April 29, 2026

AI Audit Finds 271 Vulnerabilities in Firefox 150 Release

🔍 The Firefox team used frontier AI models in partnership with Anthropic to scan the browser and fix latent security flaws. After earlier work with Opus 4.6 that produced 22 fixes for Firefox 148, an early evaluation of Claude Mythos Preview uncovered 271 vulnerabilities now addressed in Firefox 150. The team worked around the clock to triage and remediate the findings, and observers note this technology favors defenders—provided patches reach users quickly.

Anthropic Vulnerability Management AI Red Teaming

April 22, 2026

Anthropic Urges EPSS to Triage AI-Driven Vulnerabilities

🔍 Anthropic warns that its AI vulnerability-discovery system Mythos will sharply increase the pace and volume of software flaws, forcing defenders to prioritize what to fix. The company recommended using the probabilistic EPSS model (developed by Empirical Security and published through FIRST) to triage vulnerabilities—patching CISA’s KEV list first, then addressing CVEs above a chosen EPSS threshold. Empirical Security leaders emphasize that EPSS is machine-driven and already integrated across many vendor products.

Anthropic AI Red Teaming Vulnerability Management Advisory

April 20, 2026

Claude Mythos scrutiny: Project Glasswing's true impact

🔍 Anthropic's Claude Mythos — developed under Project Glasswing and currently trialed by select organizations — faces scrutiny after VulnCheck's analysis found limited publicly attributable results. The team identified 75 CVE entries mentioning Anthropic, 40 credited to its researchers, but only one explicitly tied to Glasswing (CVE-2026-4747), with several additional findings embargoed. Anthropic has signaled more transparency in July 2026. Security experts caution that Mythos' reported exploit success rates could still accelerate attacker capabilities and outpace corporate change controls.

Anthropic Claude AI Red Teaming Research

April 17, 2026

Commercial AI Models Make Rapid Gains in Vulnerability

🔍 Forescout’s Verde Labs reports rapid progress across commercial, open-source and underground AI models in vulnerability research and exploit generation. In 2026 the firm found all tested models could complete end-to-end vulnerability research and about half could autonomously produce working exploits; top performers included Claude Opus 4.6 and Kimi K2.5. Using single prompts, the RAPTOR agentic framework and Verde Labs’ extensions, researchers discovered four zero-days in OpenNDS, demonstrating a lower barrier to discovery and a growing risk for organizations.

Anthropic Claude AI Red Teaming Vulnerability Management

April 17, 2026

Mythos and the Limits of Private AI Security Control

🔍 Anthropic announced a restricted release of Claude Mythos Preview, an AI claimed to find and weaponize software vulnerabilities at unprecedented scale, and limited access to roughly 50 organizations under Project Glasswing. The company highlighted thousands of flaws across major operating systems and browsers, including decades-old bugs and a set of 181 usable Firefox attacks, far beyond its prior model's performance. Yet the disclosure omits key metrics—false-positive rates, unfiltered outputs, and broad audit access—raising concerns that withholding a powerful tool is not a substitute for transparency, independent review, and funded access for domain experts.

Anthropic Claude AI Red Teaming Vulnerability Management

April 16, 2026

ATHR: AI Voice Agents Enable Fully Automated Vishing

🔊 A new platform called ATHR automates telephone-oriented attacks by combining AI voice agents and optional human operators to carry out vishing campaigns and harvest credentials across services including Google, Microsoft, and major crypto platforms. Researchers at Abnormal say ATHR bundles email templates, spoofing, WebRTC/Asterisk routing, and per-target customization into a dashboard that controls distribution, calls, and logging. The service is marketed on underground forums for $4,000 plus a commission and greatly lowers the skill barrier for attackers.

Vishing AI Red Teaming Threat Actor

April 15, 2026

OpenAI Releases GPT-5.4-Cyber for Defensive Teams Now

🛡️ OpenAI has unveiled GPT-5.4-Cyber, a variant of its flagship GPT‑5.4 tuned for defensive cybersecurity use cases, and expanded its Trusted Access for Cyber (TAC) program to include thousands of authenticated individual defenders and hundreds of security teams. The company says the model is intended to help teams find, validate, and fix vulnerabilities faster while it iteratively strengthens safeguards to reduce dual‑use risks and resist jailbreaks and adversarial prompt injection. OpenAI highlighted its Codex Security agent, which it credits with contributing to the remediation of over 3,000 critical and high vulnerabilities, and framed the release as part of a broader shift toward continuous, developer‑integrated security feedback.

OpenAI AI Red Teaming Model Jailbreaks

April 14, 2026

AISI Urges Cybersecurity Basics After Mythos Test Guidance

🔐 The UK’s AI Security Institute (AISI) evaluated Anthropic’s Claude Mythos Preview and found it can autonomously discover and exploit vulnerabilities in controlled tests when given network access. In a 32‑step simulated corporate attack the model completed the full sequence in 3 of 10 runs and averaged 22 of 32 steps, though performance varied. AISI stresses these cyber ranges are easier than real environments and recommended organisations strengthen basics — timely patching, robust access controls, secure configuration and comprehensive logging — while also exploring AI to bolster defensive capabilities.

Anthropic AI Red Teaming Advisory

April 13, 2026

Anthropic's Mythos Spurs Structural Cybersecurity Shift

⚠️A new Cloud Security Alliance (CSA) briefing warns that Anthropic's Claude Mythos (Preview) marks a structural shift in cybersecurity. The model can autonomously discover and exploit thousands of vulnerabilities and orchestrate attacks at speeds that compress discovery-to-weaponization from weeks to hours. The paper — informed by leading security figures — says Mythos is not an outlier and urges CISOs to build Mythos-ready programs, harden fundamentals, and elevate the issue to the board.

Anthropic Claude AI Red Teaming Agentic AI

April 13, 2026

Anthropic’s Mythos Preview and Project Glasswing Risks

🔍 Anthropic's new Claude Mythos Preview and its Project Glasswing effort have focused industry attention on AI-driven cyberattack capabilities. Anthropic says it will not release the model publicly, citing the risk that it can automatically generate operational exploits, and is running the model against public and proprietary code to find and patch vulnerabilities before they can be weaponized. The announcement produced substantial PR impact, prompting rival vendors to echo similar caution. Security observers note defenders still hold an advantage—finding flaws is easier than turning them into attacks—but that margin is shrinking as models improve.

Anthropic Claude AI Red Teaming AI Safety

April 8, 2026

Anthropic unveils Project Glasswing to find critical bugs

🔍 Anthropic has launched Project Glasswing, an initiative that uses Claude Mythos Preview to autonomously locate and remediate undiscovered cybersecurity vulnerabilities in critical software. The private model — described by Anthropic as highly capable for coding and agentic tasks — was tested with launch partners including AWS, Google and Microsoft and reportedly found thousands of previously unidentified zero-day flaws. Anthropic committed up to $100m in usage credits and $4m in donations to support open-source security while keeping Mythos Preview restricted to defenders with guardrails.

Anthropic AI Red Teaming Vulnerability Disclosure

April 8, 2026

Anthropic's Claude Mythos Identifies Thousands of Zero‑Days

🔐 Anthropic launched Project Glasswing to apply a preview of its frontier model, Claude Mythos, to find and help remediate security vulnerabilities in critical software. The company says Mythos Preview has already identified thousands of high‑severity zero‑day flaws and autonomously developed complex exploits in testing. Access is restricted to a small set of vendors and foundations due to abuse risks. Anthropic committed significant usage credits and donations to support coordinated defensive patching while acknowledging prior operational leaks and the risk that the same capabilities could be misused.

Anthropic Claude Zero-Day Exploitation AI Red Teaming