All news with #ai red teaming tag

88 articles · page 4 of 5

December 12, 2025

OpenAI Expands Defense-in-Depth to Curb Model Abuse

🛡️ OpenAI says it is expanding a "defense in depth" strategy to limit misuse of its frontier AI models, warning they could be used to develop zero-day exploits or aid complex intrusion operations. The company announced a new Frontier Risk Council, broader guardrails, external red‑teaming, and a trusted access program for vetted customers testing defensive use cases. OpenAI also plans to scale its Aardvark Agentic Security Researcher beta to scan codebases and recommend mitigations.

OpenAI AI Safety AI Red Teaming

December 11, 2025

AI Agents Demonstrate Real-World Smart Contract Exploits

🔍 Researchers used a new benchmark, SCONE-bench, to train AI agents to find and produce exploits against historically compromised smart contracts. On 405 real-world contracts from 2020–2025, Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 generated exploits valued at $4.6 million. In simulated tests against 2,849 recently deployed contracts the agents discovered two novel zero-day vulnerabilities and created exploits worth $3,694, with GPT-5 incurring $3,476 in API costs. The findings show autonomous, profitable exploitation is technically feasible and emphasize the need for proactive AI-driven defense.

Claude AI Red Teaming

December 4, 2025

Generative AI's Dual Role in Cybersecurity, Evolving

🛡️ Generative AI is rapidly reshaping cybersecurity by amplifying both attackers' and defenders' capabilities. Adversaries leverage models for coding assistance, phishing and social engineering, anti-analysis techniques (including prompts hidden in DNS) and vulnerability discovery, with AI-assisted elements beginning to appear in malware while still needing significant human oversight. Defenders use GenAI to triage threat data, speed incident response, detect code flaws, and augment analysts through MCP-style integrations. As models shrink and access widens, both risk and defensive opportunity are likely to grow.

LLM Security AI Red Teaming Threat Report

December 4, 2025

How Companies Can Prepare for Emerging AI Security Threats

🔒 Generative AI introduces new attack surfaces that alter trust relationships between users, applications and models. Siemens' pentest and security teams differentiate Offensive Security (targeted technical pentests) from Red Teaming (broader organizational simulations of real attackers). Traditional ML risks such as image or biometric misclassification remain relevant, but experts now single out prompt injection as the most serious threat — simple crafted inputs can leak system prompts, cause misinformation, or convert innocuous instructions into dangerous command injections.

LLM Security AI Red Teaming Prompt Injection Attack How-To

November 27, 2025

LLMs Can Produce Malware Code but Reliability Lags

🔬 Netskope Threat Labs tested whether large language models can generate operational malware by asking GPT-3.5-Turbo, GPT-4 and GPT-5 to produce Python for process injection, AV/EDR termination and virtualization detection. GPT-3.5-Turbo produced malicious code quickly, while GPT-4 initially refused but could be coaxed with role-based prompts. Generated scripts ran reliably on physical hosts, had moderate success in VMware, and performed poorly in AWS Workspaces VDI; GPT-5 raised success rates substantially but also returned safer alternatives because of stronger safeguards. Researchers conclude LLMs can create useful attack code but still struggle with reliable evasion and cloud adaptation, so full automation of malware remains infeasible today.

LLM Security Malware AI Red Teaming

November 19, 2025

Using AI to Avoid Black Friday Price Manipulation and Scams

🛍️ Black Friday shopping is increasingly fraught with staged discounts and manipulated prices, but large language models (LLMs) can help shoppers cut through the noise. Use AI like ChatGPT, Claude, or Gemini to build a wish list, track historical prices, compare alternatives, and vet sellers quickly. The article provides step-by-step prompts for price analysis, seller verification, local-market queries, and model-specific requests, and recommends security measures such as using a separate card and installing Kaspersky Premium to reduce fraud risk.

ChatGPT Claude Gemini AI Red Teaming

November 18, 2025

Researchers Detail Tuoni C2's Role in Real-Estate Attack

🔒 Cybersecurity researchers disclosed an attempted intrusion against a major U.S. real-estate firm that leveraged the emerging Tuoni C2 and red-team framework. The campaign, observed in mid-October 2025, used Microsoft Teams impersonation and a PowerShell loader that fetched a BMP-steganographed payload from kupaoquan[.]com and executed shellcode in memory. That sequence spawned TuoniAgent.dll, which contacted a C2 server but ultimately failed to achieve its goals. The incident highlights the risk of freely available red-team tooling and AI-assisted code generation being abused by threat actors.

Cobalt Strike Remote Access Trojan Defense Evasion AI Red Teaming

November 17, 2025

A Methodical Approach to Agent Evaluation: Quality Gate

🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.

Agent Security AI Red Teaming Model Evaluation

November 14, 2025

Adversarial AI Bots vs Autonomous Threat Hunters Outlook

🤖 AI-driven adversarial bots are rapidly amplifying attackers' capabilities, enabling autonomous pen testing and large-scale credential abuse that many organizations aren't prepared to detect or remediate. Tools like XBOW and Hexstrike-AI demonstrate how agentic systems can discover zero-days and coordinate complex operations at scale. Defenders must adopt continuous, context-rich approaches such as digital twins for real-time threat modeling rather than relying on incremental automation.

Agentic AI AI Red Teaming Threat Hunting

November 6, 2025

Multi-Turn Adversarial Attacks Expose LLM Weaknesses

🔍 Cisco AI Defense's report shows open-weight large language models remain vulnerable to adaptive, multi-turn adversarial attacks even when single-turn defenses appear effective. Using over 1,000 prompts per model and analyzing 499 simulated conversations of 5–10 exchanges, researchers found iterative strategies such as Crescendo, Role-Play and Refusal Reframe drove failure rates above 90% in many cases. The study warns that traditional safety filters are insufficient and recommends strict system prompts, model-agnostic runtime guardrails and continuous red-teaming to mitigate risk.

LLM Security AI Red Teaming Jailbreak

November 5, 2025

Addressing the AI Black Box with Prisma AIRS 2.0 Platform

🔒 Prisma AIRS 2.0 presents a unified AI security platform that addresses the “AI black box” by combining AI Model Security and automated AI Red Teaming. It inventories models, inference datasets, applications and agents in real time, inspects model artifacts within CI/CD and model registries, and conducts continuous, context-aware adversarial testing. The platform integrates curated threat intelligence and governance mappings to deliver auditable risk scores and prioritized remediation guidance for enterprise teams.

Palo Alto Networks AI Red Teaming Model Security AI Governance

November 3, 2025

OpenAI Aardvark: Autonomous GPT-5 Agent for Code Security

🛡️ OpenAI Aardvark is an autonomous GPT-5-based agent that scans, analyzes and patches code by emulating a human security researcher. Rather than only flagging suspicious patterns, it maps repositories, builds contextual threat models, validates findings in sandboxes and proposes fixes via Codex, then rechecks changes to prevent regressions. OpenAI reports it found 92% of benchmark vulnerabilities and has already identified real issues in open-source projects, offering free coordinated scanning for selected non-commercial repositories.

OpenAI SAST AI Red Teaming Product Launch

October 30, 2025

AI-Designed Bioweapons: The Detection vs Creation Arms Race

🧬 Researchers used open-source AI to design variants of ricin and other toxic proteins, then converted those designs into DNA sequences and submitted them to commercial DNA-order screening tools. From 72 toxins and three AI packages they generated roughly 75,000 designs and found wide variation in how four screening programs flagged potential threats. Three of the packages were patched and improved after the test, but many AI-designed variants—often likely non-functional because of misfolding—exposed gaps in detection. The authors warn this imbalance could produce an arms race where design outpaces reliable screening.

AI Red Teaming Synthetic Media Risk AI Safety

October 29, 2025

Open-Source b3 Benchmark Boosts LLM Security Testing

🛡️ The UK AI Security Institute (AISI), Check Point and Lakera have launched b3, an open-source benchmark to assess and strengthen the security of backbone LLMs that power AI agents. b3 focuses on the specific LLM calls within agent workflows where malicious inputs can trigger harmful outputs, using 10 representative "threat snapshots" combined with a dataset of 19,433 adversarial attacks from Lakera’s Gandalf initiative. The benchmark surfaces vulnerabilities such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service and unauthorized tool calls, making LLM security more measurable, reproducible and comparable across models and applications.

Check Point LLM Security AI Red Teaming Prompt Injection Attack

October 21, 2025

Google Migrates ISAs with AI and Automation at Scale

🔧 Google details how its custom Axion Arm CPUs and a mix of automation and AI enabled large-scale migration from x86 to multi-architecture production across services such as YouTube, Gmail, and BigQuery. The team analyzed 38,156 commits (about 700K changed lines) and reports migrating more than 30,000 applications to Arm while keeping both Arm and x86 in production. Existing automation like Rosie, sanitizers, fuzzers, and the CHAMP rollout framework handled much of the work, while an LLM-driven agent called CogniPort fixed build and test failures, showing a 30% success rate on a 245-commit benchmark. Google plans to default new apps to multiarch and continue refining AI tools to address the remaining long tail.

Google Google Cloud AI Red Teaming

October 15, 2025

Google Named a Leader in the 2025 Gartner SIEM Magic Quadrant

🔒 Google Security Operations has been named a Leader in the 2025 Gartner Magic Quadrant for SIEM, recognized for both Ability to Execute and highest Completeness of Vision. The AI-driven platform leverages Gemini to automate data analysis, assist investigations with natural language, and orchestrate responses, combining curated detections, SOAR, and case-centric workflows. Customers report measurable outcomes — up to 240% ROI over three years, 50% faster MTTR, and 65% faster MTTI — driven by automation and an emerging agentic SOC vision.

Google Security Operations SIEM Gemini AI Red Teaming

October 2, 2025

Daniel Miessler on AI Attack-Defense Balance and Context

🔍 Daniel Miessler argues that context determines the AI attack–defense balance: whoever holds the most accurate, actionable picture of a target gains the edge. He forecasts attackers will have the advantage for roughly 3–5 years as Red teams leverage public OSINT and reconnaissance while LLMs and SPQA-style architectures mature. Once models can ingest reliable internal company context at scale, defenders should regain the upper hand by prioritizing fixes and applying mitigations faster.

AI Security LLM Security AI Red Teaming

September 29, 2025

Microsoft Blocks Phishing Using AI-Generated Code Tactics

🔒 Microsoft Threat Intelligence stopped a credential phishing campaign that likely used AI-generated code to hide a payload inside an SVG file disguised as a PDF. Attackers sent self-addressed emails from a compromised small-business account, hiding real targets in the Bcc field and attaching a file named "23mb – PDF- 6 pages.svg." Embedded JavaScript decoded business-style obfuscation to redirect victims to a fake CAPTCHA and a fraudulent sign-in page, and Microsoft Defender for Office 365 blocked the campaign by flagging delivery patterns, suspicious domains and anomalous code behavior.

Phishing Defender for Office 365 AI Red Teaming

September 29, 2025

Can AI Reliably Write Vulnerability Detection Checks?

🔍 Intruder’s security team tested whether large language models can write Nuclei vulnerability templates and found one-shot LLM prompts often produced invalid or weak checks. Using an agentic approach with Cursor—indexing a curated repo and applying rules—yielded outputs much closer to engineer-written templates. The current workflow uses standard prompts and rules so engineers can focus on validation and deeper research while AI handles repetitive tasks.

AI Red Teaming Vulnerability Management Detection Rule

September 24, 2025

Ransomware Speed Crisis: Defending at Machine Pace

⚠️ Ransomware attacks have accelerated to machine speed, often completing exfiltration and impact in minutes rather than days. Unit 42 research documents a dramatic decline in mean time to exfiltrate, driven by AI automation, initial access brokers and RaaS, which together enable highly targeted, fast-moving campaigns. Organizations now need AI-powered detection, automated containment and unified XDR visibility across endpoints, network and cloud to stop threats in real time. Human analysts remain vital but must operate alongside automated systems to focus on hunting and strategic response.

Ransomware XDR AI Red Teaming