All news with #ai guardrails tag

20 articles

May 25, 2026

Shift AI Security from Models to System-Level Controls

🛡️ Researchers argue enterprises must stop treating AI agents as trusted components and instead secure them as untrusted systems. The paper, authored by teams from Google, UC San Diego, UW–Madison and others, distills five systems-security principles—least privilege, tamper resistance, complete mediation, secure information flow, and human risk—and maps eleven real-world agent attacks to these violations. They caution that stacking ML guardrails is insufficient and propose research directions for separating instructions from data, verifiable least-privilege policies, and information-flow controls.

AI Security Agent Security Agentic AI AI Guardrails

May 20, 2026

RAMPART and Clarity: Open Tools for Agent Safety Workflow

🔒 Microsoft has open-sourced two engineering tools—RAMPART and Clarity—to make agent safety a continuous part of development. RAMPART provides a pytest-style framework that brings red-team and adversarial tests into CI, evaluating tools invoked and side effects. Clarity is a structured design companion that captures problem statements, failure analyses, and decisions in a .clarity-protocol directory. Both aim to create living safety artifacts integrated into normal workflows.

Microsoft Agent Security Agentic AI AI Red Teaming

May 18, 2026

Cloudflare Findings on Frontier Cybersecurity LLMs

🔍 Cloudflare tested security-focused LLMs on its infrastructure and reports detailed findings from using Anthropic’s Mythos Preview as part of Project Glasswing. The model stood out for exploit chain construction and automated proof generation, producing runnable PoCs and iterating on failures. Its emergent guardrails proved inconsistent across runs and prompts, so Cloudflare built a tailored harness and additional safeguards to scale safely. The team also observed higher-quality, actionable findings compared with earlier frontier models, but noted increased noise from memory-unsafe languages and model bias.

Cloudflare Anthropic LLM Security AI Guardrails

May 15, 2026

AWS AI Security Framework: Controls by Layer and Phase

🔒 The AWS AI Security Framework presents a structured model that helps security and business leaders align the right controls to the right use case, at the right layer, and at the right phase so AI can move from prototype to production securely. Its core principle is that you build AI on top of security, not add security later. The post maps controls across three layers—infrastructure, identity and data, and AI application—and across four use cases from answering to agentic and physical AI. It highlights Amazon Bedrock and AgentCore as pillars that decouple model choice from security infrastructure.

AWS Amazon Bedrock AI Security AI Guardrails

March 25, 2026

Amazon Bedrock AgentCore adds Chrome policies and CA support

🔒 Amazon now enables Bedrock AgentCore to apply Chrome Enterprise policies to AgentCore Browser and to accept custom root Certificate Authority (CA) certificates for both AgentCore Browser and Code Interpreter. Administrators can leverage 100+ configurable browser policies — such as URL restrictions, disabling password managers, download controls, and kiosk-mode restrictions — to enforce compliance for AI agents. Custom root CA support permits secure TLS connections to internal services and corporate proxies that use enterprise-signed certificates, helping agents operate within strict security environments.

Amazon Bedrock Bedrock Guardrails AI Guardrails

March 19, 2026

Securing Homegrown AI Agents with Falcon AIDR & NeMo

🔒 Falcon AIDR now integrates with NVIDIA NeMo Guardrails to provide programmable runtime protections for homegrown AI agents moving into production. The combined solution blocks prompt injection, redacts PII, defangs malicious domains, and moderates unwanted topics while preserving responsive, sub-100ms agent workflows. Teams can leverage 75+ built-in detectors or create custom policies to monitor in report-only mode and then progressively enforce blocks, redactions, encryptions, or transformations.

CrowdStrike Nvidia NeMo AI Guardrails Prompt Injection Attack

March 11, 2026

Researchers Find Major Security Flaws in LLM Guardrails

🔒 Researchers at Unit 42, Palo Alto Networks' lab, have demonstrated that LLM-based safety and evaluation systems — called AI Judges — can be manipulated via prompt-injection-style token sequences. Their custom fuzzer, AdvJudge-Zero, probes models in a black-box manner, finding low-perplexity formatting tokens that shift internal attention and increase the likelihood of an 'allow' decision. Unit 42 recorded a 99% bypass rate across multiple architectures, and showed that adversarial retraining on fuzzer-discovered examples can reduce that success rate to near zero.

Unit 42 Palo Alto Networks LLM Security AI Guardrails

March 10, 2026

AI Safety Measures Hamper Defenders More Than Attackers

🔒 Enterprise AI guardrails meant to prevent misuse are increasingly blocking legitimate defensive activity, creating an asymmetry that favors attackers. Widely deployed, enterprise-approved models often refuse realistic phishing simulations, exploit proofs-of-concept, or multi-step red-team scenarios once prompts resemble real-world attacks. Attackers evade these limits using jailbroken models, open-source deployments, fine-tuning, and underground toolkits. The article calls for authorization-based access, purpose-built security sandboxes, and vetting workflows so safety controls protect against misuse without crippling defenders.

AI Safety AI Guardrails Prompt Security Jailbreak

January 15, 2026

Smashing Security Ep.450: Instagram leak and Grok fallout

🔍 Episode 450 explores confusion after claims that data linked to 17.5 million Instagram accounts was put up for sale — a story driven by a vague post, conflicting statements, and an unexpected flood of password‑reset emails. The episode also examines Grok, Elon Musk’s AI chatbot, after it generated sexualised images of women and children, raising urgent questions about guardrails and accountability. Hosts discuss why simple censorship is not a solution.

Data Breach AI Guardrails News

January 8, 2026

The Dual Role of AI in Empowering and Threatening Security

🛡️ AI and large language models are transforming cybersecurity into a contest of speed and scale, serving as both best-in-class defensive tools and powerful offensive enablers. Researchers describe self-modifying malware and autonomous espionage that call commercial LLMs (e.g., PROMPTFLUX, PROMPTSTEAL) to adapt tactics mid-execution, while defenders are deploying solutions like XBOW, CodeMender and Watsonx to automate vulnerability discovery, remediation and compliance. CISOs must therefore pair AI-driven defenses with governance and model guardrails to manage this dual-use reality.

LLM Security AI Security Prompt Injection AI Guardrails

January 8, 2026

Securing Vibe Coding: Governance for AI Development

🛡️ Vibe coding accelerates development but often omits essential security controls, introducing vulnerabilities, data exfiltration, and destructive actions. Unit 42 documents incidents where AI-generated code bypassed authentication, executed arbitrary commands, deleted production databases, or exposed sensitive identifiers. To mitigate these risks, Unit 42 proposes the SHIELD framework—Separation, Human review, Input/output validation, Enforcer helper models, Least agency, and Defensive controls. Implementing these measures restores governance and enables safer AI-assisted development.

LLM Security AI Guardrails DevSecOps Secure SDLC

December 18, 2025

Human-in-the-Loop Safeguards Can Be Forged, Researchers Warn

⚠️ Checkmarx research shows Human-in-the-Loop (HITL) confirmation dialogs can be manipulated so attackers embed malicious instructions into prompts, a technique the researchers call Lies-in-the-Loop (LITL). Attackers can hide or misrepresent dangerous commands by padding payloads, exploiting rendering behaviors like Markdown, or pushing harmful text out of view. Approval dialogs meant as a final safety backstop can thus become an attack surface. Checkmarx urges developers to constrain dialog rendering and validate approved operations; vendors acknowledged the report but did not classify it as a vulnerability.

Prompt Injection Attack LLM Security AI Guardrails

December 9, 2025

Google deploys second model to guard Gemini Chrome agent

🛡️ Google has added a separate user alignment critic to its Gemini-powered Chrome browsing agent to vet and block proposed actions that do not match user intent. The critic is isolated from web content and sees only metadata about planned actions, providing feedback to the primary planning model when it rejects a step. Google also enforces origin sets to limit where the agent can read or act, requires confirmations for banking, medical, password use and purchases, and runs a classifier plus automated red‑teaming to detect prompt injection attempts during preview.

Google Gemini AI Guardrails

October 21, 2025

Amazon Nova adds customizable content moderation settings

🔒 Amazon announced that Amazon Nova models now support customizable content moderation settings for approved business use cases that require processing or generating sensitive content. Organizations can adjust controls across four domains—safety, sensitive content, fairness, and security—while Amazon enforces essential, non-configurable safeguards to protect children and preserve privacy. Customization is available for Amazon Nova Lite and Amazon Nova Pro in the US East (N. Virginia) region; customers should contact their AWS Account Manager to confirm eligibility.

Amazon Bedrock AI Guardrails

September 17, 2025

Blueprint for Building Safe and Secure AI Agents at Scale

🔒 Azure outlines a layered blueprint for building trustworthy, enterprise-grade AI agents. The post emphasizes identity, data protection, built-in controls, continuous evaluation, and monitoring to address risks like data leakage, prompt injection, and agent sprawl. Azure AI Foundry introduces Entra Agent ID, cross-prompt injection classifiers, risk and safety evaluations, and integrations with Microsoft Purview and Defender. Join Microsoft Secure on September 30 to learn about Foundry's newest capabilities.

Azure Azure AI Foundry Agentic AI AI Guardrails

September 17, 2025

Deploying Agentic AI: Five Steps for Red-Teaming Guide

🛡️ Enterprises adopting agentic AI must update red‑teaming practices to address a rapidly expanding and interactive attack surface. The article summarizes the Cloud Security Alliance’s Agentic AI Red Teaming Guide and corroborating research that documents prompt injection, multi‑agent manipulation, and authorization hijacking as practical threats. It recommends five pragmatic steps—change attitude, continually test guardrails and governance, broaden red‑team skill sets, widen the solution space, and adopt modern tooling—and highlights open‑source and commercial tools such as AgentDojo and Agentgateway. The overall message: combine automated agents with human creativity, embed security in design, and treat agentic systems as sociotechnical operators rather than simple software.

Agentic AI AI Red Teaming Prompt Injection AI Guardrails

September 11, 2025

Prompt Injection via Macros Emerges as New AI Threat

🛡️ Enterprises now face attackers embedding malicious prompts in document macros and hidden metadata to manipulate generative AI systems that parse files. Researchers and vendors have identified exploits — including EchoLeak and CurXecute — and a June 2025 Skynet proof-of-concept that target AI-powered parsers and malware scanners. Experts urge layered defenses such as deep file inspection, content disarm and reconstruction (CDR), sandboxing, input sanitization, and strict model guardrails to prevent AI-driven misclassification or data exposure.

Prompt Injection Attack LLM Security AI Guardrails

August 29, 2025

Cloudy-driven Email Detection Summaries and Guardrails

🛡️Cloudflare extended its AI agent Cloudy to generate clear, concise explanations for email security detections so SOC teams can understand why messages are blocked. Early LLM implementations produced dangerous hallucinations when asked to interpret complex, multi-model signals, so Cloudflare implemented a Retrieval-Augmented Generation approach and enriched contextual prompts to ground outputs. Testing shows these guardrails yield more reliable summaries, and a controlled beta will validate performance before wider rollout.

Cloudflare AI Guardrails LLM Security RAG Security

August 6, 2025

Portkey Integrates Prisma AIRS to Secure AI Gateways

🔐 Palo Alto Networks and Portkey have integrated Prisma AIRS directly into Portkey’s AI gateway to embed security guardrails at the gateway level. The collaboration aims to protect applications from AI-specific threats—such as prompt injections, PII and secret leakage, and malicious outputs—while preserving Portkey’s operational benefits like observability and cost controls. A one-time configuration via Portkey’s Guardrails module enforces protections without code changes, and teams can monitor posture through Portkey logs and the Prisma AIRS dashboard.

Palo Alto Networks Prompt Injection Attack AI Guardrails

July 29, 2025

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.

LLM Security Prompt Injection Attack Indirect Prompt Injection AI Guardrails