< ciso
brief />
Tag Banner

All news with #ai guardrails tag

16 articles

Amazon Bedrock AgentCore adds Chrome policies and CA support

🔒 Amazon now enables Bedrock AgentCore to apply Chrome Enterprise policies to AgentCore Browser and to accept custom root Certificate Authority (CA) certificates for both AgentCore Browser and Code Interpreter. Administrators can leverage 100+ configurable browser policies — such as URL restrictions, disabling password managers, download controls, and kiosk-mode restrictions — to enforce compliance for AI agents. Custom root CA support permits secure TLS connections to internal services and corporate proxies that use enterprise-signed certificates, helping agents operate within strict security environments.
read more →

Securing Homegrown AI Agents with Falcon AIDR & NeMo

🔒 Falcon AIDR now integrates with NVIDIA NeMo Guardrails to provide programmable runtime protections for homegrown AI agents moving into production. The combined solution blocks prompt injection, redacts PII, defangs malicious domains, and moderates unwanted topics while preserving responsive, sub-100ms agent workflows. Teams can leverage 75+ built-in detectors or create custom policies to monitor in report-only mode and then progressively enforce blocks, redactions, encryptions, or transformations.
read more →

Researchers Find Major Security Flaws in LLM Guardrails

🔒 Researchers at Unit 42, Palo Alto Networks' lab, have demonstrated that LLM-based safety and evaluation systems — called AI Judges — can be manipulated via prompt-injection-style token sequences. Their custom fuzzer, AdvJudge-Zero, probes models in a black-box manner, finding low-perplexity formatting tokens that shift internal attention and increase the likelihood of an 'allow' decision. Unit 42 recorded a 99% bypass rate across multiple architectures, and showed that adversarial retraining on fuzzer-discovered examples can reduce that success rate to near zero.
read more →

AI Safety Measures Hamper Defenders More Than Attackers

🔒 Enterprise AI guardrails meant to prevent misuse are increasingly blocking legitimate defensive activity, creating an asymmetry that favors attackers. Widely deployed, enterprise-approved models often refuse realistic phishing simulations, exploit proofs-of-concept, or multi-step red-team scenarios once prompts resemble real-world attacks. Attackers evade these limits using jailbroken models, open-source deployments, fine-tuning, and underground toolkits. The article calls for authorization-based access, purpose-built security sandboxes, and vetting workflows so safety controls protect against misuse without crippling defenders.
read more →

Smashing Security Ep.450: Instagram leak and Grok fallout

🔍 Episode 450 explores confusion after claims that data linked to 17.5 million Instagram accounts was put up for sale — a story driven by a vague post, conflicting statements, and an unexpected flood of password‑reset emails. The episode also examines Grok, Elon Musk’s AI chatbot, after it generated sexualised images of women and children, raising urgent questions about guardrails and accountability. Hosts discuss why simple censorship is not a solution.
read more →

The Dual Role of AI in Empowering and Threatening Security

🛡️ AI and large language models are transforming cybersecurity into a contest of speed and scale, serving as both best-in-class defensive tools and powerful offensive enablers. Researchers describe self-modifying malware and autonomous espionage that call commercial LLMs (e.g., PROMPTFLUX, PROMPTSTEAL) to adapt tactics mid-execution, while defenders are deploying solutions like XBOW, CodeMender and Watsonx to automate vulnerability discovery, remediation and compliance. CISOs must therefore pair AI-driven defenses with governance and model guardrails to manage this dual-use reality.
read more →

Securing Vibe Coding: Governance for AI Development

🛡️ Vibe coding accelerates development but often omits essential security controls, introducing vulnerabilities, data exfiltration, and destructive actions. Unit 42 documents incidents where AI-generated code bypassed authentication, executed arbitrary commands, deleted production databases, or exposed sensitive identifiers. To mitigate these risks, Unit 42 proposes the SHIELD framework—Separation, Human review, Input/output validation, Enforcer helper models, Least agency, and Defensive controls. Implementing these measures restores governance and enables safer AI-assisted development.
read more →

Human-in-the-Loop Safeguards Can Be Forged, Researchers Warn

⚠️ Checkmarx research shows Human-in-the-Loop (HITL) confirmation dialogs can be manipulated so attackers embed malicious instructions into prompts, a technique the researchers call Lies-in-the-Loop (LITL). Attackers can hide or misrepresent dangerous commands by padding payloads, exploiting rendering behaviors like Markdown, or pushing harmful text out of view. Approval dialogs meant as a final safety backstop can thus become an attack surface. Checkmarx urges developers to constrain dialog rendering and validate approved operations; vendors acknowledged the report but did not classify it as a vulnerability.
read more →

Google deploys second model to guard Gemini Chrome agent

🛡️ Google has added a separate user alignment critic to its Gemini-powered Chrome browsing agent to vet and block proposed actions that do not match user intent. The critic is isolated from web content and sees only metadata about planned actions, providing feedback to the primary planning model when it rejects a step. Google also enforces origin sets to limit where the agent can read or act, requires confirmations for banking, medical, password use and purchases, and runs a classifier plus automated red‑teaming to detect prompt injection attempts during preview.
read more →

Amazon Nova adds customizable content moderation settings

🔒 Amazon announced that Amazon Nova models now support customizable content moderation settings for approved business use cases that require processing or generating sensitive content. Organizations can adjust controls across four domains—safety, sensitive content, fairness, and security—while Amazon enforces essential, non-configurable safeguards to protect children and preserve privacy. Customization is available for Amazon Nova Lite and Amazon Nova Pro in the US East (N. Virginia) region; customers should contact their AWS Account Manager to confirm eligibility.
read more →

Blueprint for Building Safe and Secure AI Agents at Scale

🔒 Azure outlines a layered blueprint for building trustworthy, enterprise-grade AI agents. The post emphasizes identity, data protection, built-in controls, continuous evaluation, and monitoring to address risks like data leakage, prompt injection, and agent sprawl. Azure AI Foundry introduces Entra Agent ID, cross-prompt injection classifiers, risk and safety evaluations, and integrations with Microsoft Purview and Defender. Join Microsoft Secure on September 30 to learn about Foundry's newest capabilities.
read more →

Deploying Agentic AI: Five Steps for Red-Teaming Guide

🛡️ Enterprises adopting agentic AI must update red‑teaming practices to address a rapidly expanding and interactive attack surface. The article summarizes the Cloud Security Alliance’s Agentic AI Red Teaming Guide and corroborating research that documents prompt injection, multi‑agent manipulation, and authorization hijacking as practical threats. It recommends five pragmatic steps—change attitude, continually test guardrails and governance, broaden red‑team skill sets, widen the solution space, and adopt modern tooling—and highlights open‑source and commercial tools such as AgentDojo and Agentgateway. The overall message: combine automated agents with human creativity, embed security in design, and treat agentic systems as sociotechnical operators rather than simple software.
read more →

Prompt Injection via Macros Emerges as New AI Threat

🛡️ Enterprises now face attackers embedding malicious prompts in document macros and hidden metadata to manipulate generative AI systems that parse files. Researchers and vendors have identified exploits — including EchoLeak and CurXecute — and a June 2025 Skynet proof-of-concept that target AI-powered parsers and malware scanners. Experts urge layered defenses such as deep file inspection, content disarm and reconstruction (CDR), sandboxing, input sanitization, and strict model guardrails to prevent AI-driven misclassification or data exposure.
read more →

Cloudy-driven Email Detection Summaries and Guardrails

🛡️Cloudflare extended its AI agent Cloudy to generate clear, concise explanations for email security detections so SOC teams can understand why messages are blocked. Early LLM implementations produced dangerous hallucinations when asked to interpret complex, multi-model signals, so Cloudflare implemented a Retrieval-Augmented Generation approach and enriched contextual prompts to ground outputs. Testing shows these guardrails yield more reliable summaries, and a controlled beta will validate performance before wider rollout.
read more →

Portkey Integrates Prisma AIRS to Secure AI Gateways

🔐 Palo Alto Networks and Portkey have integrated Prisma AIRS directly into Portkey’s AI gateway to embed security guardrails at the gateway level. The collaboration aims to protect applications from AI-specific threats—such as prompt injections, PII and secret leakage, and malicious outputs—while preserving Portkey’s operational benefits like observability and cost controls. A one-time configuration via Portkey’s Guardrails module enforces protections without code changes, and teams can monitor posture through Portkey logs and the Prisma AIRS dashboard.
read more →

Defending Against Indirect Prompt Injection in LLMs

🔒 Microsoft outlines a layered defense-in-depth strategy to protect systems using LLMs from indirect prompt injection attacks. The approach pairs preventative controls such as hardened system prompts and Spotlighting (delimiting, datamarking, encoding) to isolate untrusted inputs with detection via Microsoft Prompt Shields, surfaced through Azure AI Content Safety and integrated with Defender for Cloud. Impact mitigation uses deterministic controls — fine-grained permissions, Microsoft Purview sensitivity labels, DLP policies, explicit user consent workflows, and blocking known exfiltration techniques — while ongoing research (TaskTracker, LLMail-Inject, FIDES) advances new design patterns and assurances.
read more →