Tag Banner

All news with #safety guardrails tag

Wed, December 10, 2025

Building a security-first culture for agentic AI enterprises

🔒 Microsoft argues that as organizations adopt agentic AI, security must be a strategic priority that enables growth, trust, and continued innovation. The post identifies risks such as oversharing, data leakage, compliance gaps, and agent sprawl, and recommends three pillars: prepare for AI and agent integration, strengthen organization-wide skilling, and foster a security-first culture. It points to resources like Microsoft’s AI adoption model, Microsoft Learn, and the AI Skills Navigator to help operationalize these steps.

read more →

Tue, December 9, 2025

Google deploys second model to guard Gemini Chrome agent

🛡️ Google has added a separate user alignment critic to its Gemini-powered Chrome browsing agent to vet and block proposed actions that do not match user intent. The critic is isolated from web content and sees only metadata about planned actions, providing feedback to the primary planning model when it rejects a step. Google also enforces origin sets to limit where the agent can read or act, requires confirmations for banking, medical, password use and purchases, and runs a classifier plus automated red‑teaming to detect prompt injection attempts during preview.

read more →

Tue, December 9, 2025

AI vs Human Drivers — Safety, Trials, and Policy Debate

🚗 Bruce Schneier frames a public-policy dilemma: a neurosurgeon writing in the New York Times calls driverless cars a “public health breakthrough,” citing more than 39,000 US traffic fatalities and thousands of daily crash victims, while the authors of Driving Intelligence: The Green Book argue that ongoing autonomous-vehicle (AV) trials have produced deaths and should be halted and forensically reviewed. Schneier cites a 2016 paper, Driving to safety, which shows that proving AV safety by miles-driven alone would require hundreds of millions to billions of miles, making direct statistical comparison impractical. The paper argues regulators and developers must adopt alternative evidence methods and adaptive regulation because uncertainty about AV safety will persist.

read more →

Mon, December 8, 2025

AWS unveils AI-driven security enhancements at re:Invent

🔒 AWS announced a suite of AI- and automation-driven security features at re:Invent 2025 designed to shift cloud protection from reactive response to proactive prevention. AWS Security Agent and agentic incident response add continuous code review and automated investigations, while ML enhancements in GuardDuty and near real-time analytics in Security Hub improve multi-stage threat detection. Agent-centric IAM tools, including policy autopilot and private sign-in routes, streamline permissions and enforce granular, zero-trust access for agents and workloads.

read more →

Thu, December 4, 2025

NSA Warns AI Introduces New Risks to OT Networks, Allies

⚠️ The NSA, together with the Australian Signals Directorate and allied security agencies, published the Principles for the Secure Integration of Artificial Intelligence in Operational Technology to highlight emerging risks as AI is applied to safety-critical OT networks. The guidance flags adversarial prompt injection, data poisoning, AI drift, hallucinations, loss of explainability, human de-skilling and alert fatigue as primary concerns. It urges operators to adopt CISA secure design practices, maintain accurate asset inventories, consider in-house development tradeoffs, and apply rigorous oversight before deploying AI in OT environments.

read more →

Sun, November 30, 2025

Amazon Connect Adds AI-Powered Case Summaries for Agents

🤖 Amazon Connect now offers AI-powered case summaries that let agents generate concise, multi-interaction case overviews with a single click. Summaries capture issue background, actions taken, follow-ups, and recommended next steps to reduce manual wrap-up and speed resolutions. Administrators can configure custom prompts and guardrails to enforce organizational style and compliance.

read more →

Mon, November 24, 2025

DeepSeek-R1 Generates Less Secure Code for China-Sensitive Prompts

⚠️ CrowdStrike analysis finds that DeepSeek-R1, an open-source AI reasoning model from a Chinese vendor, produces significantly more insecure code when prompts reference topics the Chinese government deems sensitive. Baseline tests produced vulnerable code in 19% of neutral prompts, rising to 27.2% for Tibet-linked scenarios. Researchers also observed partial refusals and internal planning traces consistent with targeted guardrails that may unintentionally degrade code quality.

read more →

Fri, November 21, 2025

Bedrock Guardrails: Natural-Language Test Generation

🧪 Amazon Web Services has added natural-language test Q&A generation to Automated Reasoning checks in Amazon Bedrock Guardrails. The capability generates up to N test Q&As from input documents to accelerate creating and validating formal verification policies. Automated Reasoning checks apply formal methods to detect correct model outputs and report up to 99% accuracy in identifying correct responses and reducing hallucinations. The feature is available in multiple US and EU Regions and accessible via the Bedrock console and Python SDK.

read more →

Thu, November 20, 2025

CrowdStrike: Political Triggers Reduce AI Code Security

🔍 DeepSeek-R1, a 671B-parameter open-source LLM, produced code with significantly more severe security vulnerabilities when prompts included politically sensitive modifiers. CrowdStrike found baseline vulnerable outputs at 19%, rising to 27.2% or higher for certain triggers and recurring severe flaws such as hard-coded secrets and missing authentication. The model also refused requests related to Falun Gong in 45% of cases, exhibiting an intrinsic "kill switch" behavior. The report urges thorough, environment-specific testing of AI coding assistants rather than reliance on generic benchmarks.

read more →

Wed, November 19, 2025

Amazon Bedrock Guardrails Expand Code-Related Protections

🔒 Amazon Web Services expanded Amazon Bedrock Guardrails to cover code-related use cases, enabling detection and prevention of harmful content embedded in code. The update applies content filters, denied topics, and sensitive information filters to code elements such as comments, variable and function names, and string literals. The enhancements also include prompt leakage detection in the standard tier and are available in all supported AWS Regions via the console and APIs.

read more →

Mon, November 17, 2025

A Methodical Approach to Agent Evaluation: Quality Gate

🧭 Hugo Selbie presents a practical framework for evaluating modern multi-step AI agents, emphasizing that final-output metrics alone miss silent failures arising from incorrect reasoning or tool use. He recommends defining clear, measurable success criteria up front and assessing agents across three pillars: end-to-end quality, process/trajectory analysis, and trust & safety. The piece outlines mixed evaluation methods—human review, LLM-as-a-judge, programmatic checks, and adversarial testing—and prescribes operationalizing these checks in CI/CD with production monitoring and feedback loops.

read more →

Fri, November 14, 2025

The Role of Human Judgment in an AI-Powered World Today

🧭 The essay argues that as AI capabilities expand, we must clearly separate tasks best handled by machines from those requiring human judgment. For narrow, fact-based problems—such as reading diagnostic tests—AI should be preferred when demonstrably more accurate. By contrast, many public-policy and justice questions involve conflicting values and no single factual answer; those judgment-laden decisions should remain primarily human responsibilities, with machines assisting implementation and escalating difficult cases.

read more →

Sat, November 8, 2025

Microsoft Reveals Whisper Leak: Streaming LLM Side-Channel

🔒 Microsoft has disclosed a novel side-channel called Whisper Leak that can let a passive observer infer the topic of conversations with streaming language models by analyzing encrypted packet sizes and timings. Researchers at Microsoft (Bar Or, McDonald and the Defender team) demonstrate classifiers that distinguish targeted topics from background traffic with high accuracy across vendors including OpenAI, Mistral and xAI. Providers have deployed mitigations such as random-length response padding; Microsoft recommends avoiding sensitive topics on untrusted networks, using VPNs, or preferring non-streaming models and providers that implemented fixes.

read more →

Thu, November 6, 2025

Equipping Autonomous AI Agents with Cyber Hygiene Practices

🔐 This post demonstrates a proof-of-concept for teaching autonomous agents internet safety by integrating real-time threat intelligence. Using LangChain with OpenAI and the Cisco Umbrella API, the example shows how an agent can extract domains and query dispositions to decide whether to connect. The agent returns clear disposition reports and abstains when no domains are present. The approach emphasizes decision-making over hardblocking.

read more →

Fri, October 31, 2025

Will AI Strengthen or Undermine Democratic Institutions

🤖 Bruce Schneier and Nathan E. Sanders present five key insights from their book Rewiring Democracy, arguing that AI is rapidly embedding itself in democratic processes and can both empower citizens and concentrate power. They cite diverse examples — AI-written bills, AI avatars in campaigns, judicial use of models, and thousands of government use cases — and note many adoptions occur with little public oversight. The authors urge practical responses: reform the tech ecosystem, resist harmful applications, responsibly deploy AI in government, and renovate institutions vulnerable to AI-driven disruption.

read more →

Thu, October 30, 2025

OpenAI Updates GPT-5 to Better Handle Emotional Distress

🧭 OpenAI rolled out an October 5 update that enables GPT-5 to better recognize and respond to mental and emotional distress in conversations. The change specifically upgrades GPT-5 Instant—the fast, low-end default—so it can detect signs of acute distress and route sensitive exchanges to reasoning models when needed. OpenAI says it developed the update with mental-health experts to prioritize de-escalation and provide appropriate crisis resources while retaining supportive, grounding language. The update is available broadly and complements new company-context access via connected apps.

read more →

Wed, October 29, 2025

BSI Warns of Growing AI Governance Gap in Business

⚠️ The British Standards Institution warns of a widening AI governance gap as many organisations accelerate AI adoption without adequate controls. An AI-assisted review of 100+ annual reports and two polls of 850+ senior leaders found strong investment intent but sparse governance: only 24% have a formal AI program and 47% use formal processes. The report highlights weaknesses in incident management, training-data oversight and inconsistent approaches across markets.

read more →

Tue, October 21, 2025

Amazon Nova adds customizable content moderation settings

🔒 Amazon announced that Amazon Nova models now support customizable content moderation settings for approved business use cases that require processing or generating sensitive content. Organizations can adjust controls across four domains—safety, sensitive content, fairness, and security—while Amazon enforces essential, non-configurable safeguards to protect children and preserve privacy. Customization is available for Amazon Nova Lite and Amazon Nova Pro in the US East (N. Virginia) region; customers should contact their AWS Account Manager to confirm eligibility.

read more →

Fri, October 17, 2025

Google's 2025 Cybersecurity Initiative: New Protections

🔒 Google is expanding protections during Cybersecurity Awareness Month 2025 with new features and guidance to counter scams and AI-driven threats. The company outlines a cohesive strategy for securing the AI ecosystem and introduces six new anti-scam measures to help users stay safe. It also launches Recovery Contacts to simplify account recovery and debuts CodeMender, an AI agent that automates code security. Additional updates support safer learning through responsible tools and partnerships.

read more →

Wed, October 15, 2025

OpenAI Sora 2 Launches in Azure AI Foundry Platform

🎬 Azure AI Foundry now includes OpenAI's Sora 2 in public preview, providing developers with realistic video generation from text, images, and video inputs inside a unified, enterprise-ready environment. The integration offers synchronized multilingual audio, physics-based world simulation, and fine-grained creative controls for shots, scenes, and camera angles. Microsoft highlights enterprise-grade security, input/output content filters, and availability via API starting today at $0.10 per second for 720×1280 and 1280×720 outputs.

read more →