All news with #model isolation tag
Tue, December 9, 2025
Google deploys second model to guard Gemini Chrome agent
🛡️ Google has added a separate user alignment critic to its Gemini-powered Chrome browsing agent to vet and block proposed actions that do not match user intent. The critic is isolated from web content and sees only metadata about planned actions, providing feedback to the primary planning model when it rejects a step. Google also enforces origin sets to limit where the agent can read or act, requires confirmations for banking, medical, password, and purchase actions, and runs a classifier plus automated red‑teaming to detect prompt injection attempts during the preview.
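Google hasn't published the critic's interface; the metadata-only boundary it describes can be sketched like this (all names, fields, and the origin-derivation helper are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionMetadata:
    """What the critic is allowed to see: no page content, only the plan."""
    action_type: str      # e.g. "click", "fill_form", "purchase"
    target_origin: str    # origin the action would touch
    user_task: str        # the user's original instruction

SENSITIVE = {"purchase", "password_entry", "bank_transfer"}

def allowed_origins(user_task: str) -> set[str]:
    # Hypothetical: derived from the task up front, then frozen.
    return {"https://example-store.com"}

def critic_review(meta: ActionMetadata) -> tuple[bool, str]:
    """Veto steps that don't plausibly serve the user's stated task.

    The critic never receives raw web content, so a malicious page
    cannot inject instructions into this check.
    """
    if meta.action_type in SENSITIVE:
        return False, "sensitive action requires explicit user confirmation"
    if meta.target_origin not in allowed_origins(meta.user_task):
        return False, f"{meta.target_origin} is outside the task's origin set"
    return True, "aligned with user intent"
```

The design point is the data flow, not the rules: because the critic's input type cannot carry page text, prompt-injected content has no channel into the veto decision.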
Tue, December 9, 2025
Google Adds Layered Defenses to Chrome's Agentic AI
🛡️ Google announced a set of layered security measures for Chrome after adding agentic AI features, aimed at reducing the risk of indirect prompt injections and cross-origin data exfiltration. The centerpiece is a User Alignment Critic, a separate model that reviews and can veto proposed agent actions using only action metadata to avoid being poisoned by malicious page content. Chrome also enforces Agent Origin Sets via a gating function that classifies task-relevant origins into read-only and read-writable sets and must approve any new origin before the agent can touch it, and pairs these controls with a prompt-injection classifier, Safe Browsing, on-device scam detection, user work logs, and explicit approval prompts for sensitive actions.
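The origin-set mechanism described above can be sketched minimally — a container splitting origins into read-only and read-writable sets, where every new origin must pass a gating function first (class and method names are illustrative, not Chrome's API):

```python
from typing import Callable

class AgentOriginSets:
    """Sketch of Agent Origin Sets: the agent may only read or write
    origins that a gating function has explicitly admitted."""

    def __init__(self) -> None:
        self.readable: set[str] = set()   # read-only and read-writable
        self.writable: set[str] = set()   # read-writable only

    def request_origin(self, origin: str, write: bool,
                       gate: Callable[[str, bool], bool]) -> bool:
        """Admits an origin only if the gate approves it; write access
        is a strictly smaller set than read access."""
        if not gate(origin, write):
            return False
        self.readable.add(origin)
        if write:
            self.writable.add(origin)
        return True

    def can_read(self, origin: str) -> bool:
        return origin in self.readable

    def can_write(self, origin: str) -> bool:
        return origin in self.writable
```

A gate could be a classifier, a policy list, or a user prompt; the structural guarantee is that no origin enters either set without passing through it.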
Mon, December 8, 2025
Architecting Security for Agentic Browsing in Chrome
🛡️ Chrome describes a layered approach to secure agentic browsing with Gemini, focusing on defenses against indirect prompt injection and goal‑hijacking. A new User Alignment Critic — an isolated, high‑trust model — reviews planned agent actions using only metadata and can veto misaligned steps. Chrome also enforces Agent Origin Sets to limit readable and writable origins, adds deterministic confirmations for sensitive actions, runs prompt‑injection detection in real time, and sustains continuous red‑teaming and monitoring to reduce exfiltration and unwanted transactions.
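The "deterministic confirmations" in this item are a useful contrast to the model-based critic: for certain action classes the pause-for-approval is hard-coded, so no model output can skip it. A minimal sketch (action names and callback shape are assumptions, not Chrome's implementation):

```python
from typing import Callable

# Deterministic gate: these action types ALWAYS pause for the user,
# regardless of what any planning or critic model proposes.
SENSITIVE_ACTIONS = {"purchase", "password_use", "medical_form", "bank_transfer"}

def execute(action_type: str,
            perform: Callable[[], None],
            confirm: Callable[[str], bool]) -> bool:
    """Runs `perform` only after `confirm` approves sensitive actions.

    `confirm` asks the user; returning False aborts before any side effect.
    Non-sensitive actions proceed without a prompt.
    """
    if action_type in SENSITIVE_ACTIONS and not confirm(action_type):
        return False
    perform()
    return True
```

Because the check is plain set membership rather than a model judgment, it cannot be goal-hijacked by injected page content.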
Thu, November 13, 2025
Four Steps for Startups to Build Multi-Agent Systems
🤖 This post outlines a concise four-step framework for startups to design and deploy multi-agent systems, illustrated through a Sales Intelligence Agent example. It recommends choosing between pre-built, partner, or custom agents and describes using Google's Agent Development Kit (ADK) for code-first control. The guide covers hybrid architectures, tool-based state isolation, secure data access, and a three-step deployment blueprint to run agents on Vertex AI Agent Engine and Cloud Run.
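The "tool-based state isolation" pattern mentioned above — agents share data only through explicit tool calls, never through a common mutable context — can be sketched as follows (class and tool names are hypothetical and do not reflect the ADK API):

```python
from typing import Callable

class IsolatedAgent:
    """Each agent owns private state; other agents can reach it only
    through the narrow tool functions it chooses to expose."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._state: dict = {}                      # private, never shared directly
        self.tools: dict[str, Callable] = {}        # the only cross-agent surface

    def expose_tool(self, tool_name: str, fn: Callable) -> None:
        self.tools[tool_name] = fn

class SalesIntelligenceAgent(IsolatedAgent):
    """Toy version of the post's example agent."""

    def __init__(self) -> None:
        super().__init__("sales_intel")
        self._state["leads"] = ["acme", "globex"]
        # Expose a read-only summary, not the raw state dict.
        self.expose_tool("lead_count", lambda: len(self._state["leads"]))
```

Other agents call `tools["lead_count"]()` and get a number back; they never hold a reference to `_state`, so one agent's bugs or injected instructions cannot mutate another's working memory.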
Mon, October 6, 2025
Vertex AI Model Garden Adds Self-Deploy Proprietary Models
🔐 Google Cloud’s Vertex AI now supports secure self-deployment of proprietary third-party models directly into customer VPCs via the Model Garden. Customers can discover, license, and deploy closed-source and restricted-license models from partners such as AI21 Labs, Mistral AI, Qodo, and others, with one-click provisioning and managed inference. Deployments honor VPC-SC controls, support region selection and autoscaling, and bill on a pay-as-you-go basis. This central catalog brings Google, open, and partner models together for enterprise-grade control and compliance.
Wed, August 27, 2025
Cloudflare's Edge-Optimized LLM Inference Engine at Scale
⚡ Infire is Cloudflare’s new Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
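Cloudflare hasn't released Infire's internals; a toy paged KV-cache allocator conveys the general technique it cites — attention cache memory is handed out in fixed-size pages from a shared pool, so sequences of different lengths pack tightly instead of fragmenting GPU memory (this is a generic sketch of paged KV caching, not Infire's code):

```python
class PagedKVCache:
    """Toy paged allocator: a new page is claimed only when a sequence
    crosses a page boundary, and all pages return to the pool on release."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.free_pages = list(range(num_pages))      # shared pool of page ids
        self.seq_pages: dict[int, list[int]] = {}     # pages owned per sequence
        self.seq_len: dict[int, int] = {}             # tokens stored per sequence

    def append_token(self, seq_id: int) -> int:
        """Records one more token's KV entry; returns the page it lands in."""
        n = self.seq_len.get(seq_id, 0)
        if n % self.page_size == 0:                   # current page full (or first token)
            if not self.free_pages:
                raise MemoryError("KV pool exhausted")
            self.seq_pages.setdefault(seq_id, []).append(self.free_pages.pop())
        self.seq_len[seq_id] = n + 1
        return self.seq_pages[seq_id][-1]

    def release(self, seq_id: int) -> None:
        """Frees a finished sequence's pages back to the shared pool."""
        self.free_pages.extend(self.seq_pages.pop(seq_id, []))
        self.seq_len.pop(seq_id, None)
```

With `page_size=4`, a 5-token sequence occupies exactly two pages; the unused slots in the last page are the only waste, versus reserving a full max-length buffer per sequence.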
Wed, August 27, 2025
How Cloudflare Runs More AI Models on Fewer GPUs with Omni
🤖 Cloudflare explains how Omni, an internal platform, consolidates many AI models onto fewer GPUs using lightweight process isolation, per-model Python virtual environments, and controlled GPU over-commitment. Omni’s scheduler spawns and manages model processes, isolates file systems with a FUSE-backed /proc/meminfo, and intercepts CUDA allocations to safely over-commit GPU RAM. The result is improved availability, lower latency, and reduced idle GPU waste.