All news with #vertex ai tag

Five Techniques to Optimize LLM Inference Efficiency

⚡ Karl Weinmeister frames LLM inference as an efficient frontier that trades latency against throughput and argues production systems often sit below this curve. He presents five actionable optimizations—semantic model routing, prefill/decode disaggregation, modern quantization, context-aware L7 routing with prefix caching, and speculative decoding—and explains their practical tradeoffs. A Vertex AI case study reports 35% faster time-to-first-token and doubled prefix cache hit rates after deploying GKE Inference Gateway.

Why Context Matters for AI Data Security with SDP Now

🔒 Google Cloud’s Sensitive Data Protection (SDP) now applies advanced AI context classifiers and image object detectors to identify and redact sensitive content across text and images. It detects medical and financial contexts, faces, passports, credit cards, and other PII, and can generate redacted versions so organizations keep valuable training data while protecting privacy. SDP supports both Vertex AI tuning and live agent interactions and integrates with Model Armor, Security Command Center, and contact center solutions.

Reduce 429 Errors and Build Resilient Vertex AI Apps

⚠️ LLM applications on Vertex AI can hit 429 errors when request rates exceed available throughput, which degrades user experience and increases retries. This article explains the consumption options (Standard and Priority PayGo, Provisioned Throughput, Flex PayGo, and Batch) and prescribes five operational practices: smart retries, global model routing, context caching, prompt optimization, and traffic shaping. Combining these approaches (for example, Provisioned Throughput for critical real-time traffic and Batch for latency-tolerant jobs) helps preserve performance and control costs.
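As an illustration of the "smart retries" practice, here is a minimal Python sketch of capped exponential backoff with full jitter. It is not taken from the article: `RateLimitError`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and the wrapped callable stands in for whatever client call returns a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 / quota-exceeded response."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitError with capped exponential backoff.

    sleep and rng are injectable so the behaviour can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap) so that many
            # clients do not retry in lockstep after the same failure.
            sleep(rng() * cap)
```

In practice `fn` would wrap the model request, and only rate-limit errors should be retried this way; other failures are usually better surfaced immediately.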

BMW and Google Cloud Build Automated SLM Optimization

🚗 BMW Group and Google Cloud present a proof-of-concept pipeline to compress, fine-tune, evaluate, and deploy domain-specific small language models (SLMs) for in-vehicle voice commands. They position SLMs as a practical compromise between full cloud-based LLMs and constrained onboard hardware, reducing latency and network dependence. Using Vertex AI Pipelines, the automated workflow explores quantization, pruning, distillation, LoRA fine-tuning, and RL-based alignment, and validates models on Android/AOSP head-unit environments. The team publishes the pipeline code to encourage reuse and reproducible experimentation.

Agentic Autonomous Networks at MWC 2026 — Platform Advances

🚀 At MWC Barcelona, Google Cloud outlines a shift from AI-driven insights to agentic telco operations, showcasing tools that embed AI into network control to achieve Level 4–5 autonomy. The company highlights a dynamic network digital twin, a unified graph data layer using Spanner Graph and BigQuery, and real-time GNN predictions in Vertex AI. New open-source telco data pipelines and two proof-of-value agents — a data steward and autonomous network agents — aim to accelerate trials and reduce legacy bottlenecks.

Nano Banana 2 Brings Pro-Level Image AI to Enterprise

🖼️ Nano Banana 2 is Google’s latest image-generation and editing model, delivering Pro-level image quality and fast iteration for enterprise creative workflows. Powered by real-time web search and integrated with Gemini API in Vertex AI, it provides accurate, localized visuals plus premium features like text rendering, translations, and upscaling to 2K/4K. Enterprise-ready provenance is supported via SynthID and interoperable C2PA Content Credentials to surface how AI was used.

Google Releases Gemini 3.1 Pro for Enhanced Reasoning

🚀 Google announced Gemini 3.1 Pro, an upgraded foundation model in the Gemini 3 series that emphasizes deeper reasoning and complex problem solving. The model is available in preview in Vertex AI and Gemini Enterprise, and developers can access it through Google AI Studio, the Gemini API, Android Studio, Google Antigravity, and the Gemini CLI. Early customers report meaningful gains in speed, efficiency, and accuracy across code, 3D transformations, and product design workflows.

Provisioned Throughput on Vertex AI: Expanded Capacity

⚙️ Provisioned Throughput on Vertex AI standardizes reserved capacity across first-party, third-party, and open-source models, adding multimodal and operational enhancements to support production-scale AI agents. The update introduces Anthropic integration (private preview), PT for popular open models such as Llama 4, Qwen3, and GLM-4.7, and native support for high-bandwidth modalities including Gemini 3, Nano Banana, and Gemini Live API. Operational improvements — one-week PT terms, scheduled change orders, and explicit caching for long contexts — enable predictable latency, flexible commitments, and lower input costs for peak events and high-concurrency workloads.

Mastering Model Adaptation: Fine-Tuning on Google Cloud

🔧 This guide explains how to adapt foundation models on Google Cloud through both managed and self-managed fine-tuning workflows. It contrasts a fully managed Vertex AI Supervised Fine-Tuning path for models like Gemini with a customizable GKE approach using LoRA on open-source models such as Llama. The labs walk through data preparation, baseline evaluation, tuning, and automated evaluation metrics, as well as GKE infrastructure, GPU provisioning, security with Workload Identity, and containerized training for production readiness.

Seven Technical Lessons from Using Gemini at Scale

🧰 The Google Cloud samples team describes building a specialized end-to-end system that uses Gemini on Vertex AI and Genkit to produce production-ready educational code samples across many languages and products. Their architecture separates generation, validation, and delivery so LLM outputs are combined with deterministic automations, linters, unit tests, and human review. The post presents seven practical technical takeaways—decomposition, determinism, precise prompts, vetted evaluation, scaled downstream processes, end-to-end testing, and solid engineering practices—that drove reliable, scalable sample generation.

GKE Inference Gateway Cuts Latency for Vertex AI Performance

🚀 The Vertex AI team deployed the GKE Inference Gateway, built on the Kubernetes Gateway API, to reduce inference latency and improve cache efficiency without a custom scheduler. The gateway applies load-aware routing, which scrapes Prometheus metrics such as KV cache utilization and queue depth, and content-aware routing, which inspects request prefixes to send traffic to pods with warm context. In production this cut Time to First Token by ~35% for Qwen3-Coder, improved P95 latency by ~52% for a bursty chat model, and doubled prefix-cache hit rates from 35% to 70%.
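The combination of load-aware and prefix-aware routing can be pictured with a small Python sketch. This is an illustrative toy, not the gateway's actual scheduling logic: the `Pod` fields, the thresholds `max_queue` and `max_kv`, and the tie-breaking order are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_depth: int             # requests waiting on this pod
    cached_prefixes: set = field(default_factory=set)


def shared_prefix_len(prompt, pod):
    """Length of the longest cached prefix on the pod that the prompt starts with."""
    return max((len(p) for p in pod.cached_prefixes if prompt.startswith(p)),
               default=0)


def pick_pod(prompt, pods, max_queue=8, max_kv=0.9):
    """Prefer the pod with the warmest matching prefix among pods that are not overloaded."""
    healthy = [p for p in pods
               if p.queue_depth < max_queue and p.kv_cache_utilization < max_kv]
    candidates = healthy or pods  # if every pod is saturated, consider all of them
    # Longest matching prefix wins; ties go to the shorter queue, then the emptier cache.
    return min(
        candidates,
        key=lambda p: (-shared_prefix_len(prompt, p),
                       p.queue_depth,
                       p.kv_cache_utilization),
    )
```

Filtering on load before ranking by prefix match reflects one plausible design choice: a warm cache is of little help if the request then waits behind a long queue.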

Google Cloud Adds Anthropic Claude Opus 4.6 to Vertex AI

🚀 Google Cloud has added Anthropic's Claude Opus 4.6 to Vertex AI, extending its curated model catalog for enterprise and agentic workloads. Opus 4.6 is positioned for complex coding, polished document and spreadsheet generation, advanced tool calling, and sophisticated multi-step agents. Feature highlights include GA support for adaptive thinking, an effort parameter, 128k output tokens, and previews for a 1M context window and compaction API. Google emphasizes managed agent tooling, governance, and infrastructure to deploy Claude-powered agents at scale.

Ship Production-Ready AI and Multimodal Workshops Roadshow

🚀 Google Cloud is launching a two-day roadshow across North America focused on building production-grade and multimodal AI systems. Day 1, the Production-Ready AI Intensive, covers stability, security, and scalable architecture including multi-agent orchestration with the Agent Development Kit (ADK), A2A protocols on Cloud Run, automated evaluation via the Vertex AI Gen AI Evaluation SDK, and defenses like Model Armor and Sensitive Data Protection. Day 2, the Multimodal Frontier, is a hands-on, code-first workshop on real-time perception and interaction: simultaneous audio/video processing, Graph RAG with Spanner Graph, Persistent Memory Banks, and the Gemini Live API for zero-latency, interruptible agents. Sessions include labs, credits, and networking; seats are limited.

Building Employee Onboarding Agents with Gemini Enterprise

🔧 This guide explains how to build custom employee onboarding agents using the Agent Development Kit (ADK), Vertex AI Agent Engine, and Application Integration to connect conversational AI with enterprise systems such as ITSM, ERP, and CRM. It describes a grounded agentic workflow where a Gemini Enterprise front-end captures intent, a low-code Application Integration layer performs deterministic transformations and authentication, and backend systems execute transactions. The result is a role-aware, auditable onboarding experience that automates tasks like laptop provisioning while keeping business rules and approvals intact.

Google Cloud releases Vertex AI .NET extensions (preview)

🚀 Google.Cloud.VertexAI.Extensions brings Microsoft.Extensions.AI abstractions to .NET developers, enabling access to Google Gemini models on Vertex AI via a unified API. The preview package implements core interfaces — IChatClient, IEmbeddingGenerator, and an experimental IImageGenerator — and supports chat, streaming responses, embeddings, and image generation samples. It targets developers who want provider-agnostic integration and invites feedback while in pre-release.

Google Cloud Opens New Bangkok Region to Boost Thai AI

🚀 Google Cloud has launched a new Bangkok (asia-southeast3) region to deliver low-latency, high-performance cloud services while enabling local data residency under Thailand’s PDPA. The region is part of a US$1 billion investment and is expected to generate THB 1.4 trillion (US$41 billion) in economic value over five years and support roughly 130,000 jobs annually. It offers certified security controls (ISO/IEC, PCI DSS, SOC), default encryption, customer-managed keys, and direct access to Vertex AI, enterprise Gemini, and generative models to accelerate local AI adoption.

Practical Guidance for Building Securely with SAIF on Cloud

🔐 Tom Curry and Anton Chuvakin from Google Cloud’s Office of the CISO present practical guidance for implementing the Secure AI Framework (SAIF) on Google Cloud. The piece emphasizes three operational principles: treat data as the perimeter, treat prompts like code, and require identity propagation for agentic AI. It maps 15 common AI risks to controls and highlights concrete tools and patterns—IAM, Dataplex, Vertex AI, Model Armor, Gemini, Apigee, and the Agent Development Kit—to operationalize SAIF.

Palo Alto Networks Automates DORs with Agentic AI Design

🤖 Palo Alto Networks automated creation of its internal Document of Record (DOR) using an agent built with Google's open-source Agent Development Kit (ADK) and hosted on Vertex AI Agent Engine. The agent leverages Vertex AI RAG Engine, Vertex AI Discovery Search, Gemini models, and Cloud Storage to retrieve and synthesize grounded answers to a standardized set of 140+ questions. A FastAPI webserver on GKE orchestrates parallel processing, manages state, and publishes completed DORs back to Salesforce via Cloud Pub/Sub, reducing manual effort and improving consistency.

Google Public Sector: Year of AI-Driven Transformation 2025

🤖 Google Public Sector summarizes a year of AI, cloud, and security milestones, spotlighting Gemini for Government, Vertex AI, and FedRAMP High authorizations for productivity and analytics offerings. The post highlights DoD IL6 and CMMC Level 2 certifications, partnerships with DLA and GDIT, and large-scale deployments such as GenAI.mil. It emphasizes secure, agentic workflows, edge-capable deployments, and a focus on delivering accredited commercial cloud services to accelerate mission impact.

From Code to Cloud: Three Labs for Deploying AI Agents

🚀 These hands-on labs guide developers through three Google Cloud deployment options to move AI agents from local prototypes to production. The Vertex AI Agent Engine offers a fully managed, Python-optimized runtime that handles execution, memory, and tool invocation. Cloud Run provides a serverless container experience with autoscaling and language flexibility, while GKE delivers orchestrated control for microservice deployments.