All news with the #gke tag
Wed, December 10, 2025
Google Adds Official MCP Support Across Key Cloud Services
🔌 Google announced fully managed, remote support for Anthropic's Model Context Protocol (MCP), enabling agents and standard MCP clients to access a unified, enterprise-ready endpoint for Google and Google Cloud services. The managed MCP servers integrate with services like Google Maps, BigQuery, GCE, and GKE to let agents perform geospatial queries, in-place analytics, and infrastructure operations. Built-in discovery, governance, IAM controls, audit logging, and Google Cloud Model Armor provide security and observability. Developers can expose and govern APIs via Apigee and the Cloud API Registry to create discoverable tools for agentic workflows.
Mon, December 8, 2025
Google Application Design Center Now Generally Available
🛠️ Google's Application Design Center is now generally available, delivering a visual, canvas-style, AI-assisted environment to design and deploy Terraform-backed application templates. It pairs Gemini Cloud Assist with opinionated Terraform components to generate deployable infrastructure patterns and architecture diagrams. Integrated with App Hub and Cloud Hub, it makes applications discoverable, observable, and manageable, while supporting BYO-Terraform, GitOps, and enterprise governance to accelerate platform engineering and developer self-service.
Thu, December 4, 2025
NVIDIA Run:ai Model Streamer Adds Cloud Storage Support
🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model loading and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer dramatically reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change, and it can leverage GKE Workload Identity for secure, keyless access.
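The single-flag change described above can be sketched as follows; the bucket and model paths are placeholders, and the flags follow vLLM's documented `--load-format` option for the Run:ai Model Streamer:

```shell
# Sketch: serve a model on GKE while streaming weights directly from
# Cloud Storage via the Run:ai Model Streamer. With GKE Workload
# Identity configured, no storage keys are needed in the Pod.
vllm serve gs://my-models-bucket/llama-3.1-8b-instruct \
  --load-format runai_streamer \
  --model-loader-extra-config '{"concurrency": 16}'
```

The `concurrency` setting (number of parallel streaming threads) is optional tuning; the bucket path assumes the model weights were pre-uploaded to Cloud Storage.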
Thu, December 4, 2025
Designing for GKE's Flat Network: Practical Recommendations
🔍 This post previews Google's new design recommendation for leveraging GKE's flat network, explaining how it differs from island-mode networking and how teams can adapt existing architectures. It highlights recommended patterns and a reference design that emulates island-mode behavior within the flat model. The guidance focuses on IP address management, scalability, and integration points to ease migration for critical workloads such as generative AI.
Wed, December 3, 2025
Building Conversational Genomics with Multi-Agent AI
🧬 Combining Google’s ADK, Gemini, and Cloud infrastructure, this work reframes variant interpretation as a conversational workflow that removes repetitive scripting and context switching. A two-phase design performs heavy VEP annotation once, stores versioned ADK artifacts and public BigQuery datasets, and enables sub-5-second interactive queries via a QueryAgent. Validation with an APOB spike-in demonstrated single-variant precision, compatibility across DeepVariant versions, and scalability to ~8.8M variants.
Tue, December 2, 2025
GKE Turns 10 Hackathon: Winners and Technical Highlights
🚀 The GKE Turns 10 Hackathon showcased developer teams building agentic AI on GKE integrated with Google models such as Gemini. More than 4,700 participants from 133 countries produced 133 projects demonstrating multi-agent pipelines, model orchestration, and microservice integration. Grand prize winner Amie Wei’s Cart-to-Kitchen assistant uses GKE Autopilot, the Agent Development Kit (ADK), and Agent-to-Agent protocols to analyze grocery carts and recommend recipes. Google also announced GEAR, an educational sprint launching in early 2026 to help developers learn, build, and deploy AI agents.
Fri, November 21, 2025
Building the Largest Known GKE Cluster: 130,000 Nodes
🚀 Google Cloud engineers demonstrated an experimental GKE cluster running 130,000 nodes to validate extreme scalability for AI/ML workloads. The test sustained control-plane throughput near 1,000 operations per second, supported over one million datastore objects, and achieved a baseline of 130,000 Pods launching in 3 minutes 40 seconds. The project combined API-server caching KEPs, a Spanner-backed key-value storage backend, and job-level orchestration via Kueue to enable predictable admission, rapid preemption, and efficient utilization at massive scale.
Tue, November 18, 2025
Using Private NAT for Overlapping Private IP Spaces
🔒 Google Cloud's Private NAT enables secure private-to-private translation to connect networks with overlapping or non-routable IPv4 ranges without running NAT appliances. As a managed Cloud NAT feature, it delivers high availability, automatic scalability, and centralized control for hybrid and multi‑VPC topologies. The post includes practical gcloud examples and Network Connectivity Center use cases to guide implementation.
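In the spirit of the post's gcloud examples, a minimal sketch of setting up Private NAT might look like the following; resource names, region, and ranges are placeholders, and the flags follow the `gcloud compute routers nats` surface:

```shell
# Sketch: reserve a subnet dedicated to Private NAT, then attach a
# Private NAT gateway to an existing Cloud Router in the same VPC.
gcloud compute networks subnets create nat-subnet \
  --network=vpc-a --region=us-central1 \
  --range=192.168.100.0/24 --purpose=PRIVATE_NAT

gcloud compute routers nats create private-nat-gw \
  --router=vpc-a-router --region=us-central1 \
  --type=PRIVATE --nat-custom-subnet-ip-ranges=nat-subnet
```

The dedicated `PRIVATE_NAT` subnet supplies the translated source range, letting workloads in vpc-a reach peers whose IPv4 space overlaps its own.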
Mon, November 17, 2025
Hands-on with Gemma 3: Deploying Open Models on GCP
🚀 Google Cloud introduces hands-on labs for Gemma 3, a family of lightweight open models offering multimodal (text and image) capabilities and efficient performance on smaller hardware footprints. The labs present two deployment paths: a serverless approach using Cloud Run with GPU support, and a platform approach using GKE for scalable production environments. Choose Cloud Run for simplicity and cost-efficiency or GKE Autopilot for control and robust orchestration to move models from local testing to production.
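The serverless path can be sketched with a single deploy command; the image path, project, and region are placeholders, and the GPU flags follow gcloud run deploy's documented options:

```shell
# Sketch: deploy a Gemma 3 serving container to Cloud Run with an
# attached NVIDIA L4 GPU for the simple, pay-per-use path.
gcloud run deploy gemma3-service \
  --image=us-docker.pkg.dev/my-project/my-repo/gemma3-serve:latest \
  --region=us-central1 \
  --gpu=1 --gpu-type=nvidia-l4 \
  --no-cpu-throttling --max-instances=1
```

For the GKE path, the same container would instead be described in a Deployment manifest with a GPU resource request, trading this one-liner for full orchestration control.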
Tue, November 11, 2025
GKE: Unified Platform for Agents, Scale, and Inference
🚀 Google details a broad set of GKE and Kubernetes enhancements announced at KubeCon to address agentic AI, large-scale training, and latency-sensitive inference. GKE introduces Agent Sandbox (gVisor-based) for isolated agent execution and a managed GKE Agent Sandbox with snapshots and optimized compute. The platform also delivers faster autoscaling through Autopilot compute classes, Buffers API, and container image streaming, while inference is accelerated by GKE Inference Gateway, Pod Snapshots, and Inference Quickstart.
Tue, November 11, 2025
Agent Sandbox: Kubernetes Enhancements for AI Agents
🛡️ Agent Sandbox is a new Kubernetes primitive designed to run AI agents with strong, kernel-level isolation. Built on gVisor with optional Kata Containers and developed in the Kubernetes community as a CNCF project, it reduces risks from agent-executed code. On GKE, managed gVisor, container-optimized compute, and pre-warmed sandbox pools deliver sub-second startup latency and up to 90% cold-start improvement. A Python SDK and a simple API abstract YAML so AI engineers can manage sandbox lifecycles without deep infrastructure expertise; Agent Sandbox is open source and deployable on GKE today.
Mon, November 10, 2025
Full-Stack Approach to Scaling RL for LLMs on GKE at Scale
🚀 Google Cloud describes a full-stack solution for running high-scale Reinforcement Learning (RL) with LLMs, combining custom TPU hardware, NVIDIA GPUs, and optimized software libraries. The approach addresses RL's hybrid demands—reducing sampler latency, easing memory contention across actor/critic/reward models, and accelerating weight copying—by co-designing hardware, storage (Managed Lustre, Cloud Storage), and orchestration on GKE. The blog emphasizes open-source contributions (vLLM, llm-d, MaxText, Tunix) and integrations with Ray and NeMo RL recipes to improve portability and developer productivity. It also highlights mega-scale orchestration and multi-cluster strategies to run production RL jobs at tens of thousands of nodes.
Mon, November 10, 2025
Google Cloud N4D VMs with AMD EPYC Turin Generally Available
🚀 Google Cloud announces general availability of the N4D machine series built on 5th Gen AMD EPYC 'Turin' processors and Google's Titanium infrastructure. N4D targets cost-optimized, general-purpose workloads — web and app servers, data analytics, and containerized microservices — with up to 96 vCPUs, 768 GB DDR5, 50 Gbps networking, and Hyperdisk storage. Google cites up to 3.5x web-serving throughput versus N2D and material price-performance gains for general compute and Java workloads.
Fri, November 7, 2025
Tiered KV Cache Boosts LLM Performance on GKE with HBM
🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the GPU HBM-backed key-value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 Mega instance (8× nvidia-h100-mega-80gb), configurations that added RAM and SSD tiers reduced time-to-first-token and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scaling context windows while balancing cost and latency on GKE.
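The tiered setup above can be sketched as environment configuration plus a vLLM launch; the variable names follow LMCache's configuration conventions, while sizes, the SSD mount path, and tensor-parallel degree are placeholder assumptions:

```shell
# Sketch: enable LMCache's CPU-RAM (tier 2) and local-SSD (tier 3)
# tiers behind vLLM, extending the KV cache beyond GPU HBM (tier 1).
export LMCACHE_LOCAL_CPU=True                               # enable RAM tier
export LMCACHE_MAX_LOCAL_CPU_SIZE=80                        # GB of RAM for KV
export LMCACHE_LOCAL_DISK="file:///mnt/local-ssd/lmcache/"  # SSD tier path
export LMCACHE_MAX_LOCAL_DISK_SIZE=500                      # GB of SSD for KV

vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 8 \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```

On GKE, the SSD path would typically map to a local-SSD-backed volume mounted into the serving Pod.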
Tue, November 4, 2025
Kubernetes Introduces Control-Plane Minor-Version Rollback
🔁 Google and the Kubernetes community introduced control-plane minor-version rollback in Kubernetes 1.33, giving operators a safe, observable path to revert control-plane upgrades. The new KEP-4330 emulated-version model separates binary upgrades from API and storage transitions into a two-step process, enabling validation before committing changes. This capability is available in open-source Kubernetes and will be generally available in GKE 1.33 soon, reducing upgrade risk and shortening recovery time from unexpected regressions.
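The KEP-4330 two-step model can be sketched as below for the open-source control plane (on GKE the rollback is managed for you); the exact flag syntax is an assumption based on the KEP and may differ per component:

```shell
# Sketch: decouple the binary upgrade from the API/storage transition.
# Step 1: run the new 1.33 binary while emulating 1.32 behavior —
# APIs and storage formats stay at 1.32, so reverting to the old
# binary remains safe while you validate the upgrade.
kube-apiserver --emulated-version=1.32 ...

# Step 2: once validated, raise the emulated version to commit
# the cluster to 1.33 APIs and storage formats.
kube-apiserver --emulated-version=1.33 ...
```

The rollback path is simply not taking step 2: as long as the emulated version stays at 1.32, swapping back to the 1.32 binary reverts the control plane cleanly.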
Tue, November 4, 2025
How Google Cloud Networking Supports AI Workloads at Scale
🔗 Networking is a critical enabler for AI on Google Cloud, connecting models, storage, and inference endpoints while preserving security and performance. The post outlines seven capabilities—from private API access and RDMA-backed GPU interconnects to hybrid Cross-Cloud links—that reduce latency, prevent data exfiltration, and simplify model serving. It also highlights options for exposing inference (managed services, GKE, load balancing) and previews AI-driven network operations using Gemini.
Mon, November 3, 2025
Ray on TPUs with GKE: Native, Lower-Friction Integration
🚀 Google Cloud and Anyscale have enhanced the Ray experience on Cloud TPUs with GKE to reduce setup complexity and improve performance. The new ray.util.tpu library and a SlicePlacementGroup with a label_selector API automatically reserve co-located TPU slices and preserve SPMD topology to avoid resource fragmentation. Ray Train and Ray Serve gain expanded TPU support including alpha JAX training, while TPU metrics and libtpu logs appear in the Ray Dashboard for faster troubleshooting and migration between GPUs and TPUs.
Mon, November 3, 2025
Ray on GKE: New AI Scheduling and Scaling Features
🚀 Google Cloud and Anyscale describe tighter integration between Ray and Kubernetes to improve distributed AI scheduling and autoscaling on GKE. The release introduces a Ray Label Selector API (Ray v2.49) to align task, actor, and placement-group placement with Kubernetes labels and GKE custom compute classes, enabling targeted placement and fallback strategies across GPU types and capacity markets (such as Spot and on-demand). It also adds Dynamic Resource Allocation for A4X/GB200 racks, writable cgroups for Ray resource isolation on GKE v1.34+, TPU/JAX training support via a JAXTrainer in Ray v2.49, and in-place pod resizing (Kubernetes v1.33) for vertical autoscaling and higher efficiency.
Fri, October 31, 2025
GKE and Gemini CLI Integration Enhances Developer Workflows
🚀 Google has open-sourced the GKE Gemini CLI extension, bringing Google Kubernetes Engine directly into the Gemini CLI ecosystem while also functioning as an MCP server for other MCP clients. The extension injects GKE-specific context, tools, and tailored prompts so developers can use shorter, more natural language interactions and integrated slash commands to complete complex workflows. It simplifies common operations—like selecting models and accelerators or generating Kubernetes manifests for inference—while improving compatibility with Cloud Observability. The project is actively maintained with regular releases and community contributions.
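Installation is a one-liner via Gemini CLI's extensions mechanism; the repository URL is an assumption based on the GoogleCloudPlatform GitHub organization, and the subcommand follows Gemini CLI's documented extension surface:

```shell
# Sketch: install the open-source GKE extension into Gemini CLI.
# Once installed, GKE-specific context, tools, and slash commands
# become available in interactive sessions, and the extension can
# also serve as an MCP server for other MCP clients.
gemini extensions install https://github.com/GoogleCloudPlatform/gke-mcp
```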
Thu, October 30, 2025
Global Payments: Resilient Scale Architecture with Cloud SQL
☁️ Global Payments partnered with Google Cloud to design a multi-region, highly available database architecture using Cloud SQL Enterprise Plus. The deployment spans three regions with zonal replication, read replicas, cascading replication, and Cloud SQL Auth Proxy integration to support low-latency reads and rapid failover. This configuration yields near-zero planned downtime, sub-minute RTO and zero RPO for Tier 1 workloads, while meeting PCI DSS, GDPR, and NIST requirements.