All news with #google kubernetes engine tag

41 articles

July 9, 2026

GKE Autopilot Clusters with Managed DRANET

🛠️ This blog explains how to configure GKE Autopilot clusters to use GKE managed DRANET for GPU and TPU workloads. It outlines the setup flow: create a VPC, deploy an Autopilot cluster, define a custom ComputeClass, create a ResourceClaimTemplate for RDMA (GPUs) or netdev (TPUs), and deploy workloads that reference those resources. Examples and YAML snippets demonstrate GPU and TPU ComputeClasses, resource claim templates, and a deployment that binds pods to accelerators.

Google Kubernetes Engine Kubernetes Security

July 8, 2026

Google Cloud unveils C4N network and storage VMs

🚀 C4N is Google Cloud’s new network- and block-storage-optimized Compute Engine instance family, now generally available after its Next ’26 preview. Built on a custom Titanium offload architecture and 5th Gen Intel Xeon CPUs, C4N delivers up to 400 Gbps of network bandwidth, 95 million packets per second (PPS), and up to 25 GiB/s of storage throughput with Hyperdisk Extreme. It targets network-intensive, storage-heavy, and latency-sensitive workloads without requiring premium add-ons.

Google Cloud Google Kubernetes Engine

June 18, 2026

Ray Serve LLM on GKE: Major performance gains

🚀 Developers using Ray Serve for LLM inference on Google Kubernetes Engine (GKE) now get significantly better performance thanks to a joint effort with Anyscale. Three architectural changes — HAProxy integration for internal routing, a direct token streaming path, and a v2 Ray executor backend for vLLM — reduce overhead and latency. Benchmarks on A4 VMs with NVIDIA HGX B200 hardware show up to 5x higher throughput and 8x lower latency, while preserving Ray's developer-friendly features.

Google Kubernetes Engine Vertex AI Product Update

June 17, 2026

Deploy a Remote MCP Server to GKE in 30 Minutes

🔧 This guide explains how to build and deploy a remote Model Context Protocol (MCP) server on Google Kubernetes Engine (GKE) using the Streamable HTTP transport. It covers prerequisites, creating a simple math MCP server with FastMCP, local testing, containerizing the server, and pushing the image to Artifact Registry. Finally, it details deploying to GKE Autopilot and exposing the server securely with the Kubernetes Gateway API and managed SSL.

Google Google Kubernetes Engine MCP Security

June 9, 2026

GKE Inference Gateway Boosts AI Inference Efficiency

🚀 GKE Inference Gateway uses prefix caching and model-aware routing to reduce accelerator idle time and speed up LLM inference. By matching request prefixes to pods that already hold the KV cache, it avoids repeated recomputation and lowers latency compared with naive round-robin load balancing. Independent benchmarks show 15.7% higher throughput, 92.8% faster time-to-first-token, and 62.6% lower inter-token latency. Snap reports 75–80% prefix cache hit rates in production integrations.
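
To make the routing idea concrete, here is a toy sketch of prefix-aware routing with a round-robin fallback. It is an illustration only, not the GKE Inference Gateway's implementation: the `PrefixAwareRouter` class, the `min_match` threshold, and the pod names are hypothetical, and a real gateway would track KV-cache state reported by the model servers rather than raw prompt strings.

```python
from itertools import cycle


def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Hypothetical router: prefer the pod that already served a similar prefix."""

    def __init__(self, pods):
        self.pods = list(pods)
        # Prompts each pod has served; stands in for the pod's KV cache contents.
        self.cached = {pod: [] for pod in self.pods}
        self._fallback = cycle(self.pods)  # plain round-robin when nothing matches

    def route(self, prompt: str, min_match: int = 8) -> str:
        best_pod, best_len = None, 0
        for pod in self.pods:
            for seen in self.cached[pod]:
                n = shared_prefix_len(prompt, seen)
                if n > best_len:
                    best_pod, best_len = pod, n
        # Assumption: matches shorter than min_match are not worth pinning to a pod.
        if best_len < min_match:
            best_pod = next(self._fallback)
        self.cached[best_pod].append(prompt)
        return best_pod


router = PrefixAwareRouter(["pod-a", "pod-b"])
system = "You are a helpful assistant. "
first = router.route(system + "Q1")            # no cache yet, round-robin
second = router.route("Unrelated prompt text")  # no shared prefix, round-robin
third = router.route(system + "Q2")            # reuses the pod that saw the prefix
```

In this sketch the third request lands on the same pod as the first because both share the system prompt, which is the situation where a real server could skip recomputing the cached prefix.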

Google Kubernetes Engine AI Runtime Security Kubernetes Security

June 2, 2026

Multi‑cluster GKE inference with TPUs and DRANET

🧭 This blog documents an experiment using Google Cloud to deploy a Gemma 3 inference workload across two regional GKE clusters, leveraging TPU v6e instances, managed DRANET for accelerator networking, and a multi-cluster Inference Gateway for cross‑region routing and failover. It describes building VPCs, reserving internal IPs, configuring Cloud Storage FUSE for model storage, creating TPU node pools with managed DRANET, registering clusters into a GKE Fleet, and deploying the inference server and gateway with health checks and autoscaling metrics. The objective is resilient, low‑latency routing to the nearest region with automatic failover to the other region if one fails.

Google Cloud Google Kubernetes Engine Gemini

June 1, 2026

GKE standby buffers lower autoscaling latency and cost

🚀 Google announces GKE standby buffers to complement active buffers, providing low-cost suspended node capacity that resumes faster than cold node provisioning. Standby buffers save node state to disk, which eliminates compute and memory charges while persistent disk and IP charges continue to apply, and they enable near-instant scheduling with only a small, single-digit-percent overhead. Together, active and standby buffers reduce pod scheduling latency, replace manual balloon-pod workarounds, and help balance performance and cost for spiky workloads.

Google Google Kubernetes Engine Cloud Security Infrastructure Security

May 20, 2026

GKE Agent Sandbox GA and Agent Substrate Launch on GKE

🚀 Google Cloud announced general availability of GKE Agent Sandbox and introduced the open-source Agent Substrate. Agent Sandbox is a cloud-native execution environment designed for AI agents, offering pod snapshots to suspend idle workloads, an integrated warm pool for sub-second provisioning, gVisor and pluggable kernel isolation, and standby suspended VMs to reduce warm-pool cost. Agent Substrate aims to provide a minimal control plane and scheduler optimizations to support ultra-dense, low-latency agent workloads at scale.

Google Google Kubernetes Engine Agent Security Agentic AI

May 8, 2026

GKE Node Startup Up to 4x Faster for Autopilot Workloads

🚀 Google Cloud has reworked GKE node provisioning to deliver up to 4× faster node startup for qualifying nodes, reducing cold-start latency out of the box. This architectural upgrade combines intelligent compute buffers, fast-starting virtual machines, and a redesigned control plane so clusters scale more quickly without any customer configuration. The improvement is live for GKE Autopilot on select NVIDIA and general-purpose instance types, lowering the need to over-provision and speeding AI inference.

Google Google Kubernetes Engine Product Update

April 8, 2026

GKE Cloud Storage FUSE Profiles for AI/ML Workload I/O

⚡ GKE’s Cloud Storage FUSE Profiles automate performance tuning for AI/ML workloads by providing pre-defined, dynamically managed StorageClasses optimized for training, serving, and checkpointing. Instead of manually adjusting many mount and CSI options, users select a profile and GKE scans the bucket and node resources to calculate cache sizes and backing media. The CSI driver mounts the volume with those calculated options and dynamically adjusts cache behavior using real-time signals to maximize throughput while protecting node stability.

Google Kubernetes Engine Kubernetes Security AI Security

April 8, 2026

Experimenting with GPUs, GKE DRANET and Inference Gateway

🔧 This post walks through deploying and serving a large model on Google Kubernetes Engine using managed DRANET and NVIDIA B200 GPUs. It explains how RDMA networking is provisioned as an isolated regional VPC for low-latency GPU-to-GPU communication and how to provision A4 nodes and reservations for RoCEv2-capable accelerators. The author provides example gcloud and kubectl commands to create the cluster, a GPU node pool with DRA labels, a ResourceClaimTemplate for mrdma workloads, and steps to serve a DeepSeek model privately via GKE Inference Gateway and a regional internal Application Load Balancer.

Google Kubernetes Engine Nvidia DeepSeek

April 1, 2026

Top Infrastructure and GKE Sessions at Cloud Next '26

📣 This guide highlights the Infrastructure and GKE sessions at Cloud Next '26, offering a curated set of technical breakouts across Compute, AI infrastructure, migration, modernization, and scale. Attend spotlights and deep dives to hear from Google leaders and engineering teams about Gemini, Google Distributed Cloud, and the AI Hypercomputer. Sessions cover TPU/GPU roadmaps, high‑performance compute, agentic AI pipelines, and practical migration and FinOps strategies designed to help organizations build resilient, AI‑ready infrastructure.

Google Google Kubernetes Engine Product Launch

April 1, 2026

Unifying Real-Time and Async Inference with GKE Platform

🚀 GKE Inference Gateway enables teams to run both real-time and asynchronous AI inference on a single shared pool of accelerators (GPUs/TPUs). It applies latency-aware scheduling using runtime signals such as KV cache utilization to prioritize deterministic, low-latency requests while treating queued batch work as 'filler' via an Async Processor Agent integrated with Cloud Pub/Sub. The open-source stack reduces idle capacity, consolidates software stacks, and preserves strict priority and retry controls for reliable delivery.
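
The "filler" behavior can be sketched as a small priority queue. This is a minimal illustration under stated assumptions, not the gateway's actual scheduler: the `SharedPoolScheduler` class, the two priority levels, and the `kv_cache_limit` threshold of 0.8 are all hypothetical, and the Pub/Sub integration is left out.

```python
import heapq
from itertools import count

REALTIME, BATCH = 0, 1  # lower value is served first


class SharedPoolScheduler:
    """Hypothetical scheduler: real-time requests first, batch work as filler."""

    def __init__(self, kv_cache_limit: float = 0.8):
        self._queue = []
        self._seq = count()  # tie-breaker that keeps FIFO order within a priority
        self.kv_cache_limit = kv_cache_limit  # assumed threshold, not a GKE default

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), name))

    def next_request(self, kv_cache_utilization: float):
        """Return the next request to run, or None if nothing should run now."""
        if not self._queue:
            return None
        priority, _, name = self._queue[0]
        # Hold batch work back while the accelerator's KV cache is busy.
        if priority == BATCH and kv_cache_utilization >= self.kv_cache_limit:
            return None
        heapq.heappop(self._queue)
        return name


scheduler = SharedPoolScheduler()
scheduler.submit("batch-1", BATCH)
scheduler.submit("chat-1", REALTIME)
busy = scheduler.next_request(kv_cache_utilization=0.9)        # real-time still runs
held = scheduler.next_request(kv_cache_utilization=0.9)        # batch is held back
filler = scheduler.next_request(kv_cache_utilization=0.3)      # batch fills idle capacity
```

The design choice shown here is that batch work never preempts real-time traffic; it only runs when the utilization signal indicates spare capacity.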

Google Kubernetes Engine Agentic AI Product Update

March 31, 2026

GKE Active Buffer reduces Kubernetes scale-out latency

⚡ Active Buffer is a GKE preview feature that implements the Kubernetes CapacityBuffer API to reduce scale-out latency by keeping spare node capacity warm. It replaces manual 'balloon' pod hacks and costly over-provisioning with a declarative resource that the Cluster Autoscaler treats as pending demand, so critical pods can land instantly. Buffers can be sized by fixed replicas, percentage of deployments, or resource limits.

Google Kubernetes Engine Kubernetes

March 17, 2026

Multi-Cluster GKE Inference Gateway for Scalable AI

🚀 Google Cloud announced the preview of the multi-cluster GKE Inference Gateway, an extension of the GKE Gateway API that provides model-aware, intelligent load balancing across multiple GKE clusters and regions. It centralizes ingress configuration in a dedicated "config cluster" while exporting model-serving backends from distributed "target clusters." The gateway pools GPUs/TPUs, supports routing based on custom metrics, and offers in-flight request limits to optimize latency, utilization, and fault tolerance.

Google Cloud Google Kubernetes Engine AI Runtime Security

March 5, 2026

GKE Adds Native Custom Metrics for Horizontal Scaling

🚀 Google Cloud now provides native custom metrics for GKE Horizontal Pod Autoscaler (HPA), eliminating the need for external adapters, agents, and complex Workload Identity bindings. The agentless design sources pod metrics directly and exposes them via a new AutoscalingMetric controller, reducing latency, cost, and operational fragility. Users declare an AutoscalingMetric that points to a pod metric and reference it in an HPA, allowing HPAs to scale on custom workload signals just like CPU or memory. Google frames this as an initial step toward intent-based autoscaling for AI, gaming, batch, and other demanding workloads.

Google Kubernetes Engine Kubernetes Security

March 4, 2026

GKE for Telco: Building a Resilient AI-Native Core

🚀 Google Cloud demonstrates how Google Kubernetes Engine (GKE) can form a high-performance foundation for telco modernization via two complementary paths: cloud-centric evolution for full cloud migration and strategic hybrid modernization to retain local control over latency-sensitive functions. The post highlights carrier-grade enhancements—multi-networking API, simulated L2, a telco CNI, persistent IP, and GKE IP route—with sub-second convergence and HA Policy to minimize downtime. It frames modernization as a means to enable predictive AIOps, intent-driven automation, faster time-to-market, and new monetization opportunities through AI and data platforms.

Google Cloud Google Kubernetes Engine Kubernetes Security

February 6, 2026

Starfish Space Uses Google Cloud for Satellite Servicing

🚀 Starfish Space is using Google Cloud to accelerate development and validation of its autonomous satellite-servicing vehicle, Otter. The company runs millions of Monte Carlo simulations on Google Compute Engine and Google Kubernetes Engine to train and harden docking software in virtual orbital environments. Managed Kubernetes lets engineers scale high-performance compute for complex simulations and control costs by scaling down resources when not required. This software-first model supports contracts with NASA, the U.S. Space Force, SES, and the Space Development Agency.

Google Cloud Google Kubernetes Engine

January 28, 2026

Faster GKE Node Pool Auto-Creation with Concurrency

🚀 Google Cloud announced concurrency for GKE node pool auto-creation, significantly reducing provisioning latency and improving autoscaling responsiveness. Internal benchmarks report up to an 85% improvement in provisioning speed, especially for heterogeneous, multi-tenant, and AI workloads that require multiple distinct node types. The improvement is available in version 1.34.1-gke.1829001 and requires only upgrading GKE; no additional configuration is necessary.

Google Kubernetes Engine Product Update

December 19, 2025

Supercharging Agentic Workloads on GKE with Sandboxing

🔒 The post summarizes a recent Agent Factory episode where Google product leaders discuss running agentic workloads on GKE. It highlights the Agent Development Kit (ADK), containerized deployments to Artifact Registry, and why Kubernetes provides governance and fine-grained control for large-scale agents. Google demonstrated an Agent Sandbox using gVisor and strict network policies, and introduced Pod Snapshots to cut sandbox startup from minutes to seconds, enabling lower-latency, secure agent workflows.

Google Kubernetes Engine Agent Security Agentic AI