All news with #google kubernetes engine tag

December 4, 2025

Designing for GKE's Flat Network: Practical Recommendations

🔍 This post previews Google's new design recommendation for leveraging GKE's flat network, explaining how it differs from island-mode networking and how teams can adapt existing architectures. It highlights recommended patterns and a reference design that emulates island-mode behavior within the flat model. The guidance focuses on IP address management, scalability, and integration points to ease migration for critical workloads such as generative AI.

Google Kubernetes Engine Kubernetes Security How-To

November 21, 2025

Building the Largest Known GKE Cluster: 130,000 Nodes

🚀 Google Cloud engineers demonstrated an experimental GKE cluster running 130,000 nodes to validate extreme scalability for AI/ML workloads. The test sustained control-plane throughput near 1,000 operations per second, supported over one million datastore objects, and achieved a baseline of 130,000 Pods launching in 3 minutes 40 seconds. The project combined API-server caching KEPs, a Spanner-backed key-value storage backend, and job-level orchestration via Kueue to enable predictable admission, rapid preemption, and efficient utilization at massive scale.

Google Google Kubernetes Engine Research
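
As a quick sanity check on the figures quoted above, the baseline launch result works out to an average of roughly 590 Pods per second. The sketch below is only back-of-the-envelope arithmetic on the numbers in the summary. The names are illustrative, and the average says nothing about peak or instantaneous rates.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the summary.
PODS_LAUNCHED = 130_000
LAUNCH_SECONDS = 3 * 60 + 40  # 3 minutes 40 seconds = 220 s


def average_pod_launch_rate(pods: int, seconds: int) -> float:
    """Return the mean number of Pods launched per second."""
    return pods / seconds


rate = average_pod_launch_rate(PODS_LAUNCHED, LAUNCH_SECONDS)
print(f"{rate:.1f} Pods/s on average")  # prints "590.9 Pods/s on average"
```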

November 17, 2025

Hands-on with Gemma 3: Deploying Open Models on GCP

🚀 Google Cloud introduces hands-on labs for Gemma 3, a family of lightweight open models offering multimodal (text and image) capabilities and efficient performance on smaller hardware footprints. The labs present two deployment paths: a serverless approach using Cloud Run with GPU support, and a platform approach using GKE for scalable production environments. Choose Cloud Run for simplicity and cost-efficiency or GKE Autopilot for control and robust orchestration to move models from local testing to production.

Google Gemini Cloud Run Google Kubernetes Engine

November 11, 2025

GKE: Unified Platform for Agents, Scale, and Inference

🚀 Google details a broad set of GKE and Kubernetes enhancements announced at KubeCon to address agentic AI, large-scale training, and latency-sensitive inference. GKE introduces Agent Sandbox (gVisor-based) for isolated agent execution and a managed GKE Agent Sandbox with snapshots and optimized compute. The platform also delivers faster autoscaling through Autopilot compute classes, Buffers API, and container image streaming, while inference is accelerated by GKE Inference Gateway, Pod Snapshots, and Inference Quickstart.

Google Google Kubernetes Engine Agent Security

November 7, 2025

Tiered KV Cache Boosts LLM Performance on GKE with HBM

🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the GPU HBM-backed key-value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 Mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD reduced time-to-first-token and materially increased token throughput for long system prompts. The results demonstrate a practical way to scale context windows while balancing cost and latency on GKE.

Google Kubernetes Engine Google AI Security
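
The tiering idea in the summary above can be illustrated with a toy cache. This is a minimal sketch, not LMCache's implementation: the `TieredCache` class, the LRU demotion policy, and the entry-count capacities are all assumptions made for illustration. Entries that overflow a faster tier are demoted to the next slower one instead of being dropped, which is how extra RAM and SSD tiers can raise the hit ratio.

```python
from collections import OrderedDict


class TieredCache:
    """Toy LRU cache with ordered tiers (fastest first), e.g. HBM -> RAM -> SSD.

    Hypothetical illustration only; capacities are counted in entries, not bytes.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level, key, value):
        # Place an entry in a tier; on overflow, demote the least recently
        # used entry to the next slower tier.
        if level >= len(self.tiers):
            return  # fell off the slowest tier: evicted entirely
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)
        while len(store) > cap:
            old_key, old_value = store.popitem(last=False)
            self._insert(level + 1, old_key, old_value)

    def put(self, key, value):
        # Drop any stale copy, then insert into the fastest tier.
        for _, _, store in self.tiers:
            store.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        # Return (value, tier_name) on a hit and promote the entry to the
        # fastest tier; return None on a miss.
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)
                return value, name
        return None


cache = TieredCache([("hbm", 2), ("ram", 2), ("ssd", 2)])
for i in range(5):
    cache.put(f"prefix-{i}", f"kv-{i}")

first = cache.get("prefix-0")   # oldest entry was demoted to the slowest tier
second = cache.get("prefix-0")  # the first hit promoted it back to the fastest tier
print(first, second)
```

With only the two-entry fastest tier, `prefix-0` would have been evicted after the third insert. Here it survives in the slowest tier and is promoted on access.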

October 31, 2025

GKE and Gemini CLI Integration Enhances Developer Workflows

🚀 Google has open-sourced the GKE Gemini CLI extension, bringing Google Kubernetes Engine directly into the Gemini CLI ecosystem while also functioning as an MCP server for other MCP clients. The extension injects GKE-specific context, tools, and tailored prompts so developers can use shorter, more natural language interactions and integrated slash commands to complete complex workflows. It simplifies common operations—like selecting models and accelerators or generating Kubernetes manifests for inference—while improving compatibility with Cloud Observability. The project is actively maintained with regular releases and community contributions.

Google Kubernetes Engine Gemini MCP DevSecOps

October 28, 2025

Giles AI on Google Cloud: Transforming Medical Research

🚀 Giles AI migrated its healthcare-focused platform to Google Cloud to reduce latency, improve scalability, and accelerate developer velocity. Using Google Kubernetes Engine, Cloud Run, and Compute Engine, the company orchestrates complex clinical data flows and routes prompts through Vertex AI and Model Garden to remain model-agnostic. Data storage and extraction are handled with Cloud SQL, Cloud Storage, and Document AI, while Cloud Armor and Security Command Center bolster security and compliance. Early customer results include dramatic reductions in research time and improvements in response accuracy.

Google Cloud Google Kubernetes Engine Cloud Run Vertex AI

October 28, 2025

Google Cloud launches managed DRANET for GKE with A4X Max

🚀 Google Cloud is previewing managed DRANET on GKE, enabling Kubernetes to treat high-performance RDMA network interfaces as schedulable resources. The integration aligns NICs and GPUs by NUMA topology to reduce latency and increase throughput, while abstracting away operational complexity. It launches with the new A4X Max instances to deliver topology-aware networking for large multi-GPU AI workloads. Developers can request specific network interfaces in pod specs and rely on GKE to co-schedule NICs and accelerators, improving utilization and simplifying operations.

Google Cloud Google Kubernetes Engine Nvidia Network Security

October 28, 2025

A4X Max, GKE Networking, and Vertex AI Training Now Shipping

🚀 Google Cloud is expanding its NVIDIA collaboration with the new A4X Max instances powered by NVIDIA GB300 NVL72, delivering 72 GPUs with high‑bandwidth NVLink and shared memory for demanding multimodal reasoning. GKE now supports DRANET for topology‑aware RDMA scheduling and integrates NVIDIA NeMo Guardrails into GKE Inference Gateway, while Vertex AI Model Garden will host NVIDIA Nemotron models. Vertex AI Training adds NeMo and NeMo‑RL recipes and a managed Slurm environment to accelerate large‑scale training and deployment.

Google Cloud Google Kubernetes Engine Vertex AI Nvidia

October 14, 2025

IBM Spectrum Symphony HostFactory Connectors for GCP

🚀 Google Cloud announces the general availability of open-source IBM Spectrum Symphony HostFactory connectors for Google Compute Engine and GKE. The connectors enable organizations to extend on‑premises Symphony clusters into Google Cloud or deploy fully cloud-native clusters with automatic provisioning and decommissioning to match workload demand. Partner-built by Accenture and validated by Aneo, the connectors support enterprise features such as Spot and on‑demand VMs, GPUs, Local SSD, Confidential VMs, Pub/Sub event-driven management, Kubernetes CRDs, and integration with managed instance group (MIG) APIs for large-scale HPC operations.

IBM Google Cloud Google Kubernetes Engine

September 24, 2025

GKE Autopilot Features Now Available to Qualified Clusters

🚀 Google Cloud has extended core Autopilot capabilities to qualified Standard GKE clusters, enabling access to the new container-optimized compute platform via built-in compute classes. Available initially to clusters in the Rapid release channel running 1.33.1-gke.1107000 or later, these features include the autopilot and autopilot-spot compute classes and a provisioning mode that supports gradual adoption. Benefits include rapid horizontal and vertical scaling, pay-for-request billing, efficient bin-packing, and support for GPUs and TPUs for AI workloads.

Google Google Kubernetes Engine Kubernetes Cloud Security
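
The version requirement quoted above can be checked mechanically. The sketch below is a hypothetical helper, not part of any Google tooling. It assumes version strings follow the `MAJOR.MINOR.PATCH-gke.BUILD` shape seen in the summary and that comparing the four numeric fields in order is a reasonable way to rank them. It checks the version only, not the release channel or any other qualification criteria.

```python
import re

MIN_VERSION = "1.33.1-gke.1107000"  # minimum version quoted in the summary


def parse_gke_version(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into a tuple of four integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: str = MIN_VERSION) -> bool:
    """Return True if `version` is at or above `minimum` (numeric comparison)."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


for candidate in ("1.33.1-gke.1107000", "1.32.9-gke.9999999"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples avoids the pitfalls of plain string comparison, where `"1.9"` would sort after `"1.33"`.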

September 23, 2025

Escalante Uses JAX on TPUs for AI-driven Protein Design

🧬 Escalante leverages JAX's functional, composable design to combine many predictive models into a single differentiable objective for protein engineering. By translating models (including AlphaFold and Boltz-2) into a JAX-native stack and chaining them or combining them linearly, they compute gradients with respect to input sequences and evolve candidates via optimization. Each job samples thousands of sequences, filters to roughly ten lab-ready designs, and runs at scale on Google Kubernetes Engine using spot TPU v6e, yielding a reported 3.65x performance-per-dollar advantage over H100 GPUs.

Google Cloud Google Kubernetes Engine AI Safety

September 19, 2025

GKE Managed Lustre CSI Driver for AI and HPC Workloads

🚀 Managed Lustre on GKE is a managed parallel file system with a CSI driver that brings low-latency, high-throughput POSIX storage to Kubernetes for demanding AI and HPC workloads. It is recommended for training, checkpointing, and small-file patterns where GPUs/TPUs must stay utilized, while Cloud Storage is an alternative for large files that can tolerate higher latency. The article presents five operational best practices (data locality, tiering, networking, provisioning, and using Kubernetes Jobs with a shared PVC) to maximize performance and control costs.

Google Kubernetes Engine Kubernetes Security AI Security Product Launch

September 17, 2025

GKE Network Interface: From kubenet to the AI backbone

📡 Over the past decade, Google Cloud evolved GKE pod networking from basic kubenet and route-based clusters to VPC-native alias IPs and the eBPF-powered Cilium Dataplane V2, improving performance, scalability, and observability. The platform now supports extreme-scale AI workloads with multi-NIC, terabit throughput, and persistent IPs for stateful functions. Looking forward, Google is exploring the Kubernetes Network Driver and the DRANET reference to expose node-level network resources via Dynamic Resource Allocation.

Google Cloud Google Kubernetes Engine Network Security

September 10, 2025

GKE Inference Gateway and Quickstart Achieve GA Status

🚀 GKE Inference Gateway and GKE Inference Quickstart are now generally available, bringing production-ready inferencing features built on AI Hypercomputer. New capabilities include prefix-aware load balancing, disaggregated serving, vLLM support on TPUs and Ironwood TPUs, and model streaming with Anywhere Cache to cut model load times. These features target faster time-to-first-token and time-per-output-token, higher throughput, and lower inference costs, while Quickstart offers data-driven accelerator and configuration recommendations.

Google Cloud Google Kubernetes Engine AI Security Product Launch

September 10, 2025

Disaggregated AI Inference with NVIDIA Dynamo on GKE

⚡ This post announces a reproducible recipe to deploy NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples with step-by-step deployment instructions. The repository provides configuration guidance and outlines plans for broader GPU and engine support.

Nvidia Google Cloud Google Kubernetes Engine AI Security

September 5, 2025

GKE Turns 10 Hackathon: Build Agentic AI Microservices

🚀 Join the GKE Turns 10 Hackathon to build next‑generation microservices enhanced with agentic AI. Google provides sample applications (Bank of Anthos or Online Boutique), example agents on GitHub, documentation, quickstarts and a webinar to help teams get started. Submissions must run on GKE and use Google AI models such as Gemini, with agents interacting via APIs rather than altering core application code. Participants may also use the Agent Development Kit (ADK), Model Context Protocol (MCP) and Agent2Agent (A2A) to extend functionality.

Google Cloud Google Kubernetes Engine Agentic AI

August 28, 2025

Container-Optimized Compute Delivers Fast Autopilot Scaling

🚀 GKE Autopilot now runs on a container-optimized compute platform that rethinks autoscaling to deliver near-real-time capacity. The platform uses dynamically resizable VMs and a pool of pre-provisioned compute so nodes can be resized or allocated without disrupting workloads. Customers on GKE Autopilot 1.32+ get faster pod scheduling, improved HPA responsiveness, and support for in-place pod resize out of the box. Google recommends the general purpose compute class for small, gradually scaling services.

Google Google Kubernetes Engine

August 28, 2025

EuroDaT and Google Cloud: Secure Financial Data Exchange

🔒 EuroDaT, a state-owned data trustee, built safeAML with major German banks to enable controlled, pseudonymous transaction matching while preserving GDPR compliance. The cloud-native service runs on Google Cloud and Google Kubernetes Engine, using infrastructure-as-code, isolated VPCs and auditable processing so EuroDaT never accesses personal-data content. By letting banks request targeted supplementary information, safeAML accelerates suspicious-activity checks, reduces false positives and lays groundwork for wider use in ESG and health data sharing.

Google Cloud Google Kubernetes Engine Data Governance GDPR

August 28, 2025

EuroDaT and Google Cloud: Secure Financial Data Exchange

🔐 EuroDaT describes how its safeAML platform, built on Google Cloud and Google Kubernetes Engine, enables controlled, pseudonymous exchange of sensitive transaction data between banks. Acting as a neutral data trustee, EuroDaT never accesses personal content while automating secure, auditable workflows that replace error-prone phone calls. Pilots with German banks show faster, more accurate suspicion assessments and lower false positives.

Google Cloud Google Kubernetes Engine Data Governance