<ciso brief />

All news with #google kubernetes engine tag


Google Cloud launches managed DRANET for GKE with A4X Max

🚀 Google Cloud is previewing managed DRANET on GKE, enabling Kubernetes to treat high-performance RDMA network interfaces as schedulable resources. The integration aligns NICs and GPUs by NUMA topology to reduce latency and increase throughput, while abstracting away operational complexity. It launches with the new A4X Max instances to deliver topology-aware networking for large multi-GPU AI workloads. Developers can request specific network interfaces in pod specs and rely on GKE to co-schedule NICs and accelerators, improving utilization and simplifying operations.
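
As a rough illustration of the NUMA alignment idea in this summary, the sketch below pairs each GPU with a free NIC attached to the same NUMA node. The `Device` type, the device names, and the greedy pairing strategy are all assumptions made for illustration; the source does not describe how GKE or DRANET actually performs this matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str       # hypothetical device name, for illustration only
    numa_node: int  # NUMA node the device is attached to


def pair_by_numa(gpus, nics):
    """Greedily pair each GPU with an unused NIC on the same NUMA node.

    Returns a dict of GPU name -> NIC name. A GPU with no free NIC on
    its NUMA node is left out of the result.
    """
    # Group the available NICs by NUMA node.
    free = {}
    for nic in nics:
        free.setdefault(nic.numa_node, []).append(nic)

    pairs = {}
    for gpu in gpus:
        candidates = free.get(gpu.numa_node, [])
        if candidates:
            # Take the first free NIC so it cannot be assigned twice.
            pairs[gpu.name] = candidates.pop(0).name
    return pairs


gpus = [Device("gpu0", 0), Device("gpu1", 1)]
nics = [Device("rdma1", 1), Device("rdma0", 0)]
pairs = pair_by_numa(gpus, nics)
print(pairs)
```

Keeping a GPU and its NIC on the same NUMA node avoids cross-node memory traffic, which is the latency and throughput benefit the summary refers to.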

A4X Max, GKE Networking, and Vertex AI Training Now Shipping

🚀 Google Cloud is expanding its NVIDIA collaboration with the new A4X Max instances powered by NVIDIA GB300 NVL72, delivering 72 GPUs with high‑bandwidth NVLink and shared memory for demanding multimodal reasoning. GKE now supports DRANET for topology‑aware RDMA scheduling and integrates NVIDIA NeMo Guardrails into GKE Inference Gateway, while Vertex AI Model Garden will host NVIDIA Nemotron models. Vertex AI Training adds NeMo and NeMo‑RL recipes and a managed Slurm environment to accelerate large‑scale training and deployment.

IBM Spectrum Symphony HostFactory Connectors for GCP

🚀 Google Cloud announces the general availability of open-source IBM Spectrum Symphony HostFactory connectors for Google Compute Engine and GKE. The connectors enable organizations to extend on‑premises Symphony clusters into Google Cloud or deploy fully cloud-native clusters with automatic provisioning and decommissioning to match workload demand. Partner-built by Accenture and validated by Aneo, the connectors support enterprise features such as Spot and on‑demand VMs, GPUs, Local SSD, Confidential VMs, Pub/Sub event-driven management, Kubernetes CRDs, and integration with managed instance group (MIG) APIs for large-scale HPC operations.

GKE Autopilot Features Now Available to Qualified Clusters

🚀 Google Cloud has extended core Autopilot capabilities to qualified Standard GKE clusters, enabling access to the new container-optimized compute platform via built-in compute classes. Available initially to clusters in the Rapid release channel running 1.33.1-gke.1107000 or later, these features include the autopilot and autopilot-spot compute classes and a provisioning mode that supports gradual adoption. Benefits include rapid horizontal and vertical scaling, pay-for-request billing, efficient bin-packing, and support for GPUs and TPUs for AI workloads.
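
The version requirement above could be checked with a small helper like the one below. This is a minimal sketch: the function names and the `"RAPID"` channel string are assumptions for illustration, and only the minimum version `1.33.1-gke.1107000` comes from the summary.

```python
import re

MIN_VERSION = "1.33.1-gke.1107000"  # minimum version quoted in the summary

# Matches versions of the form MAJOR.MINOR.PATCH-gke.BUILD
_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Turn '1.33.1-gke.1107000' into the tuple (1, 33, 1, 1107000)."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def qualifies(version, channel):
    """Return True if the cluster meets the channel and version condition.

    The channel value "RAPID" is a placeholder for however the release
    channel is represented in your tooling.
    """
    return (
        channel == "RAPID"
        and parse_gke_version(version) >= parse_gke_version(MIN_VERSION)
    )


print(qualifies("1.33.1-gke.1107000", "RAPID"))
print(qualifies("1.33.0-gke.2000000", "RAPID"))
```

Comparing parsed integer tuples rather than raw strings matters here, because a plain string comparison would rank `1.9.0` above `1.33.1`.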

Escalante Uses JAX on TPUs for AI-driven Protein Design

🧬 Escalante leverages JAX's functional, composable design to combine many predictive models into a single differentiable objective for protein engineering. By translating models (including AlphaFold and Boltz-2) into a JAX-native stack and composing them serially or linearly, they compute gradients with respect to input sequences and evolve candidates via optimization. Each job samples thousands of sequences, filters to roughly ten lab-ready designs, and runs at scale on Google Kubernetes Engine using spot TPU v6e, yielding a reported 3.65x performance-per-dollar advantage over H100 GPUs.
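
The idea of folding several model scores into one objective and following its gradient with respect to the input can be sketched in a few lines. This is a toy stand-in under stated assumptions: plain Python with finite differences replaces JAX's automatic differentiation, the two quadratic terms are placeholders for real predictive models, and the input is a short continuous vector rather than a protein sequence. None of the names below come from Escalante's code.

```python
def composite(x, terms):
    """Weighted sum of scoring functions evaluated at x."""
    return sum(weight * fn(x) for weight, fn in terms)


def numeric_grad(fn, x, eps=1e-6):
    """Central finite-difference gradient of fn at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad


def optimize(x, terms, lr=0.1, steps=200):
    """Plain gradient descent on the composite objective."""
    def objective(v):
        return composite(v, terms)

    for _ in range(steps):
        grad = numeric_grad(objective, x)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


# Two placeholder "models": one pulls every coordinate toward 1.0,
# the other penalizes large values.
terms = [
    (1.0, lambda v: sum((vi - 1.0) ** 2 for vi in v)),
    (0.5, lambda v: sum(vi ** 2 for vi in v)),
]

result = optimize([0.0, 3.0], terms)
print(result)
```

For these two terms the combined minimum sits at 2/3 in every coordinate, so both starting values converge toward that point. In a JAX setting the `numeric_grad` helper would be replaced by automatic differentiation of the composed objective.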

GKE Managed Lustre CSI Driver for AI and HPC Workloads

🚀 Managed Lustre on GKE is a managed parallel file system with a CSI driver that brings low-latency, high-throughput POSIX storage to Kubernetes for demanding AI and HPC workloads. It is recommended for training, checkpointing, and small-file patterns where GPUs/TPUs must stay utilized, while Cloud Storage is an alternative for large files that can tolerate higher latency. The article presents five operational best practices (data locality, tiering, networking, provisioning, and using Kubernetes Jobs with a shared PVC) to maximize performance and control costs.

GKE Network Interface: From kubenet to the AI backbone

📡 Over the past decade, Google Cloud evolved GKE pod networking from basic kubenet and route-based clusters to VPC-native alias IPs and the eBPF-powered Cilium Dataplane V2, improving performance, scalability, and observability. The platform now supports extreme-scale AI workloads with multi-NIC, terabit throughput, and persistent IPs for stateful functions. Looking forward, Google is exploring the Kubernetes Network Driver and the DRANET reference to expose node-level network resources via Dynamic Resource Allocation.

GKE Inference Gateway and Quickstart Achieve GA Status

🚀 GKE Inference Gateway and GKE Inference Quickstart are now generally available, bringing production-ready inferencing features built on AI Hypercomputer. New capabilities include prefix-aware load balancing, disaggregated serving, vLLM support on TPUs and Ironwood TPUs, and model streaming with Anywhere Cache to cut model load times. These features target faster time-to-first-token and time-per-output-token, higher throughput, and lower inference costs, while Quickstart offers data-driven accelerator and configuration recommendations.
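
To make "prefix-aware load balancing" concrete, here is a generic sketch that routes prompts sharing the same leading text to the same replica, so a replica that has already processed that prefix is more likely to serve the next request. The source does not describe the gateway's actual algorithm; the function name, the replica names, and the fixed 64-character prefix window are assumptions for illustration.

```python
import hashlib


def pick_replica(prompt, replicas, prefix_chars=64):
    """Map a prompt to a replica using a hash of its leading characters."""
    prefix = prompt[:prefix_chars].encode("utf-8")
    # sha256 gives a hash that is stable across processes and runs,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(prefix).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]  # hypothetical names
system_prompt = "You are a helpful assistant. " * 4  # longer than prefix_chars

first = pick_replica(system_prompt + "What is GKE?", replicas)
second = pick_replica(system_prompt + "Explain DRANET.", replicas)
print(first == second)  # True: both prompts share the hashed prefix
```

A production router would also need to weigh replica load and handle replicas joining or leaving, which a plain modulo mapping does not address.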

Disaggregated AI Inference with NVIDIA Dynamo on GKE

⚡ This post announces a reproducible recipe to deploy NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples and step-by-step deployment actions. The repository provides configuration guidance and future plans for broader GPU and engine support.

GKE Turns 10 Hackathon: Build Agentic AI Microservices

🚀 Join the GKE Turns 10 Hackathon to build next‑generation microservices enhanced with agentic AI. Google provides sample applications (Bank of Anthos or Online Boutique), example agents on GitHub, documentation, quickstarts and a webinar to help teams get started. Submissions must run on GKE and use Google AI models such as Gemini, with agents interacting via APIs rather than altering core application code. Participants may also use the Agent Development Kit (ADK), Model Context Protocol (MCP) and Agent2Agent (A2A) to extend functionality.

Container-Optimized Compute Delivers Fast Autopilot Scaling

🚀 GKE Autopilot now runs on a container-optimized compute platform that rethinks autoscaling to deliver near-real-time capacity. The platform uses dynamically resizable VMs and a pool of pre-provisioned compute so nodes can be resized or allocated without disrupting workloads. Customers on GKE Autopilot 1.32+ get faster pod scheduling, improved HPA responsiveness, and support for in-place pod resize out of the box. Google recommends the general purpose compute class for small, gradually scaling services.

EuroDaT and Google Cloud: Secure Financial Data Exchange

🔒 EuroDaT, a state-owned data trustee, built safeAML with major German banks to enable controlled, pseudonymous transaction matching while preserving GDPR compliance. The cloud-native service runs on Google Cloud and Google Kubernetes Engine, using infrastructure-as-code, isolated VPCs and auditable processing so EuroDaT never accesses personal-data content. By letting banks request targeted supplementary information, safeAML accelerates suspicious-activity checks, reduces false positives and lays groundwork for wider use in ESG and health data sharing.

EuroDaT and Google Cloud: Secure Financial Data Exchange

🔐 EuroDaT describes how its safeAML platform, built on Google Cloud and Google Kubernetes Engine, enables controlled, pseudonymous exchange of sensitive transaction data between banks. Acting as a neutral data trustee, EuroDaT never accesses personal content while automating secure, auditable workflows that replace error-prone phone calls. Pilots with German banks show faster, more accurate suspicion assessments and lower false positives.

What’s New in Google Cloud: Releases, Previews, and News

🔔 Google Cloud published a consolidated roundup of product releases and previews from early July through Aug 22, 2025, covering GA launches, public previews, and platform enhancements. Highlights include Earth Engine in BigQuery (GA), Vertex AI embedding scaling, new GKE features for NUMA alignment and swap, expanded NodeConfig controls, and Cloud Run with GPUs. Customers should review the linked documentation, request preview access via account teams where needed, and plan upgrades or migrations accordingly.