All news with #gke tag

Tue, October 28, 2025

Giles AI on Google Cloud: Transforming Medical Research

#Google Cloud #Vertex AI #GKE #Cloud Run #Document AI

🚀 Giles AI migrated its healthcare-focused platform to Google Cloud to reduce latency, improve scalability, and accelerate developer velocity. Using Google Kubernetes Engine, Cloud Run, and Compute Engine, the company orchestrates complex clinical data flows and routes prompts through Vertex AI and Model Garden to remain model-agnostic. Data storage and extraction are handled with Cloud SQL, Cloud Storage, and Document AI, while Cloud Armor and Security Command Center bolster security and compliance. Early customer results include dramatic reductions in research time and improvements in response accuracy.

Tue, October 28, 2025

A4X Max, GKE Networking, and Vertex AI Training Now Shipping

#Product Release #NVIDIA #Google #Vertex AI #GKE

🚀 Google Cloud is expanding its NVIDIA collaboration with the new A4X Max instances powered by NVIDIA GB300 NVL72, delivering 72 GPUs with high‑bandwidth NVLink and shared memory for demanding multimodal reasoning. GKE now supports DRANET for topology‑aware RDMA scheduling and integrates NVIDIA NeMo Guardrails into GKE Inference Gateway, while Vertex AI Model Garden will host NVIDIA Nemotron models. Vertex AI Training adds NeMo and NeMo‑RL recipes and a managed Slurm environment to accelerate large‑scale training and deployment.

Tue, October 28, 2025

Google Cloud launches managed DRANET for GKE with A4X Max

#Google #GKE #Product Release #DRANET #A4X Max

🚀 Google Cloud is previewing managed DRANET on GKE, enabling Kubernetes to treat high-performance RDMA network interfaces as schedulable resources. The integration aligns NICs and GPUs by NUMA topology to reduce latency and increase throughput, while abstracting away operational complexity. It launches with the new A4X Max instances to deliver topology-aware networking for large multi-GPU AI workloads. Developers can request specific network interfaces in pod specs and rely on GKE to co-schedule NICs and accelerators, improving utilization and simplifying operations.
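
The NUMA alignment described above can be pictured with a small Python sketch. It is not GKE's or DRANET's scheduler: the `Device` type, the device names, and the `pair_by_numa` helper are hypothetical, and the sketch only shows the idea of giving each GPU a free NIC on the same NUMA node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str       # hypothetical device identifier
    numa_node: int  # NUMA node the device is attached to


def pair_by_numa(gpus, nics):
    """Pair each GPU with an unused NIC on the same NUMA node.

    Returns a list of (gpu_name, nic_name) tuples. Raises ValueError when a
    GPU has no free NIC on its own NUMA node, i.e. when the request could
    only be met by crossing NUMA boundaries.
    """
    free = defaultdict(list)
    for nic in nics:
        free[nic.numa_node].append(nic)

    pairs = []
    for gpu in gpus:
        candidates = free[gpu.numa_node]
        if not candidates:
            raise ValueError(
                f"no free NIC on NUMA node {gpu.numa_node} for {gpu.name}"
            )
        # Take the first free NIC on this node so it is not handed out twice.
        pairs.append((gpu.name, candidates.pop(0).name))
    return pairs


# Illustrative inventory only; real nodes expose many more devices.
gpus = [Device("gpu0", 0), Device("gpu1", 1)]
nics = [Device("rdma1", 1), Device("rdma0", 0)]
pairs = pair_by_numa(gpus, nics)
print(pairs)
```

Failing loudly when no same-node NIC exists is one possible design choice; a real scheduler might instead fall back to a cross-node pairing with a performance penalty.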

Mon, October 20, 2025

AI Hypercomputer Update: vLLM on TPUs and Tooling Advances

#Google #Cloud TPU #vLLM #GKE #NVIDIA

🔧 Google Cloud’s Q3 AI Hypercomputer update highlights inference improvements and expanded tooling to accelerate model serving and diagnostics. The release integrates vLLM with Cloud TPUs via the new tpu-inference plugin, unifying JAX and PyTorch runtimes and boosting TPU inference for models such as Gemma, Llama, and Qwen. Additional launches include improved XProf profiling and Cloud Diagnostics XProf, an AI inference recipe for NVIDIA Dynamo, NVIDIA NeMo RL recipes, and GA of the GKE Inference Gateway and Quickstart to help optimize latency and cost.

Mon, October 20, 2025

Design Patterns for Scalable AI Agents on Google Cloud

#Agentic AI #Vertex AI Agent Engine #Gemini #Google #BigQuery #Cloud Run #GKE

🤖 This post explains how System Integrator partners can build, scale, and manage enterprise-grade AI agents using Google Cloud technologies like Agent Engine, the Agent Development Kit (ADK), and Gemini Enterprise. It summarizes architecture patterns including runtime, memory, the Model Context Protocol (MCP), and the Agent-to-Agent (A2A) protocol, and contrasts managed Agent Engine with self-hosted options such as Cloud Run or GKE. Customer examples from Deloitte and Quantiphi illustrate supply chain and sales automation benefits. The guidance highlights security, observability, persistent memory, and model tuning for enterprise readiness.

Mon, October 20, 2025

Google Cloud G4 VMs: NVIDIA RTX PRO 6000 Blackwell GA

#Google Cloud #NVIDIA #Product Release #Vertex AI #GKE

🚀 The G4 VM is now generally available on Google Cloud, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and offering up to 768 GB of GDDR7 memory per instance. It targets latency-sensitive and regulated workloads for generative AI, real-time rendering, simulation, and virtual workstations. Features include FP4 precision support, Multi-Instance GPU (MIG) partitioning, an enhanced PCIe P2P interconnect for faster multi‑GPU All-Reduce, and an NVIDIA Omniverse VMI on Marketplace for industrial digital twins.

Fri, October 17, 2025

Use Gemini CLI to Deploy Cost-Effective LLM Workloads on GKE

#GKE #Google #Gemini CLI

🛠️ Google Cloud demonstrates how the Gemini CLI and GKE Inference Quickstart integrate via the Model Context Protocol (MCP) to streamline selecting, benchmarking, and deploying LLMs on GKE. The post outlines installation steps, example prompts to discover cost and performance trade-offs, and how manifests can be generated for target accelerators. This approach reduces manual tuning and provides data-driven recommendations to optimize cost-per-token while preserving performance.
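
As a rough illustration of the cost-per-token metric mentioned above, the following Python sketch derives a cost per million output tokens from an hourly accelerator price and a measured throughput. The function name and the numbers are assumptions for illustration, not figures from the post or from GKE Inference Quickstart.

```python
def cost_per_million_tokens(hourly_price_usd, tokens_per_second):
    """Estimate serving cost per one million tokens.

    Assumes the accelerator is billed by the hour and sustains the given
    throughput for the whole hour; both inputs are hypothetical here.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Made-up inputs: a $3.60/hour node sustaining 1,000 tokens per second.
example_cost = cost_per_million_tokens(3.60, 1000)
print(f"{example_cost:.2f} USD per million tokens")
```

Comparing this figure across accelerator types at the same latency target is one simple way to frame the cost and performance trade-off.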

Tue, October 14, 2025

Scaling Customer Experience with AI on Google Cloud

#Google #GKE #Cloud Run #Retrieval-Augmented Generation #LiveX AI #Vector Database

🤖 LiveX AI outlines a Google Cloud blueprint to scale conversational customer experiences across chat, voice, and avatar interfaces. The post details how Cloud Run hosts elastic front-end microservices while GKE provides GPU-backed AI inference, and how AgentFlow orchestrates conversational state, knowledge retrieval, and human escalation. Reported customer outcomes include a >90% self-service rate for Wyze and a 3× conversion uplift for Pictory. The design emphasizes cost efficiency, sub-second latency, multilingual support, and secure integrations with platforms such as Stripe, Zendesk, and Salesforce.

Tue, October 14, 2025

IBM Spectrum Symphony HostFactory Connectors for GCP

#Product Release #GKE #Compute Engine #IBM Spectrum Symphony

🚀 Google Cloud announces the general availability of open-source IBM Spectrum Symphony HostFactory connectors for Google Compute Engine and GKE. The connectors enable organizations to extend on‑premises Symphony clusters into Google Cloud or deploy fully cloud-native clusters with automatic provisioning and decommissioning to match workload demand. Partner-built by Accenture and validated by Aneo, the connectors support enterprise features such as Spot and on‑demand VMs, GPUs, Local SSD, Confidential VMs, Pub/Sub event-driven management, Kubernetes CRDs, and integration with managed instance group (MIG) APIs for large-scale HPC operations.

Mon, October 6, 2025

Cost-Saving Strategies When Migrating to Google Cloud

#Google Cloud #Compute Engine #GKE #Cloud Run #Spot VMs

💡 Google Cloud presents practical strategies to lower Compute Engine and block storage costs during migration and modernization. The article recommends adopting latest-generation VMs and specialized instance families, right-sizing or using custom machine types, and tuning storage with Hyperdisk and storage pools to align capacity and performance. It also emphasizes financial levers—committed use discounts, Spot VMs, autoscaling, and recommender-driven actions—to reduce spend while preserving performance.

Mon, September 29, 2025

Adopt New VM Series with GKE Compute Classes, Flex CUDs

#Google #GKE #Compute Engine #Compute Flexible CUDs #Committed Use Discounts

⚙️ Google Cloud outlines a practical approach to adopt Gen4 VM families by pairing GKE compute classes with Compute Flexible CUDs, enabling prioritized machine-family fallbacks and spend-based discounts. Compute classes let teams define prioritized machine families (for example, N4 then N2) so the cluster autoscaler can provision preferred hardware while preserving availability. Flex CUDs apply discounts across eligible VM families and follow consumption, protecting committed discounts when fallbacks occur. Together these features reduce migration risk and simplify platform operations.
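
The prioritized fallback can be sketched in a few lines of Python. This is a simplified model, not the GKE cluster autoscaler: `pick_machine_family` and the availability sets are hypothetical, and the sketch only shows walking a priority list such as N4 then N2 until a family with capacity is found.

```python
def pick_machine_family(priorities, available):
    """Return the first family in `priorities` that is in `available`.

    `priorities` is ordered from most to least preferred. Returns None when
    no listed family currently has capacity.
    """
    for family in priorities:
        if family in available:
            return family
    return None


priorities = ["n4", "n2"]  # preferred family first, fallback second

# Both families have capacity: the preferred one wins.
choice_preferred = pick_machine_family(priorities, {"n4", "n2"})
# N4 is unavailable: provisioning falls back to N2 instead of failing.
choice_fallback = pick_machine_family(priorities, {"n2"})
print(choice_preferred, choice_fallback)
```

Because a spend-based discount follows consumption across eligible families, a fallback in this model changes the hardware provisioned but not whether the commitment is used.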

Wed, September 24, 2025

GKE Autopilot Features Now Available to Qualified Clusters

#Google Cloud #GKE #Product Release #Compute Classes

🚀 Google Cloud has extended core Autopilot capabilities to qualified Standard GKE clusters, enabling access to the new container-optimized compute platform via built-in compute classes. Available initially to clusters in the Rapid release channel running 1.33.1-gke.1107000 or later, these features include the autopilot and autopilot-spot compute classes and a provisioning mode that supports gradual adoption. Benefits include rapid horizontal and vertical scaling, pay-for-request billing, efficient bin-packing, and support for GPUs and TPUs for AI workloads.

Fri, September 19, 2025

GCE and GKE Security Dashboards Powered by SCC Now

#Google #GCE #GKE #Security Command Center

🔒 Google has added integrated security dashboards to the GCE and GKE consoles, powered by Security Command Center. The dashboards surface top security findings, vulnerability trends, CVE prioritization, and container/workload misconfigurations informed by Google Threat Intelligence and Mandiant analysis. Teams can remediate misconfigurations, prioritize patches, and monitor threats directly in their compute and cluster consoles. Full vulnerability and threat widgets require upgrading to SCC Premium (30‑day trial available).

Fri, September 19, 2025

GKE Managed Lustre CSI Driver for AI and HPC Workloads

#GKE #Managed Lustre #Google #GCP Cloud Storage

🚀 Managed Lustre on GKE is a managed parallel file system with a CSI driver that brings low-latency, high-throughput POSIX storage to Kubernetes for demanding AI and HPC workloads. It is recommended for training, checkpointing, and small-file patterns where GPUs/TPUs must stay utilized, while Cloud Storage is an alternative for large files that can tolerate higher latency. The article presents five operational best practices (data locality, tiering, networking, provisioning, and using Kubernetes Jobs with a shared PVC) to maximize performance and control costs.

Wed, September 17, 2025

GKE Network Interface: From kubenet to the AI backbone

#GKE #eBPF #Cilium #DRANET

📡 Over the past decade, Google Cloud evolved GKE pod networking from basic kubenet and route-based clusters to VPC-native alias IPs and the eBPF-powered, Cilium-based GKE Dataplane V2, improving performance, scalability, and observability. The platform now supports extreme-scale AI workloads with multi-NIC, terabit throughput, and persistent IPs for stateful functions. Looking forward, Google is exploring the Kubernetes Network Driver and the DRANET reference implementation to expose node-level network resources via Dynamic Resource Allocation.

Tue, September 16, 2025

Google Cloud and SAP: Unified Data, AI Agents, and HANA

#Google Cloud #SAP #BigQuery #Vertex AI #GKE

🚀 Google Cloud and SAP announced tighter integration to unify enterprise data and accelerate intelligent automation. SAP Business Data Cloud now connects to BigQuery via Datasphere, enabling bidirectional replication and AI-ready analytics. SAP BTP is available on the Google Cloud Marketplace, which simplifies procurement. New agent tooling (Agentspace, the Agent Development Kit, and the A2A and MCP standards) and expanded M4 memory-optimized VMs certified for SAP HANA aim to speed deployments, improve data consistency, and enable autonomous process automation.

Mon, September 15, 2025

Google releases XProf and Cloud Diagnostics XProf tools

#Google #OpenXLA #XProf #TensorBoard #GKE #GCP Cloud Storage

🔧 Google has open-sourced XProf, an upgraded ML profiler, and published the Cloud Diagnostics XProf library to simplify profiling and optimizing models on xPUs. The release brings unified XLA-based profiling across JAX, PyTorch/XLA and TensorFlow/Keras, and supports programmatic and on-demand trace capture. The Cloud Diagnostics library packages dependencies, stores profiles in Google Cloud Storage for retention, provisions TensorBoard on VMs or GKE for faster loading, and produces shareable links for collaborative analysis. Machine types can be tuned for performance.

Wed, September 10, 2025

Disaggregated AI Inference with NVIDIA Dynamo on GKE

#Product Release #NVIDIA #GKE #AI Hypercomputer

⚡ This post announces a reproducible recipe to deploy NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples and step-by-step deployment instructions. The repository provides configuration guidance and outlines plans for broader GPU and engine support.
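
To make the prefill/decode split concrete, here is a minimal Python sketch of a router that assigns each request to one worker from a prefill pool and one from a separate decode pool. It is not NVIDIA Dynamo's API: `DisaggregatedRouter`, the worker names, and the round-robin policy are assumptions chosen for illustration.

```python
from itertools import cycle


class DisaggregatedRouter:
    """Toy router that keeps prefill and decode on separate worker pools."""

    def __init__(self, prefill_workers, decode_workers):
        if not prefill_workers or not decode_workers:
            raise ValueError("both pools need at least one worker")
        # Round-robin iterators; a real system would weigh load and cache state.
        self._prefill = cycle(prefill_workers)
        self._decode = cycle(decode_workers)

    def route(self, request_id):
        """Pick one prefill worker and one decode worker for a request."""
        return {
            "request": request_id,
            "prefill": next(self._prefill),  # processes the prompt
            "decode": next(self._decode),    # generates output tokens
        }


# Hypothetical pools: two prefill workers and one decode worker.
router = DisaggregatedRouter(["prefill-0", "prefill-1"], ["decode-0"])
plans = [router.route(i) for i in range(3)]
for plan in plans:
    print(plan)
```

Sizing the two pools independently is the point of the split: prompt processing and token generation stress the hardware differently, so each pool can be scaled to its own bottleneck.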

Fri, September 5, 2025

GKE Turns 10 Hackathon: Build Agentic AI Microservices

#GKE #Google #Agentic AI #Conference

🚀 Join the GKE Turns 10 Hackathon to build next‑generation microservices enhanced with agentic AI. Google provides sample applications (Bank of Anthos or Online Boutique), example agents on GitHub, documentation, quickstarts and a webinar to help teams get started. Submissions must run on GKE and use Google AI models such as Gemini, with agents interacting via APIs rather than altering core application code. Participants may also use the Agent Development Kit (ADK), Model Context Protocol (MCP) and Agent2Agent (A2A) to extend functionality.

Tue, September 2, 2025

Agent Development Kit Hackathon: Winners and Highlights

#Agentic AI #Conference #GKE #Google Cloud #Vertex AI

🚀 The Agent Development Kit (ADK) Hackathon concluded with more than 10,400 participants from 62 countries, 477 submitted projects, and 1,500+ agents built. The competition emphasized multi-agent orchestration for automation, data analysis, customer service, and content generation, awarding SalesShortcut the Grand Prize. Regional winners included Energy Agent AI, Edu.AI, GreenOps, and Nexora-AI, and organizers pointed participants to ADK documentation and developer forums while announcing an upcoming GKE hackathon with over $50,000 in prizes.