All news with the #run:ai tag

Wed, September 10, 2025

GKE Inference Gateway and Quickstart Achieve GA Status

🚀 GKE Inference Gateway and GKE Inference Quickstart are now generally available, bringing production-ready inference features built on AI Hypercomputer. New capabilities include prefix-aware load balancing, disaggregated serving, vLLM support on TPUs (including Ironwood), and model streaming with Anywhere Cache to cut model load times. These features target faster time-to-first-token and time-per-output-token, higher throughput, and lower inference costs, while Quickstart offers data-driven accelerator and configuration recommendations.