Thu, December 4, 2025
NVIDIA Run:ai Model Streamer Adds Cloud Storage Support
🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model loading and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer sharply reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change, and it can leverage GKE Workload Identity for secure, keyless access to buckets.
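As a rough sketch of that single-flag change: vLLM exposes a `--load-format runai_streamer` option for loading weights via the streamer. The bucket path and tuning value below are placeholders, not taken from the announcement.

```shell
# Serve a model with vLLM, streaming weights via the Run:ai Model Streamer.
# The gs:// path is a placeholder bucket; with GKE Workload Identity configured
# on the pod's service account, no key files need to be mounted for access.
vllm serve gs://my-model-bucket/my-model \
  --load-format runai_streamer \
  --model-loader-extra-config '{"concurrency": 16}'  # optional: parallel read streams
```

The optional `--model-loader-extra-config` JSON tunes the streamer (for example, the number of concurrent read streams); defaults are usually a reasonable starting point.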