Thu, December 4, 2025
NVIDIA Run:ai Model Streamer Adds Cloud Storage Support
🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model loading and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer sharply reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change, and it can leverage GKE Workload Identity for secure, keyless access to buckets.
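As a rough sketch of that single-flag change: vLLM exposes a `--load-format runai_streamer` option for loading weights via the streamer. The bucket path and tuning value below are placeholders, not taken from the announcement.

```shell
# Serve a model with vLLM, streaming weights via the Run:ai Model Streamer.
# The gs:// path is a placeholder bucket; with GKE Workload Identity configured
# on the pod's service account, no key files need to be mounted for access.
vllm serve gs://my-model-bucket/my-model \
  --load-format runai_streamer \
  --model-loader-extra-config '{"concurrency": 16}'  # optional: parallel read streams
```

The optional `--model-loader-extra-config` JSON tunes the streamer (for example, the number of concurrent read streams); defaults are usually a reasonable starting point.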