All news with the #managed-lustre tag
Fri, October 31, 2025
Choosing Google Cloud Managed Lustre for External KV Cache
🚀 This post explains how an external KV cache backed by Google Cloud Managed Lustre can accelerate transformer inference and lower costs by replacing expensive prefill recomputation with cache reads from storage. In experiments with a 50K-token context and a ~75% cache-hit rate, Managed Lustre increased inference throughput by 75% and cut mean time-to-first-token by 44%. The analysis projects a 35% TCO reduction and up to ~43% fewer GPUs for the same workload, and the article summarizes the practical steps: provision Managed Lustre in the same zone as the GPUs, deploy an inference server that supports external caching (for example vLLM), enable O_DIRECT, and tune I/O parallelism.
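The "~43% fewer GPUs" projection follows directly from the 75% throughput figure: if each GPU serves 1.75× the requests, the same workload needs 1/1.75 of the fleet. A minimal sketch of that arithmetic (the baseline fleet size of 100 is an illustrative assumption, not a number from the article):

```python
def gpus_needed(baseline_gpus: float, throughput_gain: float) -> float:
    """GPUs required for the same total workload after a per-GPU throughput gain.

    throughput_gain is fractional: 0.75 means each GPU now serves 1.75x the load.
    """
    return baseline_gpus / (1.0 + throughput_gain)

baseline = 100          # hypothetical fleet size for illustration
after = gpus_needed(baseline, 0.75)      # 75% throughput increase from the cache
reduction = 1 - after / baseline
print(f"{reduction:.0%}")  # -> 43%, matching the article's projection
```

Note this is an upper bound on savings: it assumes throughput scales linearly with GPU count and ignores headroom kept for traffic spikes.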
Fri, September 19, 2025
GKE Managed Lustre CSI Driver for AI and HPC Workloads
🚀 Managed Lustre on GKE is a managed parallel file system whose CSI driver brings low-latency, high-throughput POSIX storage to Kubernetes for demanding AI and HPC workloads. It is recommended for training, checkpointing, and small-file access patterns where GPUs/TPUs must stay utilized, while Cloud Storage remains an alternative for large files that tolerate higher latency. The article presents five operational best practices—data locality, tiering, networking, provisioning, and using Kubernetes Jobs with a shared PVC—to maximize performance and control costs.
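The "Kubernetes Jobs with a shared PVC" pattern can be sketched as follows. All names here ("lustre-pvc", "train-step", the storage class and image) are illustrative assumptions, not values from the article; the manifests are built as plain Python dicts that would in practice be serialized to YAML and applied to the cluster.

```python
# Hypothetical PVC: one ReadWriteMany claim backed by the Lustre CSI driver,
# so every pod of the Job mounts the same parallel file system.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "lustre-pvc"},
    "spec": {
        "accessModes": ["ReadWriteMany"],   # many pods share one Lustre mount
        "storageClassName": "lustre-csi",   # assumed name for the CSI storage class
        "resources": {"requests": {"storage": "16Ti"}},
    },
}

# Hypothetical Job: parallel workers all mounting the claim above at /mnt/lustre,
# e.g. for checkpointing or shared training data.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "train-step"},
    "spec": {
        "parallelism": 4,  # pods run concurrently against the shared PVC
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "worker",
                    "image": "example.com/trainer:latest",  # placeholder image
                    "volumeMounts": [{"name": "lustre", "mountPath": "/mnt/lustre"}],
                }],
                "volumes": [{
                    "name": "lustre",
                    "persistentVolumeClaim": {"claimName": "lustre-pvc"},
                }],
            }
        },
    },
}

# Every worker sees the same POSIX namespace under /mnt/lustre.
print(job["spec"]["template"]["spec"]["volumes"][0]
         ["persistentVolumeClaim"]["claimName"])
```

The key design point is `ReadWriteMany`: unlike a per-pod volume, one claim is mounted by all Job pods, which is what makes shared checkpoints and a common dataset path possible.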