All news with #lmcache tag
Fri, November 7, 2025
Tiered KV Cache Boosts LLM Performance on GKE with HBM
🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the GPU HBM-backed key-value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks with Llama-3.3-70B-Instruct on an A3 Mega instance (8× `nvidia-h100-mega-80gb` GPUs), configurations that added RAM and SSD tiers reduced Time-to-First-Token (TTFT) and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scaling context windows while balancing cost and latency on GKE.
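The core idea behind the tiered design can be illustrated with a small, self-contained sketch. This is a conceptual model only, not LMCache's actual implementation or API: lookups fall through from the fast tier (standing in for HBM) to larger, slower tiers (standing in for CPU RAM and SSD), hits in a lower tier are promoted upward, and overflow in a tier demotes its least-recently-used entry downward.

```python
from collections import OrderedDict

class TieredKVCache:
    """Conceptual tiered KV cache (illustrative only, not LMCache's API).

    Tier 0 models GPU HBM; later tiers model CPU RAM and local SSD.
    Hits in a lower tier promote the entry to tier 0; overflow in a
    tier demotes its least-recently-used entry to the next tier.
    """

    def __init__(self, capacities):
        # capacities: max entries per tier, fastest first, e.g. [2, 8, 32]
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = capacities

    def put(self, key, value):
        self._insert(0, key, value)

    def _insert(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        if len(tier) > self.capacities[level]:
            old_key, old_val = tier.popitem(last=False)  # evict LRU entry
            if level + 1 < len(self.tiers):
                self._insert(level + 1, old_key, old_val)  # demote to next tier
            # past the last tier, the entry is simply dropped

    def get(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                if level > 0:
                    self._insert(0, key, value)  # promote on lower-tier hit
                else:
                    tier[key] = value  # refresh LRU position in fast tier
                return value
        return None  # cache miss
```

Because the lower tiers hold entries evicted from HBM rather than discarding them, prefixes of long system prompts stay reusable, which is the mechanism behind the higher hit ratio and lower TTFT reported in the benchmarks.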