All news with #intelligent routing tag
Wed, November 26, 2025
SageMaker HyperPod: Managed Tiered KV Cache Launch
⚡ Amazon SageMaker HyperPod now offers Managed Tiered KV Cache and Intelligent Routing to optimize LLM inference for long-context prompts and multi-turn conversations. The two-tier cache combines local CPU memory (L1) with disaggregated cluster storage (L2) — with AWS-native tiered storage recommended and Redis optional — to reuse computed key-value pairs and reduce recomputation. Intelligent Routing directs requests using prefix-aware, KV-aware, or round-robin strategies, while built-in observability integrates with Amazon Managed Grafana and deployment is enabled via InferenceEndpointConfig or SageMaker JumpStart.