AWS adds NIXL with EFA to accelerate LLM inference at scale
⚡ AWS now supports the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) on all EFA-enabled EC2 instances, in all regions where those instances are available. The integration accelerates disaggregated LLM inference by increasing KV-cache transfer throughput between prefill and decode nodes, lowering inter-token latency, and reducing KV-cache memory pressure. NIXL interoperates with inference frameworks such as NVIDIA Dynamo, SGLang, and vLLM. The feature requires NIXL 1.0.0 or later and EFA installer 1.47.0 or later, and is available at no additional cost.
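Since the integration depends on the EFA installer version, it can help to verify the EFA software stack on an instance before enabling NIXL. A minimal verification sketch using the standard checks from the AWS EFA documentation (paths and commands assume the official EFA installer; the version string shown is illustrative):

```shell
# List the EFA packages and installer version recorded by the AWS EFA installer
cat /opt/amazon/efa_installed_packages
# Confirm libfabric exposes the EFA provider with the RDM endpoint type
fi_info -p efa -t FI_EP_RDM
```

If the installer version reported is below 1.47.0, rerun the EFA installer with a newer release before configuring NIXL-based KV-cache transfer.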
