All news with #inferentia tag

Tue, September 2, 2025

AWS Split Cost Allocation Adds GPU and Accelerator Cost Tracking

🔍 Split Cost Allocation Data now supports accelerator-based workloads running in Amazon Elastic Kubernetes Service (EKS), allowing customers to track costs for AWS Trainium and AWS Inferentia accelerators, as well as NVIDIA and AMD GPUs, alongside CPU and memory. Cost details are included in the AWS Cost and Usage Report (including CUR 2.0) and can be visualized with the Containers Cost Allocation dashboard in Amazon QuickSight or queried with Amazon Athena. New customers can enable the feature in the Billing and Cost Management console; it is enabled automatically for existing Split Cost Allocation Data customers.

read more →
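As a rough illustration of the Athena path mentioned above, a query like the following could aggregate split costs per EKS pod. This is a hedged sketch, not an official example: the table name `cur2` is a placeholder for your CUR 2.0 export, and the `split_line_item_*` column names are assumptions based on the CUR 2.0 split cost allocation schema; verify them against your own export's data dictionary.

```sql
-- Sketch: total split and unused cost per EKS pod resource.
-- Table name "cur2" and column names are assumptions; check your CUR 2.0 export.
SELECT
  line_item_resource_id,                            -- e.g. the EKS pod ARN
  SUM(CAST(split_line_item_split_cost  AS DOUBLE)) AS split_cost,
  SUM(CAST(split_line_item_unused_cost AS DOUBLE)) AS unused_cost
FROM cur2
WHERE line_item_usage_start_date >= DATE '2025-09-01'
GROUP BY line_item_resource_id
ORDER BY split_cost DESC;
```

Filtering on an accelerator-specific usage type (or joining against resource tags) would narrow this to GPU, Trainium, or Inferentia spend specifically.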

Thu, August 21, 2025

AWS Neuron SDK 2.25: Inference and Monitoring Enhancements

🚀 AWS has released Neuron SDK 2.25.0, now generally available for Inferentia and Trainium instances, adding context- and data-parallelism support plus chunked attention to accelerate long-sequence inference. The update enhances the neuron-ls and neuron-monitor APIs to report node affinities and device utilization, and introduces automatic aliasing (beta) and disaggregated serving improvements (beta). Upgraded AMIs and Deep Learning Containers are provided for both inference and training.

read more →