All news with #inference tag
Thu, September 4, 2025
Baseten: improved cost-performance for AI inference
🚀 Baseten reports major cost-performance gains for AI inference by combining Google Cloud A4 VMs, powered by NVIDIA Blackwell GPUs, with Google Cloud's Dynamic Workload Scheduler. The company cites 225% better cost-performance for high-throughput inference and a 25% improvement for latency-sensitive workloads. Baseten pairs this hardware with an open, optimized software stack (including TensorRT-LLM, NVIDIA Dynamo, and vLLM) and multi-cloud resilience to deliver scalable, production-ready inference.