<ciso brief />

All news with #nvidia tag


Disaggregated AI Inference with NVIDIA Dynamo on GKE

⚡ This post announces a reproducible recipe for deploying NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates the prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples with step-by-step deployment instructions. The repository also provides configuration guidance and outlines plans for broader GPU and engine support.
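
The prefill/decode split described above can be illustrated with a toy sketch. This is not the recipe's code and does not use the Dynamo or vLLM APIs; the names `Request`, `WorkerPool`, and `run_disaggregated` are hypothetical, and the "KV cache" is a plain dictionary standing in for the real handoff between pools.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    prompt_tokens: int   # tokens processed once during prefill
    max_new_tokens: int  # tokens generated one at a time during decode


@dataclass
class WorkerPool:
    """Hypothetical stand-in for a dedicated GPU pool with its own queue."""
    name: str
    queue: deque = field(default_factory=deque)

    def submit(self, item):
        self.queue.append(item)


def run_disaggregated(requests, prefill_pool, decode_pool):
    """Send every request through the prefill pool, then hand it to the decode pool."""
    log = []
    for req in requests:
        prefill_pool.submit(req)

    # Prefill: process the full prompt and produce a (placeholder) KV cache.
    while prefill_pool.queue:
        req = prefill_pool.queue.popleft()
        kv_cache = {"request_id": req.request_id, "cached_tokens": req.prompt_tokens}
        log.append((prefill_pool.name, req.request_id))
        decode_pool.submit((req, kv_cache))

    # Decode: generate new tokens on a separate pool, reusing the KV cache.
    while decode_pool.queue:
        req, kv_cache = decode_pool.queue.popleft()
        total_tokens = kv_cache["cached_tokens"] + req.max_new_tokens
        log.append((decode_pool.name, req.request_id, total_tokens))
    return log


if __name__ == "__main__":
    pools = WorkerPool("prefill-pool"), WorkerPool("decode-pool")
    for entry in run_disaggregated([Request("a", 100, 20), Request("b", 50, 10)], *pools):
        print(entry)
```

Because each phase has its own queue, a long prompt waiting in prefill does not block token generation for requests that have already moved to the decode pool, which is the contention the recipe aims to reduce.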

Reviewing AI Data Center Policies to Mitigate Risks

🔒 Investment in AI data centers is accelerating globally, driving up energy demand and emissions and expanding the cyberattack surface. AI facilities rely on GPUs, ASICs, and FPGAs, which introduce side-channel, memory-level, and GPU-resident malware risks that differ from traditional CPU-focused threats. Organizations should require operators to implement supply-chain vetting, physical shielding (for example, Faraday cages), continuous model auditing, and stronger personnel controls to reduce the risk of model exfiltration, poisoning, and foreign infiltration.

Baseten: improved cost-performance for AI inference

🚀 Baseten reports major cost-performance gains for AI inference from combining Google Cloud A4 VMs powered by NVIDIA Blackwell GPUs with Google Cloud’s Dynamic Workload Scheduler. The company cites 225% better cost-performance for high-throughput inference and a 25% improvement for latency-sensitive workloads. Baseten pairs this hardware with an open, optimized software stack (including TensorRT-LLM, NVIDIA Dynamo, and vLLM) and multi-cloud resilience to deliver scalable, production-ready inference.

AWS SageMaker Adds P5.4xlarge with NVIDIA H100 GPU

🚀 Amazon SageMaker Training and Processing Jobs now support the new EC2 P5 instance size with a single NVIDIA H100 GPU, the P5.4xlarge, for cost‑effective ML and HPC workloads. The smaller size enables fine-grained scaling: customers can start with a single-GPU configuration and expand incrementally, improving cost management and infrastructure flexibility. P5.4xlarge is available via SageMaker Flexible Training Plans and, in select regions, through On‑Demand and Spot.

Microsoft Azure and NVIDIA Accelerate Scientific AI

🔬 This blog post highlights how Microsoft Azure and NVIDIA combine cloud infrastructure and GPU-accelerated AI tooling to speed scientific discovery and commercial deployment. It profiles three startups (Pangaea Data, Basecamp Research, and Global Objects) whose applications range from clinical decision support to large-scale protein databases and photorealistic digital twins. The piece emphasizes measurable outcomes, compliance, and the importance of scalable compute and optimized AI frameworks for real-world impact.

Amazon EC2 G6 Instances with NVIDIA L4 Now in UAE Region

🚀 Amazon has launched EC2 G6 instances powered by NVIDIA L4 GPUs in the Middle East (UAE) Region, expanding cloud GPU capacity for graphics and ML workloads. G6 instances offer up to 8 L4 GPUs with 24 GB per GPU, third-generation AMD EPYC processors, up to 192 vCPUs, 100 Gbps networking, and up to 7.52 TB local NVMe storage. They are available via On-Demand, Reserved, Spot, and Savings Plans and can be managed through the AWS Console, CLI, and SDKs.