All news with #nvidia tag
Mon, October 20, 2025
AI Hypercomputer Update: vLLM on TPUs and Tooling Advances
🔧 Google Cloud’s Q3 AI Hypercomputer update highlights inference improvements and expanded tooling to accelerate model serving and diagnostics. The release integrates vLLM with Cloud TPUs via the new tpu-inference plugin, unifying JAX and PyTorch runtimes and boosting TPU inference for models such as Gemma, Llama, and Qwen. Additional launches include improved XProf profiling and Cloud Diagnostics XProf, an AI inference recipe for NVIDIA Dynamo, NVIDIA NeMo RL recipes, and GA of the GKE Inference Gateway and Quickstart to help optimize latency and cost.
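For readers who want to try the integration, here is a minimal sketch of serving a model through vLLM's Python API on a Cloud TPU VM. The pip package name and the model checkpoint are assumptions; check the release notes for exact names.

```python
# Minimal sketch: vLLM inference on a Cloud TPU VM with the tpu-inference
# plugin installed (e.g. `pip install vllm tpu-inference`; the package
# name is an assumption). With the plugin present, vLLM selects the TPU
# backend automatically -- the serving code is identical to the GPU path.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-2b-it", max_model_len=2048)
params = SamplingParams(temperature=0.7, max_tokens=128)

for out in llm.generate(["What does the AI Hypercomputer update include?"], params):
    print(out.outputs[0].text)
```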
Thu, October 16, 2025
ThreatsDay Bulletin: $15B Crypto Seizure, Weekly Risks
🔔 This week’s ThreatsDay bulletin highlights a historic U.S. DOJ seizure of roughly $15 billion in cryptocurrency linked to an alleged transnational fraud network, alongside active commodity malware, phishing-as-a-service, and novel abuses of legitimate tools. Notable incidents include the Brazil-distributed Maverick banking trojan spread via a WhatsApp worm, consumer-grade interception of geostationary satellite traffic, and UEFI BombShell flaws enabling bootkit persistence. Priorities: identity resilience, patching, and monitoring of remote-access and cloud services.
Wed, October 15, 2025
Google Cloud and NVIDIA Power AI Innovation Week in D.C.
🤝 At the end of October in Washington, D.C., Google Cloud and NVIDIA will lead a week of events highlighting advances in AI, high-performance computing, and secure mission deployments. NVIDIA GTC DC (Oct. 27–29) features keynotes, demos, and hands-on sessions showcasing next-generation models and infrastructure. The Google Public Sector Summit (Oct. 29) convenes government leaders to explore practical uses of technologies like Gemini for Government and discuss secure, scalable AI adoption for mission impact.
Thu, October 9, 2025
Microsoft Azure Debuts Large-Scale NVIDIA GB300 Cluster
🚀 Microsoft Azure announced the first production-scale cluster using more than 4,600 NVIDIA GB300 NVL72 (Blackwell Ultra) GPUs, co-engineered with NVIDIA to support OpenAI and other frontier AI workloads. The new ND GB300 v6 VMs are optimized for reasoning models, agentic systems, and multimodal generative AI, delivered on rack-scale systems with 72 GPUs per rack and 36 NVIDIA Grace CPUs. Microsoft says this infrastructure will shorten training from months to weeks and will scale to hundreds of thousands of Blackwell Ultra GPUs globally.
Mon, October 6, 2025
Zeroday Cloud contest: $4.5M bounties for cloud tools
🔐 Zeroday Cloud is a new hacking competition focused on open-source cloud and AI tools, offering a $4.5 million bug bounty pool. Hosted by Wiz Research with Google Cloud, AWS, and Microsoft, it takes place December 10–11 at Black Hat Europe in London. The contest features six categories covering AI, Kubernetes, containers, web servers, databases, and DevOps, with bounties ranging from $10,000 to $300,000. Participants must deliver complete compromises and register via HackerOne.
Wed, October 1, 2025
Cisco Talos Discloses Multiple NVIDIA and Adobe Flaws
⚠ Cisco Talos disclosed five vulnerabilities in NVIDIA's CUDA Toolkit components and one use-after-free flaw in Adobe Acrobat Reader. The NVIDIA issues affect tools like cuobjdump (12.8.55) and nvdisasm (12.8.90), where specially crafted fatbin or ELF files can trigger out-of-bounds writes, heap overflows, and potential arbitrary code execution. The Adobe bug (2025.001.20531) involves malicious JavaScript in PDFs that can reuse freed objects, leading to memory corruption and possible remote code execution if a user opens a crafted document.
Thu, September 18, 2025
Inside Fairwater: Microsoft's New Frontier AI Datacenter
🚀 Microsoft unveiled Fairwater, a purpose-built AI datacenter in Wisconsin, along with sister sites in Norway and the UK, all designed to operate as a single, global-scale supercomputer. The facility deploys interconnected racks of NVIDIA GB200 servers (72 GPUs per rack) and claims 10× the performance of the world’s fastest supercomputer. It combines closed-loop liquid cooling, exabyte-scale storage, and an AI WAN to enable distributed training and large-scale inference across Azure.
Wed, September 17, 2025
CrowdStrike Secures AI Across the Enterprise with Partners
🔒 CrowdStrike describes how the Falcon platform delivers unified visibility and lifecycle defense across the full AI stack, from GPUs and training data to inference pipelines and SaaS agents. The post highlights integrations with NVIDIA, AWS, Intel, Dell, Meta, and Salesforce to extend protection into infrastructure, data, models, and applications. It also introduces agentic defense via Charlotte AI for autonomous triage and rapid response, and emphasizes governance controls to prevent data leaks and adversarial manipulation.
Fri, September 12, 2025
Amazon SageMaker Adds EC2 P6-B200 Notebook Instances
🚀 Amazon Web Services announced general availability of EC2 P6-B200 instances for SageMaker notebooks. These instances pair eight NVIDIA Blackwell GPUs and 1,440 GB of high-bandwidth GPU memory with 5th Gen Intel Xeon processors, offering up to 2× the training performance of P5en instances. They enable interactive development and fine-tuning of large foundation models in JupyterLab and Code Editor, and are available in US East (Ohio) and US West (Oregon).
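As a hedged sketch, provisioning one of these notebook instances with boto3 might look like the following; the instance-type string is an assumption, so confirm the exact name in the SageMaker documentation.

```python
# Hypothetical sketch: requesting a P6-B200 SageMaker notebook instance
# via boto3. The InstanceType string is assumed, not confirmed.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-2")  # US East (Ohio)

sm.create_notebook_instance(
    NotebookInstanceName="blackwell-ft-dev",
    InstanceType="ml.p6-b200.48xlarge",  # assumed name for the P6-B200 type
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    VolumeSizeInGB=500,  # room for large model checkpoints
)
```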
Wed, September 10, 2025
Disaggregated AI Inference with NVIDIA Dynamo on GKE
⚡ This post announces a reproducible recipe to deploy NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples and step-by-step deployment actions. The repository provides configuration guidance and future plans for broader GPU and engine support.
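Conceptually, disaggregation splits each request into a compute-bound prefill phase and a memory-bandwidth-bound decode phase served by different worker pools. The sketch below illustrates the idea only; the endpoints and KV-handle exchange are hypothetical, not Dynamo's actual API.

```python
# Conceptual sketch of disaggregated serving: prefill and decode run on
# separate pools so long prompt processing never stalls token generation.
import requests

PREFILL_POOL = "http://prefill-pool.dynamo.svc:8000"  # hypothetical endpoints
DECODE_POOL = "http://decode-pool.dynamo.svc:8000"

def generate(prompt: str, max_tokens: int = 128) -> str:
    # Phase 1: compute-bound prompt processing on the prefill pool,
    # which returns a handle to the KV cache it produced.
    kv_handle = requests.post(
        f"{PREFILL_POOL}/prefill", json={"prompt": prompt}
    ).json()["kv_handle"]

    # Phase 2: bandwidth-bound token generation on the decode pool,
    # which fetches the KV cache through the handle and streams tokens.
    resp = requests.post(
        f"{DECODE_POOL}/decode",
        json={"kv_handle": kv_handle, "max_tokens": max_tokens},
    )
    return resp.json()["text"]
```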
Thu, September 4, 2025
Baseten: improved cost-performance for AI inference
🚀 Baseten reports major cost-performance gains for AI inference by combining Google Cloud A4 VMs powered by NVIDIA Blackwell GPUs with Google Cloud’s Dynamic Workload Scheduler. The company cites 225% better cost-performance for high-throughput inference and 25% improvement for latency-sensitive workloads. Baseten pairs cutting-edge hardware with an open, optimized software stack — including TensorRT-LLM, NVIDIA Dynamo, and vLLM — and multi-cloud resilience to deliver scalable, production-ready inference.
Tue, September 2, 2025
AWS Split Cost Allocation Adds GPU and Accelerator Cost Tracking
🔍 Split Cost Allocation Data now supports accelerator-based workloads running in Amazon Elastic Kubernetes Service (EKS), allowing customers to track costs for Trainium, Inferentia, NVIDIA and AMD GPUs alongside CPU and memory. Cost details are included in the AWS Cost and Usage Report (including CUR 2.0) and can be visualized using the Containers Cost Allocation dashboard in Amazon QuickSight or queried with Amazon Athena. New customers can enable the feature in the Billing and Cost Management console; it is automatically enabled for existing Split Cost Allocation Data customers.
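As an illustration, the split cost columns can be queried through Athena with boto3. The database, table, and column names below are assumptions; match them to your own CUR 2.0 export schema.

```python
# Hedged sketch: surfacing per-resource accelerator costs from a CUR 2.0
# table in Athena. Schema names are placeholders, not confirmed values.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = """
SELECT line_item_resource_id,
       SUM(split_line_item_split_cost) AS accelerator_cost
FROM cur_2_0
WHERE split_line_item_split_cost IS NOT NULL
GROUP BY line_item_resource_id
ORDER BY accelerator_cost DESC
LIMIT 20
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "cost_and_usage"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```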
Fri, August 29, 2025
Google Cloud Expands Confidential Computing with Intel TDX
🔒 Google Cloud has expanded its Intel TDX-based Confidential Computing portfolio, now offering Confidential GKE Nodes, Confidential Space, and Confidential GPUs alongside broader regional availability. An Intel TDX Confidential VM can be created directly in the Compute Engine "Create an instance" flow under the Security tab, with no code changes required. The C3 machine series supports Intel TDX across additional regions and zones, and NVIDIA H100 GPUs on the A3 series enable confidential AI by combining Intel CPU protection with NVIDIA Confidential Computing on the GPU.
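The same can be done programmatically. Below is a hedged sketch using the google-cloud-compute client; the confidential-instance field values mirror the GCE API, but verify them, along with machine type and image availability, against your project.

```python
# Hedged sketch: creating a C3 VM with Intel TDX enabled via the
# google-cloud-compute client. Project, zone, and image are placeholders.
from google.cloud import compute_v1

instance = compute_v1.Instance(
    name="tdx-demo",
    machine_type="zones/us-central1-a/machineTypes/c3-standard-4",
    confidential_instance_config=compute_v1.ConfidentialInstanceConfig(
        confidential_instance_type="TDX",  # assumed enum value; verify
    ),
    # Confidential VMs cannot live-migrate during host maintenance.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
    disks=[compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts",
        ),
    )],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

compute_v1.InstancesClient().insert(
    project="my-project", zone="us-central1-a", instance_resource=instance
)
```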
Thu, August 28, 2025
Gemini Available On-Premises with Google Distributed Cloud
🚀 Gemini on Google Distributed Cloud (GDC) brings Google’s advanced Gemini models on‑premises: air‑gapped deployments are now generally available, and connected deployments are in preview. The solution provides managed Gemini endpoints with zero‑touch updates, automatic load balancing and autoscaling, and integrates with Vertex AI and preview agents. It pairs Gemini 2.5 Flash and Pro with NVIDIA Hopper and Blackwell accelerators and includes audit logging, access controls, and support for Confidential Computing (Intel TDX and NVIDIA) to meet strict data residency, sovereignty, and compliance requirements.
Wed, August 27, 2025
How Cloudflare Runs More AI Models on Fewer GPUs with Omni
🤖 Cloudflare explains how Omni, an internal platform, consolidates many AI models onto fewer GPUs using lightweight process isolation, per-model Python virtual environments, and controlled GPU over-commitment. Omni’s scheduler spawns and manages model processes, presents each one with its own memory view via a FUSE-backed /proc/meminfo, and intercepts CUDA allocation calls to safely over-commit GPU RAM. The result is improved availability, lower latency, and reduced idle GPU waste.
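A stripped-down sketch of the isolation idea: each model runs in its own subprocess under the Python interpreter from its own virtual environment, so dependency sets never collide. The venv paths and the serve_model entry point are illustrative, not Omni's actual code.

```python
# Conceptual sketch of Omni-style isolation: one lightweight process per
# model, each using its own venv's interpreter, co-located on one GPU.
import os
import subprocess

VENVS = {
    "llama-3.1-8b": "/opt/venvs/llama31/bin/python",   # illustrative paths
    "whisper-large": "/opt/venvs/whisper/bin/python",
}

def spawn(model: str) -> subprocess.Popen:
    # The scheduler can restart or kill this process without touching
    # any other model sharing the same GPU.
    return subprocess.Popen(
        [VENVS[model], "-m", "serve_model", "--model", model],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},  # shared GPU 0
    )

procs = {name: spawn(name) for name in VENVS}
```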
Wed, August 27, 2025
Cloudflare's Edge-Optimized LLM Inference Engine at Scale
⚡ Infire is Cloudflare’s new, Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
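Paged KV caching, one of the techniques named above, is easy to illustrate: the cache is carved into fixed-size blocks so a sequence can grow without large contiguous allocations. Below is a toy allocator in Python; Infire itself is Rust, so this is a conceptual sketch of the technique, not its implementation.

```python
class PagedKVCache:
    """Toy paged allocator: each sequence owns a list of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}
        self.lengths: dict[str, int] = {}

    def append_token(self, seq_id: str) -> int:
        """Reserve space for one token; return the block it lands in."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # current block full (or none yet)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[-1]

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```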
Tue, August 26, 2025
Microsoft Azure and NVIDIA Accelerate Scientific AI
🔬 This blog highlights how Microsoft Azure and NVIDIA combine cloud infrastructure and GPU-accelerated AI tooling to speed scientific discovery and commercial deployment. It profiles three startups—Pangaea Data, Basecamp Research, and Global Objects—demonstrating applications from clinical decision support to large-scale protein databases and photorealistic digital twins. The piece emphasizes measurable outcomes, compliance, and the importance of scalable compute and optimized AI frameworks for real-world impact.
Mon, August 25, 2025
Amazon EC2 G6 Instances with NVIDIA L4 Now in UAE Region
🚀 Amazon has launched EC2 G6 instances powered by NVIDIA L4 GPUs in the Middle East (UAE) Region, expanding cloud GPU capacity for graphics and ML workloads. G6 instances offer up to 8 L4 GPUs with 24 GB per GPU, third-generation AMD EPYC processors, up to 192 vCPUs, 100 Gbps networking, and up to 7.52 TB local NVMe storage. They are available via On-Demand, Reserved, Spot, and Savings Plans and can be managed through the AWS Console, CLI, and SDKs.
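Launching one in the new region is a one-call sketch with boto3 (me-central-1 is the UAE Region code; the AMI ID below is a placeholder):

```python
# Hedged sketch: launching a G6 instance in the UAE Region with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="me-central-1")  # Middle East (UAE)

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: pick an AMI in-region
    InstanceType="g6.48xlarge",       # 8x NVIDIA L4, 192 vCPUs
    MinCount=1,
    MaxCount=1,
)
```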
Mon, August 25, 2025
vLLM Performance Tuning for xPU Inference Configs Guide
⚙️ This guide from Google Cloud authors Eric Hanley and Brittany Rockwell explains how to tune vLLM deployments for xPU inference, covering accelerator selection, memory sizing, configuration, and benchmarking. It shows how to gather workload parameters, estimate HBM/VRAM needs (example: gemma-3-27b-it ≈57 GB), and run vLLM’s auto_tune to find optimal gpu_memory_utilization and throughput. The post compares GPU and TPU options and includes practical troubleshooting tips, cost analyses, and resources to reproduce benchmarks and HBM calculations.