All news with #nvidia tag
Mon, October 20, 2025
AI Hypercomputer Update: vLLM on TPUs and Tooling Advances
🔧 Google Cloud’s Q3 AI Hypercomputer update highlights inference improvements and expanded tooling to accelerate model serving and diagnostics. The release integrates vLLM with Cloud TPUs via the new tpu-inference plugin, unifying JAX and PyTorch runtimes and boosting TPU inference for models such as Gemma, Llama, and Qwen. Additional launches include improved XProf profiling and Cloud Diagnostics XProf, an AI inference recipe for NVIDIA Dynamo, NVIDIA NeMo RL recipes, and GA of the GKE Inference Gateway and Quickstart to help optimize latency and cost.
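For readers who want to try the integration, here is a minimal sketch of serving a model through vLLM's Python API on a Cloud TPU VM. The pip package name and the model checkpoint are assumptions; check the release notes for exact names.

```python
# Minimal sketch: vLLM inference on a Cloud TPU VM with the tpu-inference
# plugin installed (e.g. `pip install vllm tpu-inference`; the package
# name is an assumption). With the plugin present, vLLM selects the TPU
# backend automatically -- the serving code is identical to the GPU path.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-2b-it", max_model_len=2048)
params = SamplingParams(temperature=0.7, max_tokens=128)

for out in llm.generate(["What does the AI Hypercomputer update include?"], params):
    print(out.outputs[0].text)
```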
Thu, October 16, 2025
ThreatsDay Bulletin: $15B Crypto Seizure, Weekly Risks
🔔 This week’s ThreatsDay bulletin highlights a historic U.S. DOJ seizure of roughly $15 billion in cryptocurrency linked to an alleged transnational fraud network, alongside active commodity malware, phishing-as-a-service, and novel abuses of legitimate tools. Notable incidents include the Brazil-distributed Maverick banking trojan spread via a WhatsApp worm, consumer-grade interception of geostationary satellite traffic, and UEFI BombShell flaws enabling bootkit persistence. Priorities: identity resilience, patching, and monitoring of remote-access and cloud services.
Wed, October 15, 2025
Google Cloud and NVIDIA Power AI Innovation Week in D.C.
🤝 At the end of October in Washington, D.C., Google Cloud and NVIDIA will lead a week of events highlighting advances in AI, high-performance computing, and secure mission deployments. NVIDIA GTC DC (Oct. 27–29) features keynotes, demos, and hands-on sessions showcasing next-generation models and infrastructure. The Google Public Sector Summit (Oct. 29) convenes government leaders to explore practical uses of technologies like Gemini for Government and discuss secure, scalable AI adoption for mission impact.
Thu, October 9, 2025
Microsoft Azure Debuts Large-Scale NVIDIA GB300 Cluster
🚀 Microsoft Azure announced the first production-scale cluster using more than 4,600 NVIDIA GB300 NVL72 (Blackwell Ultra) GPUs, co-engineered with NVIDIA to support OpenAI and other frontier AI workloads. The new ND GB300 v6 VMs are optimized for reasoning models, agentic systems, and multimodal generative AI, delivered on rack-scale systems with 72 GPUs per rack and 36 NVIDIA Grace CPUs. Microsoft says this infrastructure will shorten training from months to weeks and will scale to hundreds of thousands of Blackwell Ultra GPUs globally.
Mon, October 6, 2025
Zeroday Cloud contest: $4.5M bounties for cloud tools
🔐 Zeroday Cloud is a new hacking competition focused on open-source cloud and AI tools, offering a $4.5 million bug bounty pool. Hosted by Wiz Research with Google Cloud, AWS, and Microsoft, it takes place December 10–11 at Black Hat Europe in London. The contest features six categories covering AI, Kubernetes, containers, web servers, databases, and DevOps, with bounties ranging from $10,000 to $300,000. Participants must deliver complete compromises and register via HackerOne.
Wed, October 1, 2025
Cisco Talos Discloses Multiple NVIDIA and Adobe Flaws
⚠ Cisco Talos disclosed five vulnerabilities in NVIDIA's CUDA Toolkit components and one use-after-free flaw in Adobe Acrobat Reader. The NVIDIA issues affect tools like cuobjdump (12.8.55) and nvdisasm (12.8.90), where specially crafted fatbin or ELF files can trigger out-of-bounds writes, heap overflows, and potential arbitrary code execution. The Adobe bug (2025.001.20531) involves malicious JavaScript in PDFs that can reuse freed objects, leading to memory corruption and possible remote code execution if a user opens a crafted document.
Thu, September 18, 2025
Inside Fairwater: Microsoft's New Frontier AI Datacenter
🚀 Microsoft unveiled Fairwater, a purpose-built AI datacenter in Wisconsin, along with sister sites in Norway and the UK, all designed to operate as a single, global-scale supercomputer. The facility deploys interconnected racks of NVIDIA GB200 servers (72 GPUs per rack) and claims 10× the performance of the world’s fastest supercomputer. It combines closed-loop liquid cooling, exabyte-scale storage, and an AI WAN to enable distributed training and large-scale inference across Azure.
Wed, September 17, 2025
CrowdStrike Secures AI Across the Enterprise with Partners
🔒 CrowdStrike describes how the Falcon platform delivers unified visibility and lifecycle defense across the full AI stack, from GPUs and training data to inference pipelines and SaaS agents. The post highlights integrations with NVIDIA, AWS, Intel, Dell, Meta, and Salesforce to extend protection into infrastructure, data, models, and applications. It also introduces agentic defense via Charlotte AI for autonomous triage and rapid response, and emphasizes governance controls to prevent data leaks and adversarial manipulation.
Fri, September 12, 2025
Amazon SageMaker Adds EC2 P6-B200 Notebook Instances
🚀 Amazon Web Services announced general availability of EC2 P6-B200 instances for SageMaker notebooks. These instances pair eight NVIDIA Blackwell GPUs and 1,440 GB of high-bandwidth GPU memory with 5th Gen Intel Xeon processors, offering up to 2× the training performance of P5en instances. They enable interactive development and fine-tuning of large foundation models in JupyterLab and Code Editor, and are available in US East (Ohio) and US West (Oregon).
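As a hedged sketch, provisioning one of these notebook instances with boto3 might look like the following; the instance-type string is an assumption, so confirm the exact name in the SageMaker documentation.

```python
# Hypothetical sketch: requesting a P6-B200 SageMaker notebook instance
# via boto3. The InstanceType string is assumed, not confirmed.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-2")  # US East (Ohio)

sm.create_notebook_instance(
    NotebookInstanceName="blackwell-ft-dev",
    InstanceType="ml.p6-b200.48xlarge",  # assumed name for the P6-B200 type
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    VolumeSizeInGB=500,  # room for large model checkpoints
)
```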
Wed, September 10, 2025
Disaggregated AI Inference with NVIDIA Dynamo on GKE
⚡ This post announces a reproducible recipe to deploy NVIDIA Dynamo for disaggregated LLM inference on Google Cloud’s AI Hypercomputer using Google Kubernetes Engine, vLLM, and A3 Ultra (H200) GPUs. The recipe separates prefill and decode phases across dedicated GPU pools to reduce contention and lower latency. It includes single-node and multi-node examples and step-by-step deployment actions. The repository provides configuration guidance and future plans for broader GPU and engine support.
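Conceptually, disaggregation splits each request into a compute-bound prefill phase and a memory-bandwidth-bound decode phase served by different worker pools. The sketch below illustrates the idea only; the endpoints and KV-handle exchange are hypothetical, not Dynamo's actual API.

```python
# Conceptual sketch of disaggregated serving: prefill and decode run on
# separate pools so long prompt processing never stalls token generation.
import requests

PREFILL_POOL = "http://prefill-pool.dynamo.svc:8000"  # hypothetical endpoints
DECODE_POOL = "http://decode-pool.dynamo.svc:8000"

def generate(prompt: str, max_tokens: int = 128) -> str:
    # Phase 1: compute-bound prompt processing on the prefill pool,
    # which returns a handle to the KV cache it produced.
    kv_handle = requests.post(
        f"{PREFILL_POOL}/prefill", json={"prompt": prompt}
    ).json()["kv_handle"]

    # Phase 2: bandwidth-bound token generation on the decode pool,
    # which fetches the KV cache through the handle and streams tokens.
    resp = requests.post(
        f"{DECODE_POOL}/decode",
        json={"kv_handle": kv_handle, "max_tokens": max_tokens},
    )
    return resp.json()["text"]
```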
Thu, September 4, 2025
Baseten: improved cost-performance for AI inference
🚀 Baseten reports major cost-performance gains for AI inference by combining Google Cloud A4 VMs powered by NVIDIA Blackwell GPUs with Google Cloud’s Dynamic Workload Scheduler. The company cites 225% better cost-performance for high-throughput inference and 25% improvement for latency-sensitive workloads. Baseten pairs cutting-edge hardware with an open, optimized software stack — including TensorRT-LLM, NVIDIA Dynamo, and vLLM — and multi-cloud resilience to deliver scalable, production-ready inference.
Tue, September 2, 2025
AWS Split Cost Allocation Adds GPU and Accelerator Cost Tracking
🔍 Split Cost Allocation Data now supports accelerator-based workloads running in Amazon Elastic Kubernetes Service (EKS), allowing customers to track costs for Trainium, Inferentia, NVIDIA and AMD GPUs alongside CPU and memory. Cost details are included in the AWS Cost and Usage Report (including CUR 2.0) and can be visualized using the Containers Cost Allocation dashboard in Amazon QuickSight or queried with Amazon Athena. New customers can enable the feature in the Billing and Cost Management console; it is automatically enabled for existing Split Cost Allocation Data customers.
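As an illustration, the split cost columns can be queried through Athena with boto3. The database, table, and column names below are assumptions; match them to your own CUR 2.0 export schema.

```python
# Hedged sketch: surfacing per-resource accelerator costs from a CUR 2.0
# table in Athena. Schema names are placeholders, not confirmed values.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = """
SELECT line_item_resource_id,
       SUM(split_line_item_split_cost) AS accelerator_cost
FROM cur_2_0
WHERE split_line_item_split_cost IS NOT NULL
GROUP BY line_item_resource_id
ORDER BY accelerator_cost DESC
LIMIT 20
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "cost_and_usage"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```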
Fri, August 29, 2025
Google Cloud Expands Confidential Computing with Intel TDX
🔒 Google Cloud has expanded its Intel TDX-based Confidential Computing portfolio, now offering Confidential GKE Nodes, Confidential Space, and Confidential GPUs alongside broader regional availability. An Intel TDX Confidential VM can be created directly in the Compute Engine "Create an instance" flow under the Security tab, with no code changes required. The C3 machine series supports Intel TDX across additional regions and zones, and NVIDIA H100 GPUs on the A3 series enable confidential AI by combining Intel CPU protection with NVIDIA Confidential Computing on the GPU.
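The same can be done programmatically. Below is a hedged sketch using the google-cloud-compute client; the confidential-instance field values mirror the GCE API, but verify them, along with machine type and image availability, against your project.

```python
# Hedged sketch: creating a C3 VM with Intel TDX enabled via the
# google-cloud-compute client. Project, zone, and image are placeholders.
from google.cloud import compute_v1

instance = compute_v1.Instance(
    name="tdx-demo",
    machine_type="zones/us-central1-a/machineTypes/c3-standard-4",
    confidential_instance_config=compute_v1.ConfidentialInstanceConfig(
        confidential_instance_type="TDX",  # assumed enum value; verify
    ),
    # Confidential VMs cannot live-migrate during host maintenance.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
    disks=[compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts",
        ),
    )],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

compute_v1.InstancesClient().insert(
    project="my-project", zone="us-central1-a", instance_resource=instance
)
```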
Thu, August 28, 2025
Gemini Available On-Premises with Google Distributed Cloud
🚀 Gemini on Google Distributed Cloud (GDC) brings Google’s advanced Gemini models on‑premises: air‑gapped deployments are now generally available, and connected deployments are in preview. The solution provides managed Gemini endpoints with zero‑touch updates, automatic load balancing and autoscaling, and integrates with Vertex AI and preview agents. It pairs Gemini 2.5 Flash and Pro with NVIDIA Hopper and Blackwell accelerators and includes audit logging, access controls, and support for Confidential Computing (Intel TDX and NVIDIA) to meet strict data residency, sovereignty, and compliance requirements.
Wed, August 27, 2025
How Cloudflare Runs More AI Models on Fewer GPUs with Omni
🤖 Cloudflare explains how Omni, an internal platform, consolidates many AI models onto fewer GPUs using lightweight process isolation, per-model Python virtual environments, and controlled GPU over-commitment. Omni’s scheduler spawns and manages model processes, presents each one with its own memory view via a FUSE-backed /proc/meminfo, and intercepts CUDA allocation calls to safely over-commit GPU RAM. The result is improved availability, lower latency, and reduced idle GPU waste.
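A stripped-down sketch of the isolation idea: each model runs in its own subprocess under the Python interpreter from its own virtual environment, so dependency sets never collide. The venv paths and the serve_model entry point are illustrative, not Omni's actual code.

```python
# Conceptual sketch of Omni-style isolation: one lightweight process per
# model, each using its own venv's interpreter, co-located on one GPU.
import os
import subprocess

VENVS = {
    "llama-3.1-8b": "/opt/venvs/llama31/bin/python",   # illustrative paths
    "whisper-large": "/opt/venvs/whisper/bin/python",
}

def spawn(model: str) -> subprocess.Popen:
    # The scheduler can restart or kill this process without touching
    # any other model sharing the same GPU.
    return subprocess.Popen(
        [VENVS[model], "-m", "serve_model", "--model", model],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},  # shared GPU 0
    )

procs = {name: spawn(name) for name in VENVS}
```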
Wed, August 27, 2025
Cloudflare's Edge-Optimized LLM Inference Engine at Scale
⚡ Infire is Cloudflare’s new, Rust-based LLM inference engine built to run large models efficiently across a globally distributed, low-latency network. It replaces Python-based vLLM in scenarios where sandboxing and dynamic co-hosting caused high CPU overhead and reduced GPU utilization, using JIT-compiled CUDA kernels, paged KV caching, and fine-grained CUDA graphs to cut startup and runtime cost. Early benchmarks show up to 7% lower latency on H100 NVL hardware, substantially higher GPU utilization, and far lower CPU load while powering models such as Llama 3.1 8B in Workers AI.
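Paged KV caching, one of the techniques named above, is easy to illustrate: the cache is carved into fixed-size blocks so a sequence can grow without large contiguous allocations. Below is a toy allocator in Python; Infire itself is Rust, so this is a conceptual sketch of the technique, not its implementation.

```python
class PagedKVCache:
    """Toy paged allocator: each sequence owns a list of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}
        self.lengths: dict[str, int] = {}

    def append_token(self, seq_id: str) -> int:
        """Reserve space for one token; return the block it lands in."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # current block full (or none yet)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[-1]

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```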
Tue, August 26, 2025
Microsoft Azure and NVIDIA Accelerate Scientific AI
🔬 This blog highlights how Microsoft Azure and NVIDIA combine cloud infrastructure and GPU-accelerated AI tooling to speed scientific discovery and commercial deployment. It profiles three startups—Pangaea Data, Basecamp Research, and Global Objects—demonstrating applications from clinical decision support to large-scale protein databases and photorealistic digital twins. The piece emphasizes measurable outcomes, compliance, and the importance of scalable compute and optimized AI frameworks for real-world impact.
Mon, August 25, 2025
Amazon EC2 G6 Instances with NVIDIA L4 Now in UAE Region
🚀 Amazon has launched EC2 G6 instances powered by NVIDIA L4 GPUs in the Middle East (UAE) Region, expanding cloud GPU capacity for graphics and ML workloads. G6 instances offer up to 8 L4 GPUs with 24 GB per GPU, third-generation AMD EPYC processors, up to 192 vCPUs, 100 Gbps networking, and up to 7.52 TB local NVMe storage. They are available via On-Demand, Reserved, Spot, and Savings Plans and can be managed through the AWS Console, CLI, and SDKs.
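Launching one in the new region is a one-call sketch with boto3 (me-central-1 is the UAE Region code; the AMI ID below is a placeholder):

```python
# Hedged sketch: launching a G6 instance in the UAE Region with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="me-central-1")  # Middle East (UAE)

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: pick an AMI in-region
    InstanceType="g6.48xlarge",       # 8x NVIDIA L4, 192 vCPUs
    MinCount=1,
    MaxCount=1,
)
```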
Mon, August 25, 2025
vLLM Performance Tuning for xPU Inference Configs Guide
⚙️ This guide from Google Cloud authors Eric Hanley and Brittany Rockwell explains how to tune vLLM deployments for xPU inference, covering accelerator selection, memory sizing, configuration, and benchmarking. It shows how to gather workload parameters, estimate HBM/VRAM needs (example: gemma-3-27b-it ≈57 GB), and run vLLM’s auto_tune to find optimal gpu_memory_utilization and throughput. The post compares GPU and TPU options and includes practical troubleshooting tips, cost analyses, and resources to reproduce benchmarks and HBM calculations.