All news with #nvidia tag
Tue, November 18, 2025
AWS launches EC2 P6-B300 with NVIDIA Blackwell Ultra
🚀 Amazon Web Services has announced general availability of Amazon EC2 P6-B300 instances powered by NVIDIA Blackwell Ultra B300 GPUs. The p6-b300.48xlarge delivers eight GPUs, 2.1 TB of high-bandwidth GPU memory, 6.4 Tbps EFA networking, 300 Gbps ENA throughput, and 4 TB of system memory. It targets training and deploying trillion-parameter foundation models and LLMs, offering higher memory, compute, and networking versus P6-B200.
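For teams that want to try the new instance type, below is a minimal boto3 sketch for launching a single p6-b300.48xlarge. The AMI ID, subnet ID, and Region are placeholders, not values from the announcement; substitute your own (for example, an AWS Deep Learning AMI in a Region where P6-B300 is offered).

```python
# Minimal sketch: launch one P6-B300 instance with boto3.
# AMI, subnet, and Region below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed Region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI (e.g. a Deep Learning AMI)
    InstanceType="p6-b300.48xlarge",      # 8x NVIDIA Blackwell Ultra B300 GPUs
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```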
Mon, November 17, 2025
Microsoft and NVIDIA Enable Real-Time AI Defenses at Scale
🔒 Microsoft and NVIDIA describe a joint effort to convert adversarial learning research into production-grade, real-time cyber defenses. They transitioned transformer-based classifiers from CPU to GPU inference—using Triton and a TensorRT-compiled engine—to dramatically reduce latency and increase throughput for live traffic inspection. Key engineering advances include fused CUDA kernels and a domain-specific tokenizer, enabling low-latency, high-accuracy detection of adversarial payloads in inline production settings.
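As a rough illustration of the serving pattern described (not Microsoft's actual pipeline), the sketch below queries a hypothetical TensorRT-backed classifier through Triton's HTTP client. The model name, tensor names, and shapes are assumptions for illustration only.

```python
# Conceptual sketch: querying a Triton-served classifier over HTTP.
# "payload_classifier", the tensor names, and the shapes are hypothetical.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

token_ids = np.zeros((1, 256), dtype=np.int64)  # pre-tokenized request (placeholder)
inp = httpclient.InferInput("input_ids", list(token_ids.shape), "INT64")
inp.set_data_from_numpy(token_ids)
out = httpclient.InferRequestedOutput("scores")

result = client.infer(model_name="payload_classifier", inputs=[inp], outputs=[out])
print(result.as_numpy("scores"))  # per-class scores for the inspected payload
```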
Fri, November 14, 2025
ShadowMQ Deserialization Flaws in Major AI Inference Engines
⚠️ Oligo Security researcher Avi Lumelsky disclosed a widespread insecure-deserialization pattern dubbed ShadowMQ that affects major AI inference engines including vLLM, NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, and SGLang. The root cause is using ZeroMQ's recv_pyobj() to deserialize network input with Python's pickle, permitting remote arbitrary code execution. Patch status varies: some projects have fixed the issue, while others are only partially patched or remain unpatched; mitigations include applying updates, removing exposed ZMQ sockets, and auditing code for unsafe deserialization.
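A minimal pyzmq sketch of the vulnerable pattern and one possible safer replacement; the exact code paths differ across the affected engines, and the request schema below is illustrative.

```python
# The vulnerable pattern vs. a safer alternative, illustrated with pyzmq.
# recv_pyobj() unpickles whatever arrives on the socket, so any peer that
# can reach the port can achieve arbitrary code execution on the server.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")

# UNSAFE (the ShadowMQ pattern): pickle.loads() on untrusted network input.
# obj = sock.recv_pyobj()

# Safer: accept only a constrained, schema-checked format such as JSON.
msg = sock.recv_json()
if not isinstance(msg, dict) or "prompt" not in msg:
    sock.send_json({"error": "malformed request"})
else:
    sock.send_json({"ok": True})
```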
Thu, November 13, 2025
AWS Expands EC2 G6f NVIDIA L4 GPU Instances to More Regions
🚀 Amazon Web Services has expanded availability of EC2 G6f instances powered by NVIDIA L4 GPUs to Europe (Spain) and Asia Pacific (Seoul), improving access for graphics and visualization workloads. G6f instances support GPU partitions as small as one-eighth of a GPU with 3 GB of GPU memory, enabling finer-grained right-sizing and cost savings compared to single‑GPU options. Instances come in multiple sizes paired with third‑generation AMD EPYC processors and can be purchased On‑Demand, as Spot Instances, or via Savings Plans; customers should use NVIDIA GRID driver 18.4 or later to launch these instances.
Wed, November 12, 2025
Microsoft unveils Fairwater AI datacenter in Atlanta
🚀 Microsoft announced the new Fairwater Azure AI datacenter in Atlanta, Georgia, expanding its planet-scale AI superfactory. The purpose-built facility integrates massive NVIDIA Blackwell GPU clusters on a single flat network and uses rack-level direct liquid cooling plus a two-story layout to maximize compute density and reduce latency. It also connects via a dedicated AI WAN to enable cross-site fungibility and dynamic workload allocation.
Mon, November 10, 2025
A Full-Stack Approach to Scaling RL for LLMs on GKE
🚀 Google Cloud describes a full-stack solution for running high-scale Reinforcement Learning (RL) with LLMs, combining custom TPU hardware, NVIDIA GPUs, and optimized software libraries. The approach addresses RL's hybrid demands—reducing sampler latency, easing memory contention across actor/critic/reward models, and accelerating weight copying—by co-designing hardware, storage (Managed Lustre, Cloud Storage), and orchestration on GKE. The blog emphasizes open-source contributions (vLLM, llm-d, MaxText, Tunix) and integrations with Ray and NeMo RL recipes to improve portability and developer productivity. It also highlights mega-scale orchestration and multi-cluster strategies to run production RL jobs at tens of thousands of nodes.
Mon, November 10, 2025
Amazon Braket Adds Native CUDA-Q Support in Notebooks
🔬 Amazon Braket notebook instances now include native support for CUDA-Q, enabled by upgrading the underlying OS to Amazon Linux 2023 to deliver improved performance, security, and compatibility for quantum development and production-ready workflows. Developers can run GPU-accelerated quantum circuit simulation alongside access to QPUs from IonQ, Rigetti, and IQM within the managed notebook environment. This eliminates the need for local deployment or separate Hybrid Jobs, streamlining hybrid quantum-classical experimentation. CUDA-Q support is available in all Regions where Braket operates.
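A minimal CUDA-Q sketch of the kind of GPU-accelerated simulation now available in Braket notebooks; the Bell-state kernel and the "nvidia" simulator target are illustrative, and targeting a Braket QPU uses different target settings not shown here.

```python
# Sketch: sample a Bell state on CUDA-Q's GPU-accelerated simulator.
import cudaq

cudaq.set_target("nvidia")  # GPU-accelerated statevector simulator

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    h(qubits[0])                 # superposition on the first qubit
    x.ctrl(qubits[0], qubits[1]) # entangle via CNOT
    mz(qubits)                   # measure both qubits

counts = cudaq.sample(bell, shots_count=1000)
print(counts)  # expect roughly equal counts of "00" and "11"
```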
Mon, November 10, 2025
New hardware attack (TEE.fail) breaks modern secure enclaves
🔒 A new low-cost hardware-assisted attack called TEE.fail undermines current trusted execution environments from major chipmakers. The method inserts a tiny device between a memory module and the motherboard and requires a compromised OS kernel to extract secrets, defeating protections in Confidential Compute, AMD SEV-SNP, and Intel TDX/SGX. The attack completes in roughly three minutes and works against DDR5 memory, meaning the physical-access threats TEEs are designed to defend against are no longer reliably mitigated.
Fri, November 7, 2025
Tiered KV Cache Boosts LLM Performance on GKE with HBM
🚀 LMCache implements a node-local, tiered KV Cache on GKE to extend the GPU HBM-backed Key-Value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD reduced Time-to-First-Token and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scale context windows while balancing cost and latency on GKE.
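As a conceptual illustration of the tiering idea (not the LMCache API), the sketch below shows KV blocks spilling from an HBM tier to a RAM tier on eviction, so a prefix that falls out of HBM can still be served without a full prefill; a further SSD tier would follow the same pattern.

```python
# Conceptual two-tier KV cache: HBM-sized tier spills to a RAM-sized tier.
# This is an illustration of the tiering concept, not LMCache's implementation.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_blocks: int, ram_blocks: int):
        self.hbm = OrderedDict()        # fastest, smallest tier
        self.ram = OrderedDict()        # larger, slower tier
        self.hbm_blocks = hbm_blocks
        self.ram_blocks = ram_blocks

    def get(self, prefix_hash):
        for tier in (self.hbm, self.ram):
            if prefix_hash in tier:
                tier.move_to_end(prefix_hash)  # LRU touch
                return tier[prefix_hash]
        return None                            # full miss: recompute the prefill

    def put(self, prefix_hash, kv_block):
        self.hbm[prefix_hash] = kv_block
        if len(self.hbm) > self.hbm_blocks:    # evict oldest HBM block into RAM
            old_key, old_val = self.hbm.popitem(last=False)
            self.ram[old_key] = old_val
            if len(self.ram) > self.ram_blocks:
                self.ram.popitem(last=False)   # SSD tier omitted for brevity
```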
Thu, November 6, 2025
Leading Bug Bounty Programs and Market Shifts 2025
🔒 Bug bounty programs remain a core component of security testing in 2025, drawing external researchers to identify flaws across web, mobile, AI, and critical infrastructure. Leading platforms such as Bugcrowd, HackerOne, and Synack, and vendors including Apple, Google, Microsoft, and OpenAI, have broadened scopes and increased payouts. Firms now reward full exploit chains and emphasize human-led reconnaissance over purely automated scanning. Programs also support regulatory compliance in critical sectors.
Wed, November 5, 2025
Microsoft Expands Sovereign Cloud Capabilities, EU Focus
🛡️ Microsoft announced expanded sovereign cloud offerings aimed at helping governments and enterprises meet regulatory and resilience requirements across Europe and beyond. The update includes end-to-end AI data processing within an EU Data Boundary, expanded Microsoft 365 Copilot in-country processing to 15 countries and additional rollouts through 2026, plus a refreshed Sovereign Landing Zone for simplified deployment of sovereign controls. Azure Local gains increased scale, external SAN support, and NVIDIA RTX Pro 6000 Blackwell GPUs for high-performance on-prem AI, along with planned disconnected operations. A new Digital Sovereignty specialization gives partners a way to validate and badge their sovereign-cloud expertise.
Mon, November 3, 2025
How Scientists Can Use Gemini Enterprise for AI Workflows
🔬 Google Cloud presents how researchers can accelerate scientific workflows by combining Gemini Enterprise with integrated HPC infrastructure. It showcases AI agents—like the Deep Research agent for literature synthesis and the Idea Generation agent for proposing and ranking hypotheses—alongside developer tooling such as Gemini Code Assist and Gemini CLI for coding, debugging, and workflow automation. The platform pairs these capabilities with purpose-built VMs (H4D, A4, A4X) and Google Cloud Managed Lustre to scale simulations and analysis.
Wed, October 29, 2025
Google Public Sector Summit: A New Era for Government AI
🔔 At the Google Public Sector Summit in Washington D.C., leaders highlighted a shift toward agentic AI and large-scale cloud modernization. Google introduced Gemini for Government, an accredited platform providing an AI Agent Gallery, agent-to-agent protocols, enterprise connectors, and governance controls to deploy and monitor AI agents. Speakers showcased real-world deployments across defense, city, and education sectors, and Google announced expanded partner investments plus an enhanced partnership with NVIDIA to support on-premises and air-gapped environments.
Tue, October 28, 2025
Check Point's AI Cloud Protect with NVIDIA BlueField
🔒 Check Point has made AI Cloud Protect powered by NVIDIA BlueField available for enterprise deployment, offering DPU-accelerated security for cloud AI workloads. The solution aims to inspect and protect GenAI traffic and prompts to reduce data exposure risks while integrating with existing cloud environments. It targets prompt manipulation and infrastructure attacks at scale and is positioned for organizations building AI factories.
Tue, October 28, 2025
A4X Max, GKE Networking, and Vertex AI Training Now Shipping
🚀 Google Cloud is expanding its NVIDIA collaboration with the new A4X Max instances powered by NVIDIA GB300 NVL72, delivering 72 GPUs with high‑bandwidth NVLink and shared memory for demanding multimodal reasoning. GKE now supports DRANET for topology‑aware RDMA scheduling and integrates NVIDIA NeMo Guardrails into GKE Inference Gateway, while Vertex AI Model Garden will host NVIDIA Nemotron models. Vertex AI Training adds NeMo and NeMo‑RL recipes and a managed Slurm environment to accelerate large‑scale training and deployment.
Tue, October 28, 2025
Microsoft and NVIDIA Deepen AI Infrastructure Partnership
🚀 Microsoft and NVIDIA announced expanded AI infrastructure on Azure, bringing NVIDIA RTX PRO 6000 Blackwell Server Edition to Azure Local, new Nemotron and Cosmos models via Azure AI Foundry, and broader support for Run:ai and GB300 NVL72 supercomputing clusters. These updates enable on-premises and edge AI with cloud-like management, improved GPU utilization, and infrastructure tailored for frontier reasoning, multimodal workloads, and real-time inferencing. Microsoft also highlighted NVIDIA Dynamo optimizations for ND GB200-v6 VMs to boost inference throughput at scale.
Tue, October 28, 2025
Securing the AI Factory: Palo Alto Networks and NVIDIA
🔒 Palo Alto Networks outlines a platform-centric approach to protect the enterprise AI Factory, announcing integration of Prisma AIRS with NVIDIA BlueField DPUs. The collaboration embeds distributed zero-trust security directly into infrastructure, delivering agentless, penalty-free runtime protection and real-time workload threat detection. Validated on NVIDIA RTX PRO Server and optimized for BlueField‑3, with BlueField‑4 forthcoming, the solution ties into Strata Cloud Manager and Cortex for end-to-end visibility and control, aiming to secure AI operations at scale without compromising performance.
Tue, October 28, 2025
TEE.Fail breaks confidential computing on DDR5 CPUs
🔓 Academic researchers disclosed TEE.Fail, a DDR5 memory-bus interposition side-channel that can extract secrets from Trusted Execution Environments such as Intel SGX, Intel TDX, and AMD SEV-SNP. By inserting an inexpensive interposer between a DDR5 DIMM and the motherboard and recording command/address and data bursts, attackers can map deterministic AES-XTS ciphertexts to plaintext values and recover signing and cryptographic keys. The method requires physical access and kernel privileges but can be implemented for under $1,000; Intel, AMD and NVIDIA were notified and are developing mitigations.
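The determinism the attack exploits can be illustrated with a short sketch using the 'cryptography' package: with AES-XTS, the same plaintext block under the same key and address-derived tweak always yields the same ciphertext, so bus observations can be turned into a ciphertext-to-plaintext lookup table. The key, tweak, and block below are illustrative, not details from the paper.

```python
# Why deterministic memory encryption leaks: AES-XTS ciphertext depends only
# on the key, the plaintext block, and the address-derived tweak, with no
# per-write freshness. Illustrative sketch with the 'cryptography' package.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(64)                      # AES-256-XTS: two 256-bit key halves
tweak = (0x1000).to_bytes(16, "little")   # tweak stands in for a memory address

def encrypt_block(plaintext: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
    return enc.update(plaintext) + enc.finalize()

block = b"secret-16-bytes!"               # one 16-byte block of "memory"
# Identical writes produce identical ciphertexts, enabling dictionary attacks.
print(encrypt_block(block) == encrypt_block(block))  # True
```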
Mon, October 20, 2025
G4 VMs: High-performance P2P Fabric for Multi‑GPU Workloads
🚀 Google Cloud's G4 VMs, now generally available, combine NVIDIA RTX PRO 6000 Blackwell GPUs with a custom, software-defined PCIe fabric to enable high-performance peer-to-peer (P2P) GPU communication. The platform accelerates collective operations like All-Gather and All-Reduce without code changes, delivering up to 2.2x faster collectives. For tensor-parallel inference, customers can see up to 168% higher throughput and up to 41% lower inter-token latency. G4 integrates with GKE Inference Gateway for horizontal scaling and production deployments.
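A minimal sketch of the kind of collective that benefits from the P2P fabric, using PyTorch's NCCL backend; the tensor shape and launch command are illustrative, and NCCL picks the P2P paths automatically when they are available.

```python
# Sketch: NCCL all-reduce across the GPUs in one VM.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py (illustrative)
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # NCCL uses P2P transport when available
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    x = torch.ones(1024, 1024, device="cuda") * rank
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # sum the tensor across all GPUs
    if rank == 0:
        print(x[0, 0].item())                 # 0 + 1 + ... + (world_size - 1)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```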
Mon, October 20, 2025
Google Cloud G4 VMs: NVIDIA RTX PRO 6000 Blackwell GA
🚀 The G4 VM is now generally available on Google Cloud, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and offering up to 768 GB of GDDR7 memory per instance. It targets latency-sensitive and regulated workloads for generative AI, real-time rendering, simulation, and virtual workstations. Features include FP4 precision support, Multi-Instance GPU (MIG) partitioning, an enhanced PCIe P2P interconnect for faster multi‑GPU All-Reduce, and an NVIDIA Omniverse VMI on Marketplace for industrial digital twins.