All news with #nvidia tag

104 articles · page 4 of 6

February 4, 2026

Amazon EC2 G7e Instances Launch in US West (Oregon)

🚀 Amazon EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, are now available in US West (Oregon). These instances deliver up to 2.3x inference performance compared to G6e and offer up to eight GPUs with 96 GB per GPU, 5th Gen Intel Xeon processors, 192 vCPUs, and up to 1600 Gbps of networking. They target LLMs, agentic and multimodal generative AI, spatial computing, and workloads that require combined graphics and AI processing, while supporting GPUDirect P2P and RDMA in UltraClusters for low-latency multi-GPU scaling.

AWS EC2 Nvidia Product Launch

January 26, 2026

Maia 200: Microsoft’s Inference Accelerator for AI Workloads

🤖 Maia 200 is Microsoft’s new first‑party AI inference accelerator, fabricated on TSMC’s 3nm process and optimized for low‑precision tensor compute. It combines native FP8/FP4 tensor cores with 216GB of HBM3e at 7 TB/s, 272MB on‑chip SRAM and specialized data‑movement engines to raise token throughput and utilization. Microsoft positions Maia 200 as delivering over 10 petaFLOPS (FP4), about 30% better performance‑per‑dollar versus its prior fleet, and higher FP8 performance than competing hyperscaler accelerators. Deployed in US Central with Azure integration and an SDK preview, Maia 200 targets inference for services such as Microsoft 365 Copilot and OpenAI GPT‑5.2.

Microsoft Nvidia Product Launch

January 22, 2026

Scaling MoE Inference with NVIDIA Dynamo on A4X Rack-Scale

🚀 This post describes two validated deployment recipes for serving large Mixture-of-Experts (MoE) models on Google Cloud's A4X machines using NVIDIA Dynamo. The recipes provide throughput- and latency-optimized configurations that exploit the 72‑GPU GB200 NVL72 rack fabric, WideEP/DeepEP parallelism, global KV cache, and GKE-aware rack-level scheduling. Performance validation reports >6K tokens/sec/GPU for the throughput recipe and a 10ms median inter-token latency for the latency-optimized recipe.
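
To put the two reported figures in perspective, the following back-of-envelope sketch converts them into a rack-level throughput and a per-stream decode rate. It assumes that all 72 GPUs in the rack serve the model in the throughput recipe, which the summary does not state explicitly. The two numbers come from different recipes and should not be read as simultaneously achievable.

```python
# Back-of-envelope figures derived from the numbers quoted in the summary.
GPUS_PER_RACK = 72                # GB200 NVL72 rack
TOKENS_PER_SEC_PER_GPU = 6_000    # reported lower bound (">6K"), throughput recipe
MEDIAN_ITL_SECONDS = 0.010        # 10 ms median inter-token latency, latency recipe


def rack_throughput(tokens_per_sec_per_gpu: float, gpus: int) -> float:
    """Aggregate tokens/sec, assuming every GPU in the rack contributes equally."""
    return tokens_per_sec_per_gpu * gpus


def per_stream_decode_rate(inter_token_latency_s: float) -> float:
    """Tokens/sec that a single request sees at a given inter-token latency."""
    return 1.0 / inter_token_latency_s


rack_tps = rack_throughput(TOKENS_PER_SEC_PER_GPU, GPUS_PER_RACK)
stream_tps = per_stream_decode_rate(MEDIAN_ITL_SECONDS)

print(f"Throughput recipe: at least {rack_tps:,.0f} tokens/sec per rack")
print(f"Latency recipe: about {stream_tps:.0f} tokens/sec per stream")
```

Under these assumptions the throughput recipe works out to at least 432,000 tokens/sec per rack, while the latency recipe corresponds to roughly 100 tokens/sec for an individual stream.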

Google Cloud Nvidia Product Launch

January 20, 2026

Amazon EC2 G7e Instances Now GA with NVIDIA Blackwell

🚀 Amazon EC2 G7e instances are now generally available, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. G7e delivers up to 2.3x inference performance versus G6e and supports configurations with up to 8 GPUs (96 GB each), 5th Gen Intel Xeon processors, 192 vCPUs, and up to 1600 Gbps of Elastic Fabric Adapter networking. Designed for LLMs, multimodal and spatial computing workloads, G7e includes NVIDIA GPUDirect P2P and RDMA support in EC2 UltraClusters and is available in US East (N. Virginia) and US East (Ohio) as On‑Demand, Spot, or via Savings Plans.
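
As a rough illustration of what 96 GB per GPU and up to 8 GPUs means for model sizing, the sketch below estimates how many GPUs are needed just to hold a model's weights. The 80% headroom factor, the helper names, and the example model sizes are assumptions chosen for illustration; KV cache, activations, and framework overhead are not modelled.

```python
import math

GPU_MEMORY_GB = 96   # per-GPU memory quoted for G7e
MAX_GPUS = 8         # largest G7e configuration quoted


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


def min_gpus_for_weights(params_billions: float,
                         bytes_per_param: float,
                         headroom: float = 0.8) -> int:
    """Smallest GPU count whose usable memory holds the weights.

    `headroom` is an assumed fraction of each GPU reserved for weights;
    the remainder is left for KV cache and runtime overhead.
    """
    usable_per_gpu = GPU_MEMORY_GB * headroom
    gpus = math.ceil(weight_memory_gb(params_billions, bytes_per_param) / usable_per_gpu)
    if gpus > MAX_GPUS:
        raise ValueError(f"needs {gpus} GPUs, more than the {MAX_GPUS} available")
    return gpus


# Hypothetical model sizes: 70B parameters at 2 bytes (FP16) and 1 byte (FP8).
print(min_gpus_for_weights(70, 2))   # 140 GB of weights
print(min_gpus_for_weights(70, 1))   # 70 GB of weights
```

With these assumptions a 70B-parameter model needs two GPUs at FP16 and fits on one at FP8; a 405B-parameter model at FP16 (810 GB of weights) exceeds the 768 GB total and raises an error.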

AWS AWS EC2 Nvidia Product Launch

January 19, 2026

Python libraries for Hugging Face models enable RCE

⚠️ Researchers at Palo Alto Networks' Unit 42 disclosed critical weaknesses in the NeMo, Uni2TS and FlexTok Python libraries used with Hugging Face models, where malicious code can be hidden in model metadata and executed automatically when a manipulated file is loaded. The root cause is the use of Hydra's instantiate(), which accepts arbitrary callables and arguments and can therefore permit remote code execution if metadata is untrusted. Vendors including NVIDIA, Salesforce and the maintainers of FlexTok have issued fixes and CVE assignments; users should upgrade affected libraries and audit models before loading.
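
The mechanism can be illustrated with a deliberately simplified stand-in for configuration-driven instantiation. This is not Hydra's implementation; `resolve_target` and `naive_instantiate` are hypothetical helpers that only mimic the general idea of resolving a `_target_` string to a callable and invoking it with the remaining keys. The example uses a harmless target.

```python
import importlib
from typing import Any, Callable, Mapping


def resolve_target(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.attr' and return the attribute it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def naive_instantiate(config: Mapping[str, Any]) -> Any:
    """Simplified stand-in: call whatever `_target_` names with the other keys."""
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    return resolve_target(config["_target_"])(**kwargs)


# Benign demonstration: the metadata, not the loading code, picks the callable.
metadata = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
result = naive_instantiate(metadata)
print(result)
```

Because the callable is chosen by the data, metadata that names a function such as `os.system` instead of a model class would be executed just as readily, which is why untrusted metadata must not reach this kind of loader unchecked.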

Hugging Face Nvidia Salesforce Supply Chain Compromise

January 13, 2026

RCE Risks in AI Python Libraries via Config Instantiation

🔒 Three widely used open-source AI/ML Python libraries (NVIDIA NeMo, Salesforce uni2TS, and Apple ml-flextok) were found vulnerable to remote code execution when model metadata was treated as executable configuration. The root cause is unsafe use of configuration-driven instantiation (for example Hydra's instantiate()) that accepts attacker-controlled _target_ values. Vendors released patches and CVE notices; users should apply fixes, restrict allowed targets, and avoid loading models from untrusted sources.
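
One way to "restrict allowed targets" is to validate every `_target_` value against an explicit allowlist before any instantiation happens. The sketch below is a minimal illustration and is not taken from the vendors' patches; the allowlist entries, `DisallowedTargetError`, and the example configs are hypothetical.

```python
from typing import Any

# Hypothetical allowlist: only these fully qualified names may be instantiated.
ALLOWED_TARGETS = frozenset({"torch.nn.Linear", "torch.nn.ReLU"})


class DisallowedTargetError(ValueError):
    """Raised when a config names a `_target_` outside the allowlist."""


def find_targets(node: Any) -> list[str]:
    """Collect every `_target_` value from a nested dict/list config."""
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_target_":
                found.append(str(value))
            else:
                found.extend(find_targets(value))
    elif isinstance(node, (list, tuple)):
        for item in node:
            found.extend(find_targets(item))
    return found


def validate_config(config: Any, allowed: frozenset = ALLOWED_TARGETS) -> None:
    """Raise before instantiation if any target is not explicitly allowed."""
    rejected = [target for target in find_targets(config) if target not in allowed]
    if rejected:
        raise DisallowedTargetError(f"disallowed _target_ values: {rejected}")


safe_config = {"_target_": "torch.nn.Linear", "in_features": 8, "out_features": 2}
malicious_config = {
    "_target_": "torch.nn.Linear",
    "init": [{"_target_": "os.system", "command": "echo compromised"}],
}

validate_config(safe_config)  # passes silently
try:
    validate_config(malicious_config)
except DisallowedTargetError as error:
    print(error)
```

The check walks nested structures because a hostile target can sit anywhere in the config, not only at the top level. An allowlist is preferred over a blocklist here, since the set of dangerous callables is effectively unbounded.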

Remote Code Execution Nvidia Salesforce AI Security

January 5, 2026

Palo Alto Networks Prisma AIRS Validated for NVIDIA AI

🔒 Palo Alto Networks announced that Prisma AIRS, accelerated on the NVIDIA BlueField DPU, is now part of the NVIDIA Enterprise AI Factory validated design. The integration embeds zero trust runtime security into AI infrastructure by running Prisma AIRS Network Intercept on BlueField and extending enforcement to cloud environments. It leverages NVIDIA DOCA and DOCA Argus telemetry to feed Cortex XSIAM and Cortex XSOAR for AI-driven detection and response, and recommends hyperscale firewall clusters for defense-in-depth and improved TCO.

Palo Alto Networks Nvidia AI Security Zero Trust

January 5, 2026

Azure Strategic Planning Enables NVIDIA Rubin Deployments

🚀 Azure says its long-range datacenter strategy already accommodates NVIDIA Vera Rubin NVL72 racks, enabling rapid, large-scale rollouts across current Fairwater sites and planned AI superfactories. Microsoft highlights prior experience with Ampere, Hopper, GB200 and GB300 generations and claims its power, cooling, networking, and memory upgrades align with Rubin’s NVLink, ConnectX‑9, and HBM4 requirements. The post frames co-design work as reducing deployment risk and accelerating customer access to higher-performance inference and training at scale.

Microsoft Azure Nvidia News

January 5, 2026

Check Point and NVIDIA Partner to Secure AI Factories

🔒 Check Point and NVIDIA announced an integrated security capability to protect AI "factories" across the entire AI lifecycle, from data ingestion and model training to deployment and inference. The effort targets growing risks such as prompt manipulation and attacks on GenAI infrastructure, which Gartner and other industry surveys identify as rising threats. The collaboration focuses on unified visibility, real-time detection, runtime protection, and centralized policy enforcement to reduce operational risk and help organizations meet compliance and governance requirements.

Check Point Nvidia AI Security AI Governance

January 5, 2026

Amazon EC2 G5 Instances with NVIDIA A10G Now in Hong Kong

🚀 Amazon Web Services has launched Amazon EC2 G5 instances powered by NVIDIA A10G Tensor Core GPUs in the Asia Pacific (Hong Kong) region to support graphics-intensive and machine learning workloads. These instances scale to eight A10G GPUs with 2nd-generation AMD EPYC processors, up to 192 vCPUs, 100 Gbps networking and 7.6 TB of NVMe local storage across eight size options. Customers can tune performance with NVIDIA drivers for compute, gaming, or workstation workloads and purchase capacity as On-Demand or Reserved Instances to meet cost and operational needs.

AWS AWS EC2 Nvidia News

January 5, 2026

Customizing NVIDIA Nemotron for Security Query Translation

🔒 CrowdStrike and NVIDIA operationalized Nemotron LLMs to enable natural-language-to-CQL translation inside the Falcon platform. They leveraged millions of analyst queries, AST-based deduplication, and a PII scrubbing pipeline, then used NVIDIA NeMo Data Designer to generate synthetic natural-language descriptions for fine-tuning. Fine-tuning Llama-3.3-Nemotron-Super-49B-v1.5 with LoRA produced improved accuracy, interpretability through intermediate reasoning, and 96% valid-query accuracy versus frontier alternatives.
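
The idea behind AST-based deduplication is that two queries count as duplicates when they parse to the same syntax tree, even if their text differs in spacing or comments. CrowdStrike's CQL parser is not described in the summary, so the sketch below uses Python's own `ast` module on Python-syntax expressions purely as a stand-in; the example queries are invented for illustration.

```python
import ast


def canonical_form(source: str) -> str:
    """Return a string that depends only on the parsed structure of `source`."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(source))


def deduplicate(queries: list[str]) -> list[str]:
    """Keep the first query of each structurally distinct group, in order."""
    seen: set[str] = set()
    unique: list[str] = []
    for query in queries:
        try:
            key = canonical_form(query)
        except SyntaxError:
            key = query  # unparsable input falls back to exact-text matching
        if key not in seen:
            seen.add(key)
            unique.append(query)
    return unique


queries = [
    "count(events, where=(status == 500))",
    "count( events ,where = ( status==500 ) )  # same query, different layout",
    "count(events, where=(status == 404))",
]
unique_queries = deduplicate(queries)
print(len(unique_queries))
```

Here the first two queries collapse into one entry because they differ only in whitespace and a trailing comment, while the third survives because its literal differs.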

CrowdStrike Nvidia LLM Security CrowdStrike Falcon

December 23, 2025

NVIDIA Nemotron 3 Nano Now Available on Amazon Bedrock

🚀 Amazon Bedrock now supports NVIDIA Nemotron 3 Nano 30B A3B, NVIDIA's efficient hybrid Mixture-of-Experts language model with a 256k token context window and native tool-calling support. The model delivers higher throughput for agentic, coding, and complex reasoning workloads while preserving the depth of larger models through advanced reinforcement learning and multi-environment post-training. Powered by Project Mantle, Bedrock provides serverless distributed inference, QoS controls, automated capacity management and OpenAI API compatibility across multiple AWS Regions.

Nvidia Amazon Bedrock AWS

December 18, 2025

Amazon EC2 C8a Compute Instances Launch in Spain Region

🚀 Amazon Web Services has launched the compute-optimized EC2 C8a instances in the Europe (Spain) region. Powered by 5th Gen AMD EPYC processors running up to 4.5 GHz, C8a offers up to 30% higher performance and up to 19% better price-performance versus C7a, plus 33% greater memory bandwidth. Available in 12 sizes (including two bare-metal options), they target high-performance, latency-sensitive workloads and support Savings Plans, On-Demand, and Spot purchasing.

AWS AWS EC2 Product Launch Nvidia

December 4, 2025

NVIDIA Run:ai Model Streamer Adds Cloud Storage Support

🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model load and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer dramatically reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change and it can leverage GKE Workload Identity for secure, keyless access.

Nvidia Google Cloud Product Update

December 3, 2025

Azure expands local and hybrid options for AI and control

🔒 Microsoft is expanding Azure with on‑premises, edge, and hybrid options to deliver AI, resilience, and operational sovereignty. Azure Local provides integrated compute, storage, and networking on customer premises with GA features like Microsoft 365 Local and NVIDIA Blackwell GPUs, plus previews for disconnected operations and multi‑rack scale. Coupled with Azure IoT, Microsoft Fabric, and Azure Arc management enhancements, the updates enable near‑real‑time analytics, secure device identity, and a unified control plane for distributed estates. The goal is to accelerate AI and analytics while preserving data residency, continuity, and compliance for regulated or mission‑critical environments.

Microsoft Azure Nvidia AI Governance

December 2, 2025

Amazon EC2 P6e-GB300 UltraServers Now Generally Available

🚀 AWS has announced general availability of Amazon EC2 P6e-GB300 UltraServers powered by the NVIDIA GB300 NVL72. The new UltraServers deliver 1.5× GPU memory and 1.5× FP4 compute (without sparsity) compared with P6e-GB200, enabling higher-context inference and improved throughput for large models. They are aimed at reasoning, agentic AI, and production inference workloads. To get started, contact your AWS sales representative.

AWS Nvidia

November 24, 2025

SageMaker HyperPod Adds NVIDIA MIG GPU Partitioning

🚀 Amazon SageMaker HyperPod now supports NVIDIA Multi-Instance GPU (MIG), enabling administrators to partition a single GPU into multiple isolated devices to run simultaneous small generative AI tasks. Administrators can use an easy console configuration or a custom setup for fine-grained hardware isolation, allocate compute quotas across teams, and monitor real-time performance per partition via a utilization dashboard. Available on HyperPod clusters using the EKS orchestrator in multiple AWS Regions, this capability reduces wait times by letting data scientists run lightweight inference and interactive notebooks in parallel without consuming full GPU capacity.

AWS Nvidia Cloud Security

November 21, 2025

Nvidia issues hotfix driver for Windows October update

🔧 Nvidia released the GeForce Hotfix Display Driver 581.94 to address gaming performance regressions reported after the October 2025 Windows update (KB5066835 [5561605]) affecting Windows 11 24H2 and 25H2 systems. The company notes this is a beta hotfix with an abbreviated QA cycle and is provided as-is to deliver targeted fixes more quickly. The driver is available from Nvidia Customer Care for Windows 10 x64 and Windows 11 x64 PCs.

Nvidia Patch Release

November 18, 2025

AWS launches EC2 P6-B300 with NVIDIA Blackwell Ultra

🚀 Amazon Web Services has announced general availability of Amazon EC2 P6-B300 instances powered by NVIDIA Blackwell Ultra B300 GPUs. The p6-b300.48xlarge delivers eight GPUs, 2.1 TB of high-bandwidth GPU memory, 6.4 Tbps EFA networking, 300 Gbps ENA throughput, and 4 TB of system memory. It targets training and deploying trillion-parameter foundation models and LLMs, offering higher memory, compute, and networking versus P6-B200.

AWS AWS EC2 Nvidia

November 14, 2025

ShadowMQ Deserialization Flaws in Major AI Inference Engines

⚠️ Oligo Security researcher Avi Lumelsky disclosed a widespread insecure-deserialization pattern dubbed ShadowMQ that affects major AI inference engines including vLLM, NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server and SGLang. The root cause is using ZeroMQ's recv_pyobj() to deserialize network input with Python's pickle, permitting remote arbitrary code execution. Patches vary: some projects fixed the issue, others remain partially addressed or unpatched, and mitigations include applying updates, removing exposed ZMQ sockets, and auditing code for unsafe deserialization.
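
The risk comes from pickle itself: unpickling can call any importable callable named in the byte stream, so handing network input to `pickle.loads` (which is what pyzmq's `recv_pyobj()` does with a received frame) lets the sender decide what runs. The sketch below shows this with a harmless callable and contrasts it with a data-only JSON decoder. It uses only the standard library; `Payload` and `decode_message` are illustrative names, not code from any of the affected projects.

```python
import json
import os
import pickle


class Payload:
    """An object whose pickle form asks the loader to call a function."""

    def __reduce__(self):
        # Harmless here (os.getcwd); an attacker would name something destructive.
        return (os.getcwd, ())


untrusted_bytes = pickle.dumps(Payload())

# Unpickling does not rebuild a Payload: it calls os.getcwd() and returns the result.
result = pickle.loads(untrusted_bytes)
print(type(result).__name__)


def decode_message(raw: bytes) -> dict:
    """Data-only alternative: parse JSON and accept nothing but an object."""
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message


print(decode_message(b'{"op": "ping"}'))
try:
    decode_message(untrusted_bytes)
except ValueError:
    print("pickle bytes rejected by the JSON decoder")
```

A data-only format such as JSON (pyzmq also offers `recv_json()`) cannot express "call this function", which removes the code-execution primitive. Authentication and keeping the sockets off untrusted networks remain necessary on top of that.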

Insecure Deserialization Remote Code Execution Nvidia Microsoft