< ciso
brief />
Tag Banner

All news with #nvidia tag

86 articles · page 3 of 5

Amazon EC2 G6e Instances with NVIDIA L40S Now in UAE

🚀 Amazon Web Services has launched Amazon EC2 G6e instances powered by NVIDIA L40S Tensor Core GPUs in the Middle East (UAE) Region. These instances offer up to eight L40S GPUs with 48 GB each, third-generation AMD EPYC processors, up to 192 vCPUs, 1.536 TB system memory, 7.6 TB local NVMe storage, and up to 400 Gbps networking. G6e is optimized for ML workloads including LLMs, diffusion models for image/video/audio generation, and large-scale spatial computing and digital twins. Instances are available across multiple regions and purchasable via On-Demand, Reserved, Spot, and Savings Plans.
read more →

Amazon EC2 G6e Instances with NVIDIA L40S in UAE Region

🚀 Amazon Web Services has made EC2 G6e instances powered by NVIDIA L40S Tensor Core GPUs available in the Middle East (UAE) Region. These instances support up to eight L40S GPUs (48 GB each), third-generation AMD EPYC processors, up to 192 vCPUs, 1.536 TB memory, 400 Gbps networking and 7.6 TB NVMe storage. They target machine learning and spatial computing workloads — from deploying large language and diffusion models to building immersive 3D simulations and digital twins. G6e instances are available via On‑Demand, Reserved, Spot and Savings Plans.
read more →

Amazon EC2 G7e Instances Launch in US West (Oregon)

🚀 Amazon EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, are now available in US West (Oregon). These instances deliver up to 2.3x inference performance compared to G6e and offer up to eight GPUs with 96 GB per GPU, 5th Gen Intel Xeon processors, 192 vCPUs, and up to 1600 Gbps of networking. They target LLMs, agentic and multimodal generative AI, spatial computing, and workloads that require combined graphics and AI processing, while supporting GPUDirect P2P and RDMA in UltraClusters for low-latency multi-GPU scaling.
read more →

Maia 200: Microsoft’s Inference Accelerator for AI Workloads

🤖 Maia 200 is Microsoft’s new first‑party AI inference accelerator, fabricated on TSMC’s 3nm process and optimized for low‑precision tensor compute. It combines native FP8/FP4 tensor cores with 216GB of HBM3e at 7 TB/s, 272MB on‑chip SRAM and specialized data‑movement engines to raise token throughput and utilization. Microsoft positions Maia 200 as delivering over 10 petaFLOPS (FP4), about 30% better performance‑per‑dollar versus its prior fleet, and higher FP8 performance than competing hyperscaler accelerators. Deployed in US Central with Azure integration and an SDK preview, Maia 200 targets inference for services such as Microsoft 365 Copilot and OpenAI GPT‑5.2.
read more →

Scaling MoE Inference with NVIDIA Dynamo on A4X Rack-Scale

🚀 This post describes two validated deployment recipes for serving large Mixture-of-Experts (MoE) models on Google Cloud's A4X machines using NVIDIA Dynamo. The recipes provide throughput- and latency-optimized configurations that exploit the 72‑GPU GB200 NVL72 rack fabric, WideEP/DeepEP parallelism, global KV cache, and GKE-aware rack-level scheduling. Performance validation reports >6K tokens/sec/GPU for the throughput recipe and a 10ms median inter-token latency for the latency-optimized recipe.
read more →

Amazon EC2 G7e Instances Now GA with NVIDIA Blackwell

🚀 Amazon EC2 G7e instances are now generally available, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. G7e delivers up to 2.3x inference performance versus G6e and supports configurations with up to 8 GPUs (96 GB each), 5th Gen Intel Xeon processors, 192 vCPUs, and up to 1600 Gbps of Elastic Fabric Adapter networking. Designed for LLMs, multimodal and spatial computing workloads, G7e includes NVIDIA GPUDirect P2P and RDMA support in EC2 UltraClusters and is available in US East (N. Virginia) and US East (Ohio) as On‑Demand, Spot, or via Savings Plans.
read more →

Python libraries for Hugging Face models enable RCE

⚠️ Researchers at Palo Alto Networks' Unit 42 disclosed critical weaknesses in the NeMo, Uni2TS and FlexTok Python libraries used with Hugging Face models, where malicious code can be hidden in model metadata and executed automatically when a manipulated file is loaded. The root cause is the use of Hydra's instantiate(), which accepts arbitrary callables and arguments and can therefore permit remote code execution if metadata is untrusted. Vendors including NVIDIA, Salesforce and the maintainers of FlexTok have issued fixes and CVE assignments; users should upgrade affected libraries and audit models before loading.
read more →

RCE Risks in AI Python Libraries via Config Instantiation

🔒 Three widely used open-source AI/ML Python libraries — NVIDIA NeMo, Salesforce uni2TS, and Apple ml-flextok — were found vulnerable to remote code execution when model metadata was treated as executable configuration. The root cause is unsafe use of configuration-driven instantiation (for example Hydra's instantiate()) that accepts attacker-controlled _target_ values. Vendors released patches and CVE notices; users should apply fixes, restrict allowed targets, and avoid loading models from untrusted sources.
read more →

Palo Alto Networks Prisma AIRS Validated for NVIDIA AI

🔒 Palo Alto Networks announced that Prisma AIRS, accelerated on the NVIDIA BlueField DPU, is now part of the NVIDIA Enterprise AI Factory validated design. The integration embeds zero trust runtime security into AI infrastructure by running Prisma AIRS Network Intercept on BlueField and extending enforcement to cloud environments. It leverages NVIDIA DOCA and DOCA Argus telemetry to feed Cortex XSIAM and Cortex XSOAR for AI-driven detection and response, and recommends hyperscale firewall clusters for defense-in-depth and improved TCO.
read more →

Azure Strategic Planning Enables NVIDIA Rubin Deployments

🚀 Azure says its long-range datacenter strategy already accommodates NVIDIA Vera Rubin NVL72 racks, enabling rapid, large-scale rollouts across current Fairwater sites and planned AI superfactories. Microsoft highlights prior experience with Ampere, Hopper, GB200 and GB300 generations and claims its power, cooling, networking, and memory upgrades align with Rubin’s NVLink, ConnectX‑9, and HBM4 requirements. The post frames co-design work as reducing deployment risk and accelerating customer access to higher-performance inference and training at scale.
read more →

Check Point and NVIDIA Partner to Secure AI Factories

🔒 Check Point and NVIDIA announced an integrated security capability to protect AI "factories" across the entire AI lifecycle, from data ingestion and model training to deployment and inference. The effort targets growing risks such as prompt manipulation and attacks on GenAI infrastructure, which Gartner and other industry surveys identify as rising threats. The collaboration focuses on unified visibility, real-time detection, runtime protection, and centralized policy enforcement to reduce operational risk and help organizations meet compliance and governance requirements.
read more →

Amazon EC2 G5 Instances with NVIDIA A10G Now in Hong Kong

🚀 Amazon Web Services has launched Amazon EC2 G5 instances powered by NVIDIA A10G Tensor Core GPUs in the Asia Pacific (Hong Kong) region to support graphics-intensive and machine learning workloads. These instances scale to eight A10G GPUs with 2nd-generation AMD EPYC processors, up to 192 vCPUs, 100 Gbps networking and 7.6 TB of NVMe local storage across eight size options. Customers can tune performance with NVIDIA drivers for compute, gaming, or workstation workloads and purchase capacity as On-Demand or Reserved Instances to meet cost and operational needs.
read more →

Customizing NVIDIA Nemotron for Security Query Translation

🔒 CrowdStrike and NVIDIA operationalized Nemotron LLMs to enable natural-language-to-CQL translation inside the Falcon platform. They leveraged millions of analyst queries, AST-based deduplication, and a PII scrubbing pipeline, then used NVIDIA NeMo Data Designer to generate synthetic natural-language descriptions for fine-tuning. Fine-tuning Llama-3.3-Nemotron-Super-49B-v1.5 with LoRA produced improved accuracy, interpretability through intermediate reasoning, and 96% valid-query accuracy versus frontier alternatives.
read more →

NVIDIA Nemotron 3 Nano Now Available on Amazon Bedrock

🚀 Amazon Bedrock now supports NVIDIA Nemotron 3 Nano 30B A3B, NVIDIA's efficient hybrid Mixture-of-Experts language model with a 256k token context window and native tool-calling support. The model delivers higher throughput for agentic, coding, and complex reasoning workloads while preserving the depth of larger models through advanced reinforcement learning and multi-environment post-training. Powered by Project Mantle, Bedrock provides serverless distributed inference, QoS controls, automated capacity management and OpenAI API compatibility across multiple AWS Regions.
read more →

Amazon EC2 C8a Compute Instances Launch in Spain Region

🚀 Amazon Web Services has launched the compute-optimized EC2 C8a instances in the Europe (Spain) region. Powered by 5th Gen AMD EPYC processors running up to 4.5 GHz, C8a offers up to 30% higher performance and up to 19% better price-performance versus C7a, plus 33% greater memory bandwidth. Available in 12 sizes (including two bare-metal options), they target high-performance, latency-sensitive workloads and support Savings Plans, On-Demand, and Spot purchasing.
read more →

NVIDIA Run:ai Model Streamer Adds Cloud Storage Support

🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model load and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer dramatically reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change and it can leverage GKE Workload Identity for secure, keyless access.
read more →

Azure expands local and hybrid options for AI and control

🔒 Microsoft is expanding Azure with on‑premises, edge, and hybrid options to deliver AI, resilience, and operational sovereignty. Azure Local provides integrated compute, storage, and networking on customer premises with GA features like Microsoft 365 Local and NVIDIA Blackwell GPUs, plus previews for disconnected operations and multi‑rack scale. Coupled with Azure IoT, Microsoft Fabric, and Azure Arc management enhancements, the updates enable near‑real‑time analytics, secure device identity, and a unified control plane for distributed estates. The goal is to accelerate AI and analytics while preserving data residency, continuity, and compliance for regulated or mission‑critical environments.
read more →

Amazon EC2 P6e-GB300 UltraServers Now Generally Available

🚀 AWS has announced general availability of Amazon EC2 P6e-GB300 UltraServers powered by the NVIDIA GB300 NVL72. The new UltraServers deliver 1.5× GPU memory and 1.5× FP4 compute (without sparsity) compared with P6e-GB200, enabling higher-context inference and improved throughput for large models. Ideal for reasoning, Agentic AI, and production inference; contact your AWS sales representative to get started.
read more →

SageMaker HyperPod Adds NVIDIA MIG GPU Partitioning

🚀 Amazon SageMaker HyperPod now supports NVIDIA Multi-Instance GPU (MIG), enabling administrators to partition a single GPU into multiple isolated devices to run simultaneous small generative AI tasks. Administrators can use an easy console configuration or a custom setup for fine-grained hardware isolation, allocate compute quotas across teams, and monitor real-time performance per partition via a utilization dashboard. Available on HyperPod clusters using the EKS orchestrator in multiple AWS Regions, this capability reduces wait times by letting data scientists run lightweight inference and interactive notebooks in parallel without consuming full GPU capacity.
read more →

Nvidia issues hotfix driver for Windows October update

🔧 Nvidia released the GeForce Hotfix Display Driver 581.94 to address gaming performance regressions reported after the October 2025 Windows update (KB5066835 [5561605]) affecting Windows 11 24H2 and 25H2 systems. The company notes this is a beta hotfix with an abbreviated QA cycle and is provided as-is to deliver targeted fixes more quickly. The driver is available from Nvidia Customer Care for Windows 10 x64 and Windows 11 x64 PCs.
read more →