All news with the #nvidia tag

Thu, December 4, 2025

NVIDIA Run:ai Model Streamer Adds Cloud Storage Support

🚀 The NVIDIA Run:ai Model Streamer now supports native Google Cloud Storage access, accelerating model loading and inference startup for vLLM workloads on GKE. By streaming tensors directly from Cloud Storage into GPU memory and using distributed, NVLink-aware transfers, the streamer dramatically reduces cold-start latency and idle GPU time. Enabling it in vLLM is a single-flag change, and it can leverage GKE Workload Identity for secure, keyless access.
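
A minimal sketch of that single-flag change, using vLLM's Python API; the gs:// bucket path is a placeholder, and the GCS path assumes a vLLM/streamer release that includes the new Cloud Storage support:

```python
# Hedged sketch: stream weights via the Run:ai Model Streamer.
# "runai_streamer" is vLLM's load-format name for this loader;
# the bucket path below is a placeholder, not a real model location.
from vllm import LLM

llm = LLM(
    model="gs://my-model-bucket/llama-3.1-8b-instruct",  # placeholder GCS path
    load_format="runai_streamer",  # stream tensors directly into GPU memory
)
print(llm.generate("Hello!")[0].outputs[0].text)
```

The CLI equivalent is the same flag on the server: `vllm serve <model> --load-format runai_streamer`.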

read more →

Wed, December 3, 2025

Azure expands local and hybrid options for AI and control

🔒 Microsoft is expanding Azure with on‑premises, edge, and hybrid options to deliver AI capabilities, resilience, and operational sovereignty. Azure Local provides integrated compute, storage, and networking on customer premises, with GA features like Microsoft 365 Local and NVIDIA Blackwell GPUs, plus previews for disconnected operations and multi‑rack scale. Coupled with Azure IoT, Microsoft Fabric, and Azure Arc management enhancements, the updates enable near‑real‑time analytics, secure device identity, and a unified control plane for distributed estates. The goal is to accelerate AI and analytics while preserving data residency, continuity, and compliance for regulated or mission‑critical environments.

read more →

Tue, December 2, 2025

Practical Guide to GPU HBM for Fine-Tuning Models in Cloud

🔍 CUDA out-of-memory errors are a common blocker when fine-tuning models, because GPU High Bandwidth Memory (HBM) must hold model weights, optimizer state, gradients, activations, and framework overhead. The article breaks down those consumers, provides a simple HBM sizing formula, and walks through a 4B-parameter bfloat16 example that illustrates why full fine-tuning can require tens of gigabytes. It then presents practical mitigations: PEFT with LoRA, quantization and QLoRA, FlashAttention, and multi‑GPU approaches including data/model parallelism and FSDP, plus a sizing guide (16–40+ GB) to help choose the right hardware.
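
As a rough, hedged version of that arithmetic (assuming bfloat16 weights and gradients with standard fp32 Adam moments; the article's exact formula may differ):

```python
# Back-of-envelope HBM estimate for full fine-tuning. Assumes bf16
# weights/gradients and Adam with two fp32 moment tensors; activations
# and framework overhead come on top and scale with batch/sequence size.
def full_finetune_hbm_gb(params_billions: float) -> float:
    weights = params_billions * 2    # bf16: 2 bytes per parameter
    gradients = params_billions * 2  # bf16: 2 bytes per parameter
    optimizer = params_billions * 8  # Adam: two fp32 moments, 4 bytes each
    return weights + gradients + optimizer

# 4B-parameter example: 8 + 8 + 32 = 48 GB before activations,
# which is why full fine-tuning quickly outgrows a single 40 GB GPU.
print(f"{full_finetune_hbm_gb(4.0):.0f} GB")
```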

read more →

Tue, December 2, 2025

Amazon EC2 P6e-GB300 UltraServers Now Generally Available

🚀 AWS has announced general availability of Amazon EC2 P6e-GB300 UltraServers powered by the NVIDIA GB300 NVL72. The new UltraServers deliver 1.5× the GPU memory and 1.5× the FP4 compute (without sparsity) of P6e-GB200, enabling longer-context inference and improved throughput for large models. They target reasoning, agentic AI, and production inference workloads; customers can contact their AWS sales representative to get started.

read more →

Tue, December 2, 2025

AWS AI Factories: Dedicated High-Performance AI Infrastructure

🚀 AWS AI Factories are now available to deploy high-performance AWS AI infrastructure inside customer data centers, combining AWS Trainium, NVIDIA GPUs, low-latency networking, and optimized storage. The service integrates Amazon Bedrock and Amazon SageMaker to provide immediate access to foundation models without separate provider contracts. AWS manages procurement, setup, and operations while customers supply space and power, enabling isolated, sovereign deployments that accelerate AI initiatives.

read more →

Tue, December 2, 2025

Amazon Bedrock Adds 18 Fully Managed Open Models Today

🚀 Amazon Bedrock expanded its model catalog with 18 new fully managed open-weight models, its largest single addition to date. The offering includes Gemma 3, Mistral Large 3, NVIDIA Nemotron Nano 2, and OpenAI gpt-oss variants, among other vendor models. Through a unified API, developers can evaluate, switch, and adopt these models in production without rewriting applications or changing infrastructure. Models are available in supported AWS Regions.
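
A hedged sketch of what that switching looks like with the Bedrock Converse API; the model identifiers below are placeholders, not the real IDs of the newly added models:

```python
# Sketch: the Converse API keeps application code identical across models;
# only modelId changes. Check the Bedrock console for real identifiers.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Same call path, different model: swap the identifier and nothing else.
for model_id in ("mistral.mistral-large-3-placeholder",
                 "google.gemma-3-placeholder"):
    print(ask(model_id, "Summarize zero-trust networking in one sentence."))
```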

read more →

Tue, December 2, 2025

CrowdStrike Leverages NVIDIA Nemotron on Amazon Bedrock

🔐 CrowdStrike integrates NVIDIA Nemotron via Amazon Bedrock to advance agentic security across the Falcon platform, enabling defenders to reason and act autonomously at scale. Falcon Fusion SOAR leverages Nemotron for adaptive, context-aware playbooks that prioritize alerts, understand relationships, and execute complex responses. Charlotte AI AgentWorks uses Bedrock-delivered models to create task-specific agents with real-time environmental awareness. The serverless Bedrock architecture reduces infrastructure overhead while preserving governance and analyst controls.

read more →

Mon, November 24, 2025

SageMaker HyperPod Adds NVIDIA MIG GPU Partitioning

🚀 Amazon SageMaker HyperPod now supports NVIDIA Multi-Instance GPU (MIG), enabling administrators to partition a single GPU into multiple isolated devices that run small generative AI tasks simultaneously. Administrators can use a simple console configuration or a custom setup for fine-grained hardware isolation, allocate compute quotas across teams, and monitor real-time performance per partition via a utilization dashboard. Available on HyperPod clusters using the EKS orchestrator in multiple AWS Regions, the capability reduces wait times by letting data scientists run lightweight inference and interactive notebooks in parallel without consuming full GPU capacity.
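
For illustration, once GPUs are partitioned, a workload on the EKS orchestrator can request a single MIG slice rather than a whole GPU. This sketch uses the Kubernetes Python client; the MIG resource name follows the NVIDIA device plugin's mixed strategy and is an assumption, as is the container image:

```python
# Hypothetical sketch: schedule a pod onto one MIG slice instead of a
# full GPU. Resource names like "nvidia.com/mig-1g.10gb" depend on the
# cluster's MIG strategy; HyperPod's console flow may expose different ones.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="notebook-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="worker",
                image="nvcr.io/nvidia/pytorch:24.10-py3",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # One isolated MIG partition, not a whole GPU.
                    limits={"nvidia.com/mig-1g.10gb": "1"},
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```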

read more →

Fri, November 21, 2025

Nvidia issues hotfix driver for Windows October update

🔧 Nvidia released the GeForce Hotfix Display Driver 581.94 to address gaming performance regressions reported after the October 2025 Windows update (KB5066835 [5561605]) affecting Windows 11 24H2 and 25H2 systems. The company notes this is a beta hotfix with an abbreviated QA cycle and is provided as-is to deliver targeted fixes more quickly. The driver is available from Nvidia Customer Care for Windows 10 x64 and Windows 11 x64 PCs.

read more →

Fri, November 21, 2025

CloudWatch Container Insights: Sub-Minute GPU Metrics

🔍 Amazon CloudWatch Container Insights now supports configurable sub-minute GPU sampling for Amazon EKS, enabling GPU metrics to be collected at a per-second sample rate and aggregated to CloudWatch once per minute. This enhancement gives teams finer visibility into short-lived AI/ML inference and GPU-intensive workloads, helping to optimize resource utilization, troubleshoot performance issues, and improve operational efficiency for containerized GPU applications. The feature is available in all AWS Commercial Regions and AWS GovCloud (US) Regions at no additional cost.

read more →

Tue, November 18, 2025

AWS launches EC2 P6-B300 with NVIDIA Blackwell Ultra

🚀 Amazon Web Services has announced general availability of Amazon EC2 P6-B300 instances powered by NVIDIA Blackwell Ultra B300 GPUs. The p6-b300.48xlarge delivers eight GPUs, 2.1 TB of high-bandwidth GPU memory, 6.4 Tbps EFA networking, 300 Gbps ENA throughput, and 4 TB of system memory. It targets training and deploying trillion-parameter foundation models and LLMs, offering higher memory, compute, and networking versus P6-B200.

read more →

Mon, November 17, 2025

Microsoft and NVIDIA Enable Real-Time AI Defenses at Scale

🔒 Microsoft and NVIDIA describe a joint effort to convert adversarial learning research into production-grade, real-time cyber defenses. They transitioned transformer-based classifiers from CPU to GPU inference—using Triton and a TensorRT-compiled engine—to dramatically reduce latency and increase throughput for live traffic inspection. Key engineering advances include fused CUDA kernels and a domain-specific tokenizer, enabling low-latency, high-accuracy detection of adversarial payloads in inline production settings.
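
A hedged sketch of the client side of such a deployment: sending a pre-tokenized payload to a TensorRT-backed classifier served by Triton. The model name, tensor names, and shapes below are assumptions for illustration, not details from the article:

```python
# Sketch: query a text classifier hosted on Triton Inference Server.
# The article describes a domain-specific tokenizer, so the client is
# assumed to send token IDs; names/shapes here are placeholders.
import numpy as np
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")

token_ids = np.zeros((1, 256), dtype=np.int64)  # placeholder token IDs

inp = httpclient.InferInput("input_ids", list(token_ids.shape), "INT64")
inp.set_data_from_numpy(token_ids)

result = triton.infer(model_name="payload_classifier", inputs=[inp])
print(result.as_numpy("logits"))  # per-class scores for the request
```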

read more →

Fri, November 14, 2025

ShadowMQ Deserialization Flaws in Major AI Inference Engines

⚠️ Oligo Security researcher Avi Lumelsky disclosed a widespread insecure-deserialization pattern, dubbed ShadowMQ, that affects major AI inference engines including vLLM, NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, and SGLang. The root cause is the use of ZeroMQ's recv_pyobj() to deserialize network input with Python's pickle, permitting remote arbitrary code execution. Patches vary: some projects have fixed the issue while others remain partially addressed or unpatched, and mitigations include applying updates, removing exposed ZMQ sockets, and auditing code for unsafe deserialization.
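
The unsafe pattern and a safer substitute, in a minimal pyzmq sketch; the field names are illustrative:

```python
# recv_pyobj() unpickles arbitrary Python objects, so any peer that can
# reach the socket can execute code. recv_json() restricts input to plain
# JSON data, which you then validate explicitly.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")

# VULNERABLE (the ShadowMQ pattern): pickle-deserializes attacker bytes.
# request = sock.recv_pyobj()

# Safer: parse JSON only, then check the fields you expect.
request = sock.recv_json()
if not isinstance(request, dict) or "prompt" not in request:
    sock.send_json({"error": "malformed request"})
else:
    sock.send_json({"ok": True})
```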

read more →

Thu, November 13, 2025

AWS Expands EC2 G6f NVIDIA L4 GPU Instances to More Regions

🚀 Amazon Web Services has expanded availability of EC2 G6f instances powered by NVIDIA L4 GPUs to Europe (Spain) and Asia Pacific (Seoul), improving access for graphics and visualization workloads. G6f instances support GPU partitions as small as one-eighth of a GPU with 3 GB of GPU memory, enabling finer-grained right-sizing and cost savings compared to single‑GPU options. Instances are offered in multiple sizes paired with third‑generation AMD EPYC processors, and are purchasable as On‑Demand, Spot, or via Savings Plans; customers should use NVIDIA GRID driver 18.4 or later to launch these instances.

read more →

Wed, November 12, 2025

Microsoft unveils Fairwater AI datacenter in Atlanta

🚀 Microsoft announced the new Fairwater Azure AI datacenter in Atlanta, Georgia, expanding its planet-scale AI superfactory. The purpose-built facility integrates massive NVIDIA Blackwell GPU clusters on a single flat network and uses rack-level direct liquid cooling plus a two-story layout to maximize compute density and reduce latency. It also connects via a dedicated AI WAN to enable cross-site fungibility and dynamic workload allocation.

read more →

Mon, November 10, 2025

Full-Stack Approach to Scaling RL for LLMs on GKE at Scale

🚀 Google Cloud describes a full-stack solution for running high-scale Reinforcement Learning (RL) with LLMs, combining custom TPU hardware, NVIDIA GPUs, and optimized software libraries. The approach addresses RL's hybrid demands—reducing sampler latency, easing memory contention across actor/critic/reward models, and accelerating weight copying—by co-designing hardware, storage (Managed Lustre, Cloud Storage), and orchestration on GKE. The blog emphasizes open-source contributions (vLLM, llm-d, MaxText, Tunix) and integrations with Ray and NeMo RL recipes to improve portability and developer productivity. It also highlights mega-scale orchestration and multi-cluster strategies for running production RL jobs across tens of thousands of nodes.

read more →

Mon, November 10, 2025

Amazon Braket Adds Native CUDA-Q Support in Notebooks

🔬 Amazon Braket notebook instances now include native support for CUDA-Q, enabled by an upgrade of the underlying OS to Amazon Linux 2023 that brings improved performance, security, and compatibility for quantum development and production-ready workflows. Developers can run GPU-accelerated quantum circuit simulation alongside access to QPUs from IonQ, Rigetti, and IQM within the managed notebook environment, eliminating the need for local deployment or separate Hybrid Jobs and streamlining hybrid quantum-classical experimentation. CUDA-Q support is available in all Regions where Braket operates.
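
A minimal CUDA-Q sketch of the kind that now runs natively in a Braket notebook: sampling a Bell state on the GPU-accelerated statevector simulator (the "nvidia" target is CUDA-Q's built-in GPU backend):

```python
# Sample a two-qubit Bell state with CUDA-Q's Python kernel DSL.
import cudaq

cudaq.set_target("nvidia")  # GPU-accelerated statevector simulation

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    h(qubits[0])                   # put qubit 0 in superposition
    x.ctrl(qubits[0], qubits[1])   # entangle qubit 1 with qubit 0
    mz(qubits)                     # measure both qubits

counts = cudaq.sample(bell, shots_count=1000)
print(counts)  # expect roughly even "00" and "11" outcomes
```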

read more →

Mon, November 10, 2025

New hardware attack (TEE.fail) breaks modern secure enclaves

🔒 A new low-cost, hardware-assisted attack called TEE.fail undermines current trusted execution environments from major chipmakers. The method inserts a tiny interposer device between a memory module and the motherboard and requires a compromised OS kernel to extract secrets, defeating protections in Confidential Compute offerings, AMD SEV-SNP, and Intel TDX/SGX. The attack completes in roughly three minutes and works against DDR5 memory, meaning the physical-access threats TEEs are designed to defend against are no longer reliably mitigated.

read more →

Fri, November 7, 2025

Tiered KV Cache Boosts LLM Performance on GKE with HBM

🚀 LMCache implements a node-local, tiered KV cache on GKE that extends the GPU HBM-backed key-value store into CPU RAM and local SSD, increasing effective cache capacity and hit ratio. In benchmarks using Llama-3.3-70B-Instruct on an A3 Mega instance (8×nvidia-h100-mega-80gb), configurations that added RAM and SSD tiers reduced time-to-first-token (TTFT) and materially increased token throughput for long system prompts. The results demonstrate a practical approach to scaling context windows while balancing cost and latency on GKE.
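
A hedged sketch of what such a tiered setup looks like; the environment-variable and connector names follow LMCache's documented configuration style, but treat them as assumptions to verify against the LMCache and vLLM versions you deploy:

```python
# Sketch: enable LMCache's CPU-RAM and local-SSD tiers behind vLLM.
# Variable names and sizes below are illustrative assumptions.
import os

os.environ["LMCACHE_LOCAL_CPU"] = "True"                       # tier 2: CPU RAM
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "80"                # GB of host memory
os.environ["LMCACHE_LOCAL_DISK"] = "file:///mnt/ssd/lmcache/"  # tier 3: local SSD
os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "500"              # GB on disk

from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    tensor_parallel_size=8,  # e.g., the 8×H100 A3 Mega setup in the benchmark
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # route KV blocks through LMCache
        kv_role="kv_both",                  # both store and retrieve
    ),
)
```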

read more →

Thu, November 6, 2025

Leading Bug Bounty Programs and Market Shifts 2025

🔒 Bug bounty programs remain a core component of security testing in 2025, drawing external researchers to identify flaws across web, mobile, AI, and critical infrastructure. Leading platforms such as Bugcrowd, HackerOne, and Synack, and vendors including Apple, Google, Microsoft, and OpenAI, have broadened scopes and increased payouts. Firms now reward full exploit chains and emphasize human-led reconnaissance over purely automated scanning. Programs also support regulatory compliance in critical sectors.

read more →