< ciso
brief />

All news with #nvidia tag

86 articles · page 2 of 5

AWS Neuron SDK 2.29 Released with Stable NKI and Expanded Tools

🚀 AWS released Neuron SDK 2.29.0, promoting the Neuron Kernel Interface (NKI) to Stable (v0.3.0) and adding a Standard Library plus a CPU Simulator for local kernel development. The update introduces ISA-level features, DMA priority controls, and variable-length collectives, along with seven new experimental kernels and improvements to existing ones. NxD Inference and the vLLM Neuron Plugin receive vision-language optimizations. Neuron Explorer moves to Stable and is available on the VS Code marketplace.

WPP Accelerates Humanoid Robot Training with G4 VMs

🤖 WPP leveraged Google Cloud G4 VM instances powered by the NVIDIA RTX PRO 6000 Blackwell and the NVIDIA Isaac Sim image to cut humanoid robot training from hours or days to under an hour, achieving more than 10x speedups. Their pipeline combines OptiTrack motion capture, OpenUSD digital twins, and MuJoCo-based validation to retarget complex human motion to constrained robot kinematics. Training ran at scale using GPU P2P topology and the AI Hypercomputer, condensing learned policies into ONNX for real-time deployment while preserving safety and robustness.

Amazon EC2 P6-B300 Instances Now in GovCloud (US-East)

🚀 Amazon has added EC2 P6-B300 instances to the AWS GovCloud (US‑East) Region. The p6-b300.48xlarge configuration provides 8x NVIDIA Blackwell Ultra GPUs with 2.1 TB of high-bandwidth GPU memory, 6.4 Tbps EFA networking, 300 Gbps ENA throughput, and 4 TB of system memory. P6-B300 delivers ~2x networking, 1.5x GPU memory and 1.5x FP4 TFLOPS vs P6-B200, targeting training and deployment of large trillion-parameter foundation models and LLMs.
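The ratios quoted above can be turned into rough per-GPU and baseline figures. The sketch below is only back-of-the-envelope arithmetic on the numbers in this summary; it assumes decimal terabytes (1 TB = 1000 GB) and that the "~2x networking" claim refers to EFA bandwidth, neither of which the summary states. The derived P6-B200 values are implied estimates, not published specifications.

```python
# Figures taken from the summary above.
GPU_COUNT = 8
TOTAL_GPU_MEMORY_TB = 2.1   # high-bandwidth GPU memory across all GPUs
EFA_TBPS = 6.4              # EFA networking bandwidth

# Ratios versus P6-B200 as quoted in the summary.
MEMORY_RATIO = 1.5
NETWORK_RATIO = 2.0         # assumption: "~2x networking" means EFA bandwidth

# Assumption: decimal units, 1 TB = 1000 GB.
per_gpu_memory_gb = TOTAL_GPU_MEMORY_TB * 1000 / GPU_COUNT

# Implied P6-B200 baselines, derived only from the quoted ratios.
implied_b200_memory_tb = TOTAL_GPU_MEMORY_TB / MEMORY_RATIO
implied_b200_efa_tbps = EFA_TBPS / NETWORK_RATIO

print(f"Memory per GPU: about {per_gpu_memory_gb:.1f} GB")
print(f"Implied P6-B200 GPU memory: about {implied_b200_memory_tb:.1f} TB")
print(f"Implied P6-B200 EFA bandwidth: about {implied_b200_efa_tbps:.1f} Tbps")
```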

Rowhammer Attacks Targeting GDDR6 GPUs and Servers

🔒 Three recent academic studies — GDDRHammer, GeForge, and GPUBreach — describe Rowhammer-style attacks that target GDDR6 on modern GPUs. The first two demonstrate memory-access patterns that can bypass TRR and corrupt GPU page tables, enabling arbitrary reads and writes in video memory and potential escalation into system RAM. GPUBreach goes further by chaining driver flaws to defeat IOMMU-based isolation. While enabling ECC, using HBM, and applying IOMMU mitigations reduce risk, these findings highlight a credible threat to shared GPU/cloud environments.
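To see why a single flipped bit in a page table matters, consider the toy model below. It is a minimal sketch and not the attack code from any of the papers: the entry layout (valid flag in bit 0, frame number from bit 12 upward) and the helper names are hypothetical and do not reflect NVIDIA's actual page-table format.

```python
# Toy page-table entry layout (hypothetical, NOT a real GPU format):
# bit 0 is a valid flag, bits 12 and above hold the physical frame number.
PAGE_SHIFT = 12
VALID_BIT = 1


def make_pte(frame_number: int) -> int:
    """Build a toy page-table entry pointing at the given physical frame."""
    return (frame_number << PAGE_SHIFT) | VALID_BIT


def frame_of(pte: int) -> int:
    """Extract the physical frame number from a toy page-table entry."""
    return pte >> PAGE_SHIFT


def flip_bit(value: int, bit: int) -> int:
    """Simulate a Rowhammer-style disturbance by toggling a single bit."""
    return value ^ (1 << bit)


pte = make_pte(0x1234)
# Flip bit 3 of the frame-number field, as a hammered DRAM row might.
corrupted = flip_bit(pte, PAGE_SHIFT + 3)

print(hex(frame_of(pte)), "->", hex(frame_of(corrupted)))
```

The entry still looks valid after the flip, but it now maps to a different physical frame. If an attacker can steer which entry is hit and what lives in the newly mapped frame, that mapping becomes a read/write primitive over memory it was never granted.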

Nemotron-3-Super-120B and Qwen3.5 Models Added to SageMaker

🚀 Amazon SageMaker JumpStart now includes NVIDIA’s Nemotron-3-Super-120B and the Qwen3.5 family (9B and 27B), giving customers turnkey access to foundation models optimized for agentic reasoning, multilingual coding, and advanced instruction following. Nemotron-3-Super-120B employs a hybrid Latent Mixture-of-Experts architecture with Mamba-2 and MoE layers to support collaborative agents and high-volume automation such as IT ticket triage and cybersecurity workflows. Qwen3.5-9B prioritizes efficiency for resource-constrained environments, while Qwen3.5-27B offers deeper contextual and multimodal reasoning for large-scale document processing and complex scenarios. Users can deploy these models directly from the JumpStart catalog or programmatically via the SageMaker Python SDK.

Are $30,000 AI GPUs Better at Cracking Passwords Today?

🔒 Specops compared two flagship AI accelerators, the Nvidia H200 and AMD MI300X, against the consumer RTX 5090 using Hashcat benchmarks for MD5, NTLM, bcrypt, SHA-256 and SHA-512. The RTX 5090 outperformed both AI GPUs across all tested algorithms, often by wide margins, meaning the expensive AI hardware does not translate to superior password-cracking performance. Price-to-performance was stark: the H200 costs at least ten times an RTX 5090 yet delivers lower hash rates. The practical risk remains weak or reused credentials; long passphrases, breached-password detection, and MFA are the recommended mitigations.

Experimenting with GPUs, GKE DRANET and Inference Gateway

🔧 This post walks through deploying and serving a large model on Google Kubernetes Engine using managed DRANET and NVIDIA B200 GPUs. It explains how RDMA networking is provisioned as an isolated regional VPC for low-latency GPU-to-GPU communication and how to provision A4 nodes and reservations for RoCEv2-capable accelerators. The author provides example gcloud and kubectl commands to create the cluster, a GPU node pool with DRA labels, a ResourceClaimTemplate for mrdma workloads, and steps to serve a DeepSeek model privately via GKE Inference Gateway and a regional internal Application Load Balancer.

GPUBreach: GPU Rowhammer Enables Full System Compromise

🔒 Researchers at the University of Toronto demonstrated GPUBreach, a GPU-targeted Rowhammer technique that flips bits in GDDR6 to corrupt GPU page tables and subvert device memory controls. An unprivileged CUDA kernel can obtain arbitrary read/write access to GPU memory and then exploit NVIDIA driver flaws to escalate to CPU privileges and spawn a root shell. The work, due at IEEE S&P 2026, includes technical materials and shows impacts from key leakage to ML model manipulation.

GPUBreach: RowHammer on GPUs Enables Full Host Takeover

⚠️ New research describes a set of GDDR6 RowHammer techniques that corrupt GPU page tables to gain arbitrary GPU memory read/write and, in GPUBreach's case, full host control. The GPUBreach work shows that chained GDDR6 bit-flips can corrupt trusted driver state and trigger kernel memory-safety bugs in NVIDIA drivers even with the IOMMU enabled. Related efforts (GDDRHammer, GeForge) also achieve GPU-side arbitrary read/write, though some require the IOMMU to be disabled. Enabling ECC reduces risk but is not a guaranteed mitigation on all platforms.

GPUBreach: GPU Rowhammer Enables System Takeover to Root

⚠️ A new attack called GPUBreach demonstrates that Rowhammer-induced bit flips in GDDR6 memory can corrupt GPU page tables and allow an unprivileged CUDA kernel to gain arbitrary GPU memory read/write access. The University of Toronto team showed this capability can be chained into CPU-side privilege escalation by exploiting memory-safety bugs in the NVIDIA driver, potentially yielding a full system compromise up to a root shell. Critically, the attack works with IOMMU enabled and remains unmitigated on consumer GPUs without ECC. Full technical details and a reproduction package will be published on April 13.

AI for Nuclear Energy: Building Intelligent Resilience

⚛️ Microsoft announces an AI for nuclear collaboration with NVIDIA to deliver an end-to-end, AI-powered foundation for nuclear project delivery. The initiative pairs Microsoft Azure, generative AI for permitting, and NVIDIA simulation and AI stacks to speed design, streamline licensing, and improve operations via Digital Twins. Early adopters — including Aalo Atomics, Southern Nuclear, and Idaho National Laboratory — report major time and cost reductions while preserving regulatory traceability and security.

Kubernetes as AI Infrastructure: llm-d Joins CNCF Sandbox

🚀 Google Cloud and partners announced that llm-d has been accepted into the CNCF Sandbox to promote open, accelerator-agnostic standards for distributed LLM inference. As a founding contributor alongside Red Hat, IBM Research, CoreWeave, and NVIDIA, Google emphasizes running any model on any accelerator in any cloud without vendor lock-in. GKE Inference Gateway now integrates the llm-d Endpoint Picker (EPP) to enable model-aware routing that optimizes for KV-cache hits, inflight requests, and queue depth, yielding concrete production gains in Vertex AI tests. Complementary work on the Kubernetes LeaderWorkerSet (LWS) API and vLLM extensions for Cloud TPUs targets scalable multi-node orchestration and up to 5x throughput improvements.
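The model-aware routing described above can be pictured as a scoring function over candidate endpoints. The sketch below is illustrative only and is not the llm-d Endpoint Picker's actual algorithm or API: the `Endpoint` fields, the weights, and the linear scoring formula are all assumptions chosen to show how cache affinity might be traded off against load.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical view of one model-server replica."""
    name: str
    kv_cache_hit_ratio: float  # estimated share of the prompt already cached (0..1)
    inflight_requests: int     # requests currently being processed
    queue_depth: int           # requests waiting to start


def score(ep: Endpoint, w_cache: float = 1.0,
          w_inflight: float = 0.05, w_queue: float = 0.1) -> float:
    """Higher is better: reward expected cache hits, penalize load.

    The weights are arbitrary placeholders, not tuned values.
    """
    return (w_cache * ep.kv_cache_hit_ratio
            - w_inflight * ep.inflight_requests
            - w_queue * ep.queue_depth)


def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    """Return the endpoint with the best score."""
    if not endpoints:
        raise ValueError("no endpoints available")
    return max(endpoints, key=score)


endpoints = [
    Endpoint("pod-a", kv_cache_hit_ratio=0.9, inflight_requests=12, queue_depth=4),
    Endpoint("pod-b", kv_cache_hit_ratio=0.6, inflight_requests=2, queue_depth=0),
    Endpoint("pod-c", kv_cache_hit_ratio=0.1, inflight_requests=0, queue_depth=0),
]

print(pick_endpoint(endpoints).name)
```

With these placeholder weights, the replica with the warmest cache loses to a moderately warm but lightly loaded one, which is the kind of trade-off a plain round-robin load balancer cannot make.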

Training Frontier Models Efficiently on Ironwood TPUs

⚡ This technical guide explains how to extract peak training performance on Ironwood TPUs using the JAX and MaxText ecosystems. It highlights native FP8 support, Tokamax kernels for on-chip efficiency, offloading collectives to SparseCore processors, VMEM tuning, and sharding strategies (FSDP, TP, EP, CP, and hybrids). Practical flags and libraries such as Qwix and Tokamax are recommended for implementation.

AWS adds NIXL with EFA to accelerate LLM inference at scale

⚡ AWS now supports NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) on all EFA-enabled EC2 instances and regions. This integration accelerates disaggregated LLM inference by increasing KV-cache throughput, lowering inter-token latency, and optimizing KV-cache memory use between prefill and decode nodes. NIXL interoperates with frameworks such as NVIDIA Dynamo, SGLang, and vLLM. Supported versions are NIXL 1.0.0+ and EFA installer 1.47.0+, available at no extra cost.

Nvidia unveils NemoClaw to secure OpenClaw agents today

🔐 At the Nvidia GTC conference, CEO Jensen Huang introduced NemoClaw, a secure runtime for OpenClaw-style agents built on the Nvidia Agent Toolkit and the broader NeMo ecosystem. Central to the offering is the open-source OpenShell runtime, which provides kernel-level sandboxing and a “privacy router” to monitor and block unsafe communications. Nvidia says NemoClaw is hardware-agnostic though optimized for its own microservices, and aims to make edge agent deployment viable for enterprises while researchers inspect it for CVE-level flaws.

Microsoft, NVIDIA Expand Azure AI Infrastructure and Foundry

🚀 Microsoft and NVIDIA announced deeper integration at NVIDIA GTC, extending Microsoft Foundry to support NVIDIA Nemotron models and to simplify building production agents. New Azure AI infrastructure optimized for inference and reasoning will bring Vera Rubin NVL72 into liquid‑cooled datacenters and add initial support on Azure Local. Foundry Agent Service, Control Plane observability and a Voice Live API preview aim to accelerate prototype‑to‑production paths, while Fabric–Omniverse links and a public Physical AI Toolchain support simulation‑to‑operations workflows.

Check Point and NVIDIA Enable Secure AI Data Centers

🔒 Check Point has integrated with NVIDIA DSX Air’s cloud-based testing environment to let organizations pre-validate security-aware AI data center designs before deploying hardware. The capability enables large-scale simulation and end-to-end validation of AI Factory deployments across compute, networking, orchestration and security. By validating integrations, configurations and automation in advance, teams can reduce resource intensity and accelerate secure rollouts.

Google Cloud and NVIDIA Expand AI Hypercomputer Partnership

🚀 At NVIDIA GTC 2026, Google Cloud announced an expanded co‑engineering partnership with NVIDIA centered on the new Google Cloud AI Hypercomputer, designed to address the infrastructure demands of agentic and large-scale MoE workloads. The updates include momentum for G4 VMs powered by NVIDIA RTX Pro 6000 Server Edition, a preview of fractional G4 VMs using NVIDIA vGPU, and planned support for NVIDIA Vera Rubin NVL72 rack systems. Software integrations such as NVIDIA Dynamo with GKE Inference Gateway, Vertex AI Model Garden additions, and a public sector AI startup accelerator target lower latency, higher throughput, and more flexible consumption for inference and training.

Amazon EC2 G7e with NVIDIA Blackwell Now in Seoul, Spain

🚀 Starting today, Amazon Web Services has made EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, available in Asia Pacific (Seoul) and Europe (Spain). These instances deliver up to 2.3x inference performance versus G6e and support up to eight GPUs (96 GB each), 192 vCPUs, and 1,600 Gbps of networking for demanding LLM, multimodal, and spatial computing workloads. G7e also supports NVIDIA GPUDirect P2P and GPUDirect RDMA with EFA for accelerated multi-GPU and multi-node performance and can be purchased as On-Demand, Spot, or via Savings Plans. Provisioning is available through the AWS Management Console, CLI, and SDKs.

Amazon EC2 G7e Instances Now Available in Tokyo Region

🚀 Amazon Web Services has launched EC2 G7e instances in the Asia Pacific (Tokyo) region, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. These instances deliver up to 2.3x inference performance compared to G6e, support up to eight GPUs with 96 GB per GPU, and provide up to 192 vCPUs and 1600 Gbps networking. They include NVIDIA GPUDirect P2P and GPUDirect RDMA with EFA in EC2 UltraClusters, and are available as On-Demand, Spot, or via Savings Plans.