
All news with the #jax tag

Tue, November 11, 2025

Lightricks Scales Video Diffusion Training with JAX

🚀 Lightricks rewrote its training stack in JAX to scale high-performance video diffusion models on TPUs after hitting limits with PyTorch/XLA. The migration enabled reliable sharding, fixed FlashAttention and data-loading issues, and delivered linear scaling across small and large TPU pods. These improvements translated to ~40% more training steps per day, faster iteration, and doubled team productivity. Their stack leverages Flax, Optax, Orbax, and the MaxText blueprint for robust, testable, and efficient large-scale training.
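
The post names the libraries but not the code. As a rough orientation only, a single-device JAX train step built from Flax and Optax looks roughly like the sketch below; the tiny model and MSE loss are hypothetical stand-ins, and Orbax checkpointing and pod-scale sharding are omitted.

```python
# A minimal sketch, not Lightricks' code: one jitted train step wired with Flax
# and Optax, two of the libraries named above. Model, loss, and batch are
# hypothetical stand-ins for a real video diffusion training setup.
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class TinyDenoiser(nn.Module):  # stand-in for a video diffusion backbone
    @nn.compact
    def __call__(self, x):
        return nn.Dense(features=x.shape[-1])(x)

model = TinyDenoiser()
batch = jnp.ones((8, 64))  # dummy batch of latents
params = model.init(jax.random.PRNGKey(0), batch)
tx = optax.adamw(1e-4)
opt_state = tx.init(params)

@jax.jit
def train_step(params, opt_state, batch, target):
    def loss_fn(p):
        pred = model.apply(p, batch)
        return jnp.mean((pred - target) ** 2)  # denoising-style MSE loss
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

params, opt_state, loss = train_step(params, opt_state, batch, jnp.zeros_like(batch))
```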

read more →

Thu, November 6, 2025

Inside Ironwood: Google's Co‑Designed TPU AI Stack

🚀 The Ironwood TPU stack is a co‑designed hardware and software platform that scales from massive pre‑training to low‑latency inference. It combines dense MXU compute, ample HBM3E memory, and a high‑bandwidth ICI/OCS interconnect with compiler-driven optimizations in XLA and native support for JAX and PyTorch. Pallas and Mosaic enable hand‑tuned kernels for peak performance, while observability and orchestration tools address resilience and efficiency across pods and superpods.
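
For a sense of what writing a Pallas kernel involves, here is the standard minimal element-wise example from the JAX Pallas programming model, not an Ironwood-specific or production kernel:

```python
# Minimal Pallas example (an element-wise add kernel), shown only to illustrate
# the programming model; real hand-tuned TPU kernels are far more involved.
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs point at on-chip buffers; the kernel reads, computes, and writes in place.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
y = jnp.ones(8, dtype=jnp.float32)
print(add(x, y))  # [1. 2. 3. 4. 5. 6. 7. 8.]
```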

read more →

Mon, November 3, 2025

Ray on TPUs with GKE: Native, Lower-Friction Integration

🚀 Google Cloud and Anyscale have enhanced the Ray experience on Cloud TPUs with GKE to reduce setup complexity and improve performance. The new ray.util.tpu library introduces a SlicePlacementGroup with a label_selector API that automatically reserves co-located TPU slices and preserves SPMD topology, avoiding resource fragmentation. Ray Train and Ray Serve gain expanded TPU support, including alpha JAX training, while TPU metrics and libtpu logs now surface in the Ray Dashboard for faster troubleshooting and easier migration between GPUs and TPUs.
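
The announcement names ray.util.tpu and SlicePlacementGroup without showing code, so the sketch below instead uses Ray's long-standing placement-group APIs to show the kind of slice reservation the new library is described as automating; the "TPU" resource name and the two-host slice size are illustrative assumptions.

```python
# Illustrative sketch only: reserving co-located TPU hosts with Ray's existing
# placement-group APIs. The ray.util.tpu / SlicePlacementGroup integration in
# the post is described as handling this automatically; bundle sizes and the
# "TPU" resource name here are assumptions for a hypothetical 2-host slice.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()  # assumes a cluster that actually exposes TPU resources

# One bundle per TPU VM host in the slice (2 hosts x 4 chips, assumed).
pg = placement_group([{"TPU": 4}] * 2, strategy="STRICT_SPREAD")
ray.get(pg.ready())  # block until the co-located hosts are reserved

@ray.remote(resources={"TPU": 4})
def run_shard(host_index: int) -> str:
    # Each task would initialize JAX on its host and join the SPMD program.
    return f"host {host_index} ready"

tasks = [
    run_shard.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote(i)
    for i in range(2)
]
print(ray.get(tasks))
```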

read more →

Tue, September 23, 2025

Escalante Uses JAX on TPUs for AI-driven Protein Design

🧬 Escalante leverages JAX's functional, composable design to combine many predictive models into a single differentiable objective for protein engineering. By translating models (including AlphaFold and Boltz-2) into a JAX-native stack and composing them in series or combining them linearly, they compute gradients with respect to input sequences and evolve candidates via optimization. Each job samples thousands of sequences, filters them down to roughly ten lab-ready designs, and runs at scale on Google Kubernetes Engine using spot TPU v6e, yielding a reported 3.65x performance-per-dollar advantage over H100 GPUs.
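
As a rough illustration of that pattern (not Escalante's code), the sketch below combines two hypothetical stand-in model scores into one differentiable objective over a relaxed sequence and improves the sequence by gradient ascent:

```python
# Minimal sketch of gradient-based sequence design in JAX. score_fold and
# score_affinity are hypothetical stand-ins for JAX-native predictive models
# (e.g. structure or affinity predictors), not AlphaFold or Boltz-2 themselves.
import jax
import jax.numpy as jnp

SEQ_LEN, N_AA = 50, 20  # sequence length, amino-acid alphabet size

def score_fold(probs):       # stand-in "foldability" score
    return -jnp.sum((probs - 1.0 / N_AA) ** 2)

def score_affinity(probs):   # stand-in binding/affinity score
    return jnp.sum(probs[:, 0])

def objective(logits):
    probs = jax.nn.softmax(logits, axis=-1)  # relaxed (soft) sequence
    # Linear combination of model scores -> one differentiable objective.
    return 0.5 * score_fold(probs) + 0.5 * score_affinity(probs)

@jax.jit
def step(logits, lr=0.1):
    grads = jax.grad(objective)(logits)  # gradient w.r.t. the sequence itself
    return logits + lr * grads           # ascend the composite score

logits = jax.random.normal(jax.random.PRNGKey(0), (SEQ_LEN, N_AA))
for _ in range(100):
    logits = step(logits)
design = jnp.argmax(logits, axis=-1)     # discretize to an amino-acid sequence
```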

read more →