All news with the #jax tag
Tue, November 11, 2025
Lightricks Scales Video Diffusion Training with JAX
🚀 Lightricks rewrote its training stack in JAX to scale high-performance video diffusion models on TPUs after hitting limits with PyTorch/XLA. The migration enabled reliable sharding, fixed FlashAttention and data-loading issues, and delivered linear scaling from small to large TPU pods. These improvements translated to ~40% more training steps per day, faster iteration, and a doubling of team productivity. Their stack leverages Flax, Optax, Orbax, and the MaxText blueprint for robust, testable, and efficient large-scale training.
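For context, a Flax/Optax training step in JAX typically looks like the minimal sketch below; the model, loss, and hyperparameters are illustrative stand-ins, not Lightricks' actual diffusion code.

```python
# Minimal sketch of a Flax/Optax training step (illustrative, not Lightricks' code).
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class TinyDenoiser(nn.Module):  # stand-in for a real video diffusion model
    @nn.compact
    def __call__(self, x):
        h = nn.relu(nn.Dense(128)(x))
        return nn.Dense(features=x.shape[-1])(h)

model = TinyDenoiser()
params = model.init(jax.random.PRNGKey(0), jnp.ones((4, 64)))
tx = optax.adamw(learning_rate=1e-4)
opt_state = tx.init(params)

@jax.jit
def train_step(params, opt_state, noisy, target):
    def loss_fn(p):
        pred = model.apply(p, noisy)
        return jnp.mean((pred - target) ** 2)  # simple denoising-style MSE
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss
```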
Thu, November 6, 2025
Inside Ironwood: Google's Co‑Designed TPU AI Stack
🚀 The Ironwood TPU stack is a co‑designed hardware and software platform that scales from massive pre‑training to low‑latency inference. It combines dense MXU compute, ample HBM3E memory, and a high‑bandwidth ICI/OCS interconnect with compiler-driven optimizations in XLA and native support for JAX and PyTorch. Pallas and Mosaic enable hand‑tuned kernels for peak performance, while observability and orchestration tools address resilience and efficiency across pods and superpods.
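Pallas kernels are written as JAX functions over memory references and launched with pallas_call; the element-wise example below is a minimal illustration of the programming model, not a tuned Ironwood kernel.

```python
# Minimal Pallas kernel sketch: element-wise add (illustrative, not a tuned kernel).
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs point at blocks staged in fast on-chip memory: read, compute, write back.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.ones((8, 128), dtype=jnp.float32)
print(add(x, x)[0, :4])  # -> [2. 2. 2. 2.]
```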
Mon, November 3, 2025
Ray on TPUs with GKE: Native, Lower-Friction Integration
🚀 Google Cloud and Anyscale have enhanced the Ray experience on Cloud TPUs with GKE to reduce setup complexity and improve performance. The new ray.util.tpu library and a SlicePlacementGroup with a label_selector API automatically reserve co-located TPU slices and preserve SPMD topology to avoid resource fragmentation. Ray Train and Ray Serve gain expanded TPU support, including alpha JAX training, while TPU metrics and libtpu logs now appear in the Ray Dashboard for faster troubleshooting and easier migration between GPUs and TPUs.
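The exact ray.util.tpu and SlicePlacementGroup signatures aren't shown in the announcement, so the sketch below uses Ray's existing placement-group API to illustrate the co-location pattern the new library is said to automate; resource names and host counts are assumptions for illustration.

```python
# Sketch of the manual Ray placement-group pattern that the new ray.util.tpu /
# SlicePlacementGroup API is described as automating. Resource names and counts
# are illustrative assumptions, not the new API.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# Reserve one bundle per TPU host so all workers of a slice are scheduled together.
pg = placement_group([{"CPU": 8, "TPU": 4}] * 4, strategy="STRICT_SPREAD")
ray.get(pg.ready())

@ray.remote(resources={"TPU": 4})
def tpu_worker():
    import jax
    return jax.device_count()  # devices visible to this host

futures = [
    tpu_worker.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote()
    for _ in range(4)
]
print(ray.get(futures))
```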
Tue, September 23, 2025
Escalante Uses JAX on TPUs for AI-driven Protein Design
🧬 Escalante leverages JAX's functional, composable design to combine many predictive models into a single differentiable objective for protein engineering. By translating models (including AlphaFold and Boltz-2) into a JAX-native stack and composing them sequentially or as linear combinations, they compute gradients with respect to input sequences and evolve candidates via gradient-based optimization. Each job samples thousands of sequences, filters them down to roughly ten lab-ready designs, and runs at scale on Google Kubernetes Engine using spot TPU v6e, yielding a reported 3.65x performance-per-dollar advantage over H100 GPUs.
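As an illustration of the underlying pattern, the toy sketch below combines two differentiable scoring functions into one objective and ascends its gradient with respect to a relaxed (softmax) sequence representation; the scoring functions are placeholders, not AlphaFold or Boltz-2, and the weights are arbitrary.

```python
# Toy sketch: combine differentiable scores into one objective and take gradients
# w.r.t. a relaxed sequence representation. Scoring functions are placeholders.
import jax
import jax.numpy as jnp

N_RES, N_AA = 50, 20  # sequence length, amino-acid alphabet size

def stability_score(seq_probs):
    return -jnp.sum(seq_probs ** 2)         # placeholder "model" 1

def binding_score(seq_probs):
    return jnp.sum(jnp.sin(seq_probs))      # placeholder "model" 2

def objective(logits):
    seq_probs = jax.nn.softmax(logits, axis=-1)  # relaxed one-hot sequence
    return 0.5 * stability_score(seq_probs) + 0.5 * binding_score(seq_probs)

logits = jnp.zeros((N_RES, N_AA))
grad_fn = jax.jit(jax.grad(objective))

for _ in range(100):                         # simple gradient ascent on the sequence
    logits = logits + 0.1 * grad_fn(logits)

candidate = jnp.argmax(logits, axis=-1)      # discretize back to residue indices
```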