All news with #lora tag
Tue, December 2, 2025
Practical Guide to GPU HBM for Fine-Tuning Models in Cloud
🔍 Running into CUDA out-of-memory errors is a common blocker when fine-tuning models, because GPU High Bandwidth Memory (HBM) must hold model weights, optimizer state, gradients, activations, and framework overhead all at once. The article breaks down each of those consumers, provides a simple HBM sizing formula, and walks through a 4B-parameter bfloat16 example that illustrates why full fine-tuning can require tens of gigabytes. It then presents practical mitigations (PEFT with LoRA, quantization and QLoRA, FlashAttention, and multi-GPU approaches including data/model parallelism and FSDP) plus a sizing guide (16–40+ GB) to help choose the right hardware.
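The sizing arithmetic can be sketched with common rules of thumb for Adam-based mixed-precision training; the per-parameter byte counts and the activation/overhead allowances below are assumptions, not figures from the article:

```python
# Back-of-envelope HBM estimate for full fine-tuning in bf16 with Adam.
# Assumed bytes per parameter (rule of thumb, not exact):
#   weights 2 B (bf16), gradients 2 B (bf16),
#   optimizer state 12 B (fp32 master copy + two Adam moments).
# Activations and framework overhead vary with batch size and sequence
# length, so they are folded into two illustrative constants here.

GB = 1024 ** 3

def estimate_hbm_gb(n_params: int,
                    activation_gb: float = 4.0,   # assumed allowance
                    overhead_gb: float = 2.0) -> float:
    """Rough HBM requirement in GB for full fine-tuning with Adam."""
    weights = 2 * n_params      # bf16 weights
    grads = 2 * n_params       # bf16 gradients
    optimizer = 12 * n_params   # fp32 master weights + Adam m and v
    return (weights + grads + optimizer) / GB + activation_gb + overhead_gb

# A 4B-parameter model needs tens of gigabytes even before activations:
print(f"~{estimate_hbm_gb(4_000_000_000):.0f} GB")
```

The 16 bytes/parameter of state alone already exceed most single-GPU cards at 4B parameters, which is why the article's mitigations target the optimizer state (LoRA, quantization) or shard it across devices (FSDP).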
Tue, November 18, 2025
Fine-tuning MedGemma for Breast Tumor Classification
🧬 This guide demonstrates step-by-step fine-tuning of MedGemma (a Gemma 3 variant) to classify breast histopathology images using the public BreakHis dataset and a notebook-based workflow. It highlights practical choices—using an NVIDIA A100 40 GB, switching from FP16 to BF16 to avoid numerical overflows, and employing LoRA adapters for efficient training. The tutorial reports dramatic accuracy gains after merging the LoRA adapters back into the base model and points readers to runnable notebooks for reproducibility.
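To see why LoRA adapters make fine-tuning fit on a single A100, it helps to count trainable parameters: LoRA replaces the full update of a d × k weight matrix with two low-rank factors, B (d × r) and A (r × k). A minimal sketch, with layer shape and rank chosen for illustration (not MedGemma's actual configuration):

```python
# LoRA trains only two low-rank factors per adapted layer:
#   B (d x r) and A (r x k), i.e. r * (d + k) parameters,
# instead of the full d * k weight matrix.

def lora_trainable(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k linear layer."""
    return r * (d + k)

d = k = 4096   # hypothetical hidden size of one attention projection
r = 16         # a typical LoRA rank (assumption)

full = d * k
lora = lora_trainable(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")
```

At rank 16 on a 4096 × 4096 layer, the adapter trains roughly 1% of the layer's parameters, which is what keeps gradients and optimizer state small enough for a 40 GB card; merging afterwards folds B·A back into the frozen weight, so inference pays no extra cost.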