Unweight: Lossless BF16 Exponent Compression for LLMs
💾 Cloudflare's Unweight is a lossless compression system for LLM weights that reduces model size by roughly 15–22% while preserving bit-exact outputs and requiring no special hardware. It compresses only the 8-bit exponent field of each BF16 value, using Huffman coding, palette/transcoding and row-level fallbacks, and leaves the sign bit and 7-bit mantissa untouched. Decompression writes into GPU shared memory, so the reconstructed weights feed tensor cores directly. Cloudflare has published a technical paper and open-sourced its GPU kernels.
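The core idea can be illustrated with a small CPU-only sketch. It is not Unweight's implementation, its GPU kernels, or its file format. The sketch assumes the following: BF16 values are produced by truncating float32 to its upper 16 bits; one Huffman table is built per tensor from all exponents; and a hypothetical row-level fallback stores a row's raw 8-bit exponents whenever Huffman coding would not shrink them. All function names are illustrative. The sketch splits each BF16 value into sign, exponent and mantissa, entropy-codes only the exponents, and checks that reassembly is bit-exact.

```python
import heapq
import random
import struct
from collections import Counter

def float_to_bf16_bits(x: float) -> int:
    """BF16 bit pattern of x, obtained by truncating float32 to its top 16 bits."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16

def split_bf16(bits: int) -> tuple[int, int, int]:
    """BF16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits."""
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F

def join_bf16(sign: int, exponent: int, mantissa: int) -> int:
    return (sign << 15) | (exponent << 7) | mantissa

def huffman_code(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free code in which frequent exponents get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tiebreak, {symbol: partial codeword}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode_row(exponents: list[int], code: dict[int, str]):
    """Hypothetical row-level fallback: keep raw 8-bit exponents if Huffman doesn't help."""
    bits = "".join(code[e] for e in exponents)
    if len(bits) >= 8 * len(exponents):
        return ("raw", list(exponents))
    return ("huffman", bits)

def decode_row(row, code: dict[int, str]) -> list[int]:
    kind, payload = row
    if kind == "raw":
        return list(payload)
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in payload:
        buf += b
        if buf in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return out

# Synthetic "weights": 8 rows of 256 Gaussian values, stored as BF16 bit patterns.
random.seed(0)
rows = [[float_to_bf16_bits(random.gauss(0.0, 0.02)) for _ in range(256)] for _ in range(8)]
fields = [[split_bf16(b) for b in row] for row in rows]

# One code table for the whole tensor, built from every exponent.
code = huffman_code([e for row in fields for _, e, _ in row])
encoded = [encode_row([e for _, e, _ in row], code) for row in fields]

# Decode exponents and reassemble; sign and mantissa were never modified.
restored = []
for row, enc in zip(fields, encoded):
    exps = decode_row(enc, code)
    restored.append([join_bf16(s, e, m) for (s, _, m), e in zip(row, exps)])

n = 8 * 256
exp_bits = sum(len(p) if k == "huffman" else 8 * len(p) for k, p in encoded)
print("lossless:", restored == rows)
print(f"exponent bits/value: {exp_bits / n:.2f} (raw: 8)")
print(f"total size vs BF16: {(8 * n + exp_bits) / (16 * n):.1%}")
```

The printed ratio on synthetic Gaussian data is not meant to reproduce the reported 15–22% figure. The sketch ignores code-table storage, row indexing and the GPU-side layout, and real model weights have different exponent statistics. The exponents compress well because weights cluster in a narrow magnitude range, so only a few exponent values are common. The mantissa bits, by contrast, look close to random, which is why leaving them untouched costs little.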
