Trustpilot’s real-time data enrichment with Gemma
🧩Trustpilot built a high-volume streaming pipeline using fine-tuned Gemma models to process millions of user reviews in near real-time under tight latency and cost constraints. The team replaced variable per-token pricing with fixed infrastructure costs, fine-tuned lightweight models for tasks like NER, sentiment, and topic classification, and separated classifier and LLM endpoints. Performance tuning, vLLM optimizations, and load testing enabled scalable inference despite challenges with private networking, deployment observability, and GPU availability.
