Groq is a cloud inference provider that runs popular open-weight models (Llama, Mixtral, Gemma, Whisper, and more) on its custom LPU (Language Processing Unit) hardware. The result is token throughput often an order of magnitude higher than GPU-based providers, making it a strong fit for latency-sensitive applications.
Generous free tier; paid plans start at roughly $0.05 per 1M tokens, depending on the model. See groq.com for current pricing.
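
Groq's API follows the OpenAI chat-completions shape, so calling it takes only a few lines. Here is a minimal sketch using the official `groq` Python client; the model id is illustrative and may rotate over time, so check the current model list on groq.com:

```python
# Minimal chat completion against Groq's API (pip install groq).
# Assumes GROQ_API_KEY is set in the environment; the model id below
# is an assumption, not a guaranteed-current offering.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumed model id; verify on groq.com
    messages=[
        {"role": "user", "content": "Summarize what an LPU is in one sentence."}
    ],
)
print(response.choices[0].message.content)
```

Because the request and response shapes mirror OpenAI's, existing OpenAI-based code can typically be pointed at Groq by swapping the client and model name.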