Fireworks AI is a production-grade inference platform focused on serving open-source models (Llama, Mistral, Qwen, Stable Diffusion, and more) with very low latency. It offers serverless and dedicated GPU deployments, compound AI system support, and an OpenAI-compatible API — making it popular with developers who need reliable, fast open-source inference.
Llama-3.1-8B: ~$0.20/1M tokens. Llama-3.1-70B: ~$0.90/1M. See fireworks.ai for current rates.