(no title)
rbitar | 5 months ago
Congrats to the team. I'm surprised the industry hasn't been more impressed with their token-throughput benchmarks. We're using the Qwen 3 Coder 480B model and seeing ~2000 tokens/second, which is easily 10-20x faster than most models on the market. Even some of the fastest models still only reach 100-150 tokens/second (see the OpenRouter stats by provider). That said, past roughly 300-400 tokens/second the speed gains feel more incremental, so a model sustaining 300+ tokens/second would already be a very competitive alternative in my view.
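For anyone who wants to sanity-check numbers like these themselves, here's a rough sketch of how I'd measure throughput against any OpenAI-compatible endpoint. The base_url and model ID are placeholders, not any specific provider's values, and this measures end-to-end tokens/second (including time to first token), so it's a conservative estimate of raw decode speed:

    # Rough throughput check against an OpenAI-compatible endpoint.
    # base_url and model are placeholders; substitute your provider's values.
    import time
    from openai import OpenAI

    client = OpenAI(
        base_url="https://example-provider.com/v1",  # hypothetical endpoint
        api_key="YOUR_API_KEY",
    )

    start = time.perf_counter()
    response = client.chat.completions.create(
        model="qwen-3-coder-480b",  # assumed model identifier; varies by provider
        messages=[{"role": "user", "content": "Write a quicksort in Python."}],
        max_tokens=1024,
    )
    elapsed = time.perf_counter() - start

    # Completion tokens divided by wall-clock time gives tokens/second.
    tokens = response.usage.completion_tokens
    print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.0f} tok/s")

Running the same prompt against a few providers makes the 100-150 vs 2000 tokens/second gap pretty easy to see.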