(no title)
rbitar | 5 months ago
Congrats to the team. I'm surprised the industry hasn't been more impressed with their token-throughput benchmarks. We're using the Qwen 3 Coder 480B model and seeing ~2000 tokens/second, which is easily 10-20x faster than most models on the market. Even some of the fastest models still only reach 100-150 tokens/second (see the OpenRouter stats by provider). That said, past roughly 300-400 tokens/second the speed gains feel more incremental, so a model sustaining 300+ tokens/second would already be a very competitive alternative in my view.
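For anyone who wants to sanity-check numbers like these themselves, here's a rough sketch of how I'd measure throughput against any OpenAI-compatible endpoint. The base_url and model ID are placeholders, not any specific provider's values, and this measures end-to-end tokens/second (including time to first token), so it's a conservative estimate of raw decode speed:

    # Rough throughput check against an OpenAI-compatible endpoint.
    # base_url and model are placeholders; substitute your provider's values.
    import time
    from openai import OpenAI

    client = OpenAI(
        base_url="https://example-provider.com/v1",  # hypothetical endpoint
        api_key="YOUR_API_KEY",
    )

    start = time.perf_counter()
    response = client.chat.completions.create(
        model="qwen-3-coder-480b",  # assumed model identifier; varies by provider
        messages=[{"role": "user", "content": "Write a quicksort in Python."}],
        max_tokens=1024,
    )
    elapsed = time.perf_counter() - start

    # Completion tokens divided by wall-clock time gives tokens/second.
    tokens = response.usage.completion_tokens
    print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.0f} tok/s")

Running the same prompt against a few providers makes the 100-150 vs 2000 tokens/second gap pretty easy to see.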