(no title)
frontsideair | 5 months ago
It's a balancing game, how slow a token generation speed can you tolerate? Would you rather get an answer quick, or wait for a few seconds (or sometimes minutes) for reasoning?
For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick when reasoning is set to low, which usually doesn't think longer than one sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507) but at least it's fast while reasoning.
No comments yet.