(no title)
acchow | 21 hours ago
> 50+ tps with qwen3.5 35b a4b on a 4090
But qwen3.5 35b is worse than even Claude Haiku 4.5. You could switch your Claude Code to use Haiku and never hit rate limits. Also gets similar 50tps.
acchow | 21 hours ago
> 50+ tps with qwen3.5 35b a4b on a 4090
But qwen3.5 35b is worse than even Claude Haiku 4.5. You could switch your Claude Code to use Haiku and never hit rate limits. Also gets similar 50tps.
CapsAdmin|18 hours ago
My goto proprietary model in copilot for general tasks is gemini 3 flash which is priced the same as haiku.
The qwen model is in my experience close to gemini 3 flash, but gemini flash is still better.
Maybe it's somewhat related to what we're using them for. In my case I'm mostly using llms to code Lua. One case is a typed luajit language and the other is a 3d luajit framework written entirely in luajit.
I forgot exactly how many tps i get with qwen, but with glm 4.7 flash which is really good (to be local) gets me 120tps and a 120k context.
Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work.