CapsAdmin | 1 day ago
Claude Code always gives me rate limits. Claude through Copilot is a bit slow, and Copilot has constant network request issues or something, but at least I don't get rate limited as often.
Local models, at least, always work, are faster (50+ tps with qwen3.5 35b a4b on a 4090), and most importantly never hit a rate limit.
acchow | 1 day ago
> 50+ tps with qwen3.5 35b a4b on a 4090
But qwen3.5 35b is worse than even Claude Haiku 4.5. You could switch your Claude Code to use Haiku and never hit rate limits. It also gets a similar 50 tps.
CapsAdmin | 22 hours ago
My go-to proprietary model in Copilot for general tasks is gemini 3 flash, which is priced the same as Haiku.
In my experience the qwen model is close to gemini 3 flash, but gemini flash is still better.
Maybe it's somewhat related to what we're using them for. In my case I'm mostly using LLMs to write Lua. One case is a typed LuaJIT language, and the other is a 3D framework written entirely in LuaJIT.
I forget exactly how many tps I get with qwen, but glm 4.7 flash, which is really good (for a local model), gets me 120 tps and a 120k context.
Don't get me wrong, proprietary models are still superior, but local models are getting really good AND useful for a lot of real work.