CapsAdmin | 1 day ago
Claude Code always gives me rate limits. Claude through Copilot is a bit slow, and Copilot has constant network request issues or something, but at least I don't get rate limited as often.
Local models, at least, always work, are faster (50+ tps with qwen3.5 35b a4b on a 4090), and most importantly never hit a rate limit.
acchow | 1 day ago
> 50+ tps with qwen3.5 35b a4b on a 4090
But qwen3.5 35b is worse than even Claude Haiku 4.5. You could switch your Claude Code to use Haiku and never hit rate limits. It also gets a similar 50 tps.
CapsAdmin | 22 hours ago
My go-to proprietary model in Copilot for general tasks is gemini 3 flash, which is priced the same as Haiku.
In my experience the qwen model is close to gemini 3 flash, but gemini flash is still better.
Maybe it's somewhat related to what we're using them for. In my case I'm mostly using LLMs to write Lua. One case is a typed LuaJIT language, and the other is a 3D framework written entirely in LuaJIT.
I forget exactly how many tps I get with qwen, but glm 4.7 flash, which is really good (for a local model), gets me 120 tps and a 120k context.
Don't get me wrong, proprietary models are still superior, but local models are getting really good AND useful for a lot of real work.