
Ask HN: Tips for reducing LLM token usage?

1 point | vmt-man | 7 months ago

I've been using Claude Code with Serena MCP, but for the past few weeks it's been compressing the context more often. I have two Pro accounts, and they're still not enough for my daily needs anymore :(

Also, Claude Code tends to make very broad search requests, and I keep getting an error from MCP about exceeding 25,000 characters. It happens quite often.
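One stopgap for the output-size errors is to cap tool results at a character budget before they're returned to the model. This is only a minimal sketch in plain Python; `cap_results` and `MAX_CHARS` are illustrative names, not part of Serena MCP's actual API, and narrowing the search itself (specific directories or file globs) is the better fix:

```python
# Hypothetical sketch: trim a list of search-result lines so the joined
# output stays under a character budget like Serena MCP's 25,000-char cap.
# Not a real MCP hook; names here are made up for illustration.

MAX_CHARS = 25_000

def cap_results(lines, budget=MAX_CHARS):
    """Keep whole result lines until adding another would exceed the budget."""
    kept, used = [], 0
    for line in lines:
        cost = len(line) + 1  # +1 for the newline separator
        if used + cost > budget:
            # Note how many results were dropped instead of silently cutting off.
            kept.append(f"... truncated {len(lines) - len(kept)} more results ...")
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

The budget deliberately leaves a little headroom, so the short truncation notice at the end doesn't push the output back over the limit in typical cases.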

What would you recommend?

6 comments


bigyabai|7 months ago

> What would you recommend?

Invest in a local inference server and run Qwen3. At this point, it will still cost less than two Pro accounts.

brulard|6 months ago

Don't do that. You'll spend much of your time tinkering with hardware and software instead of doing what you actually care about. I recently upgraded to Claude Max (the $100 version). It's not cheap, but it pays for itself. On top of that, the local setup recommended here will be slower, dumber, and will cost you many hundreds of dollars up front. And models and tools are improving quickly; I don't want to imagine how much time you'd spend upgrading those local models yourself. If you just run Claude, that's taken care of for you. Claude Code is the best agentic tool there is, and it's improving every few weeks.

vmt-man|7 months ago

What hardware do you suggest? :)