Tell HN: I cut Claude API costs from $70/month to pennies
40 points | ok_orco | 1 month ago
I'd done napkin math beforehand, so I knew it was probably a bug, but still. Turns out it was only partially a bug. The rest was me needing to rethink how I built this thing. Spent the next couple days ripping it apart: making tweaks, testing with live data, checking results, trying again. What I found: I was sending API requests far too often, and not optimizing what I sent or what I got back.
Here's what moved the needle, roughly big to small (besides that bug, which was costing me a buck a day on its own):
- Dropped Claude Sonnet entirely - tested both models on the same data, Haiku actually performed better at a third of the cost
- Started batching everything - hourly calls were a money fire
- Filter before the AI - a lot of online chatter is just "lol" and "thanks". I was paying the AI to tell me that's not feedback. That said, I still process agreements like "+1" and "me too."
- Shorter outputs - "H/M/L" instead of "high/medium/low", 40-char title recommendation
- Strip code snippets before processing - they mostly reiterate the issue while bloating the call
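The pre-filter step is easy to sketch as a pure function. This is a minimal, hypothetical version, assuming the kind of chat-style input described above; the noise list, function name, and length cutoff are my own placeholders, not the author's actual rules:

```python
import re

# Messages that carry no feedback signal on their own.
NOISE = {"lol", "lmao", "thanks", "thank you", "nice", "cool"}

# Short agreements still matter: they boost whatever they reply to.
AGREEMENT = re.compile(r"^\s*(\+1|me too|same( here)?)\s*[.!]*\s*$", re.IGNORECASE)

def worth_sending(message: str) -> bool:
    """Return True if the message should be sent to the LLM at all."""
    text = message.strip().lower().rstrip(".!?")
    if AGREEMENT.match(message):
        return True          # keep agreements like "+1" / "me too"
    if text in NOISE:
        return False         # pure chatter: skip the API call entirely
    return len(text) > 3     # anything substantive goes through
```

Every message this drops is an API call (and its output tokens) that never happens, which is why filtering ranks so high on the list above.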
End of the week: pennies a day. Same quality.
I'm not building a VC-backed app that can run at a loss for years. I'm unemployed, trying to build something that might also pay rent. The math has to work from day one.
The upside: these savings let me 3x my pricing tier limits and add intermittent quality checks. Headroom I wouldn't have had otherwise.
Happy to answer questions.
44za12|1 month ago
https://github.com/NehmeAILabs/llm-sanity-checks
andai|29 days ago
>Most tasks don't. This repo helps you figure out which ones.
About a year ago I was testing Gemini 2.5 Pro and Gemini 2.5 Flash for agentic coding. I found they could both do the same task, but Gemini Pro was way slower and more expensive.
This blew my mind because I'd previously been obsessed with "best/smartest model", and suddenly realized what I actually wanted was "fastest/dumbest/cheapest model that can handle my task!"
andai|29 days ago
I haven't tested it extensively, but when I used Claude Code with it, it was reasonably fast (though actual Claude was way faster). When I tried to use the API itself manually, it was super slow.
My guess is they're filtering the traffic and prioritizing certain types. With my own script, I ran into a rate limit after 7 requests!
ok_orco|1 month ago
Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.
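For reference, the Anthropic Python SDK exposes a Message Batches endpoint that prices requests at half the real-time rate. A rough sketch of how the batched calls might look; the model name, token limit, and helper are my assumptions, not the author's code:

```python
def build_batch_requests(messages: list[str],
                         model: str = "claude-3-5-haiku-latest") -> list[dict]:
    """Turn a list of pre-filtered messages into batch request dicts."""
    return [
        {
            "custom_id": f"msg-{i}",
            "params": {
                "model": model,
                "max_tokens": 64,  # short outputs: "H/M/L" plus a 40-char title
                "messages": [{"role": "user", "content": text}],
            },
        }
        for i, text in enumerate(messages)
    ]

if __name__ == "__main__":
    # Submitting the batch requires an ANTHROPIC_API_KEY in the environment.
    from anthropic import Anthropic
    client = Anthropic()
    batch = client.messages.batches.create(
        requests=build_batch_requests(["Export to CSV fails on large files"])
    )
    print(batch.id)  # poll this id until the batch finishes processing
```

Accumulating an hour's (or day's) worth of messages into one batch like this replaces many real-time calls with a single cheaper submission.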