top | item 46992768

kachapopopow | 17 days ago

Well yeah, that's what I mean: how much better is it than cat + grep + manual line counting? Agents tend to perform worse with niche tools.

jahala|15 days ago

It was really helpful to make and run a benchmark - it led to some important changes and improvements, so thanks again for your question kp!

The result is a ~17% reduction in raw cost. Calculated per correct answer, it's a ~25% reduction.
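For anyone wondering how those two figures relate, here is a rough sketch. It assumes both runs cover the same set of tasks and that "per correct answer" means total cost divided by the number of correct answers; the function name is made up for illustration and is not part of the benchmark. Under those assumptions, the gap between the two percentages implies the tool also produced more correct answers.

```python
def implied_correct_answer_ratio(raw_cost_reduction: float,
                                 per_correct_reduction: float) -> float:
    """Ratio of correct answers (with tool / baseline) implied by the two
    reductions, assuming cost per correct answer = total cost / correct answers.

    cost_new / cost_old       = 1 - raw_cost_reduction
    (cost_new / correct_new)
      / (cost_old / correct_old) = 1 - per_correct_reduction
    => correct_new / correct_old
         = (1 - raw_cost_reduction) / (1 - per_correct_reduction)
    """
    return (1 - raw_cost_reduction) / (1 - per_correct_reduction)


# Approximate figures quoted in this comment (~17% and ~25%).
ratio = implied_correct_answer_ratio(0.17, 0.25)
print(f"{ratio:.3f}")  # prints 1.107
```

So, if those assumptions hold, the quoted numbers would correspond to roughly 11% more correct answers on top of the lower raw cost. Since both inputs are rounded, treat that as a ballpark only.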

Just posted the update -> https://news.ycombinator.com/item?id=47016959

jahala|17 days ago

Thank you for this question - I'm building out a benchmark now. Initial results are very promising, will update you once it's done!