top | item 46497490

Ask HN: How are you developing and testing agents without burning tokens?

1 points| zlatkov | 1 month ago

Every agent run (and especially when running tests in CI/CD) adds token cost, nondeterminism, and slows iteration.

I’m not talking about LLM evals here. I mean integration-style testing.

There are some workarounds like VCR-style record/replay (e.g., Python's VCR “cassettes”) and similar ideas for capturing LLM calls. They can work for certain frameworks when your test/runtime is the one making the HTTP requests.

Using local LLMs to save costs adds significant delay for each execution (+ doesn't solve nondeterminism).

However, it becomes significantly more challenging for MCP-style flows, where the interaction can be IDE-driven (stdio/JSON-RPC) and the HTTP calls may not even originate from your code.

I'm curious what people are using.

discuss

No comments yet.