The multi-agent budget problem you're describing gets even harder when the
services are heterogeneous. In a RAG pipeline, a single user query might hit:
query analysis (LLM call), embedding generation (different model/pricing),
reranking (yet another model), and response generation (LLM call) — each
potentially in a different process.
Per-call monkey-patching sees each call in isolation. What I ended up doing was
a trace-based approach: every request gets a trace ID, each service appends cost
spans asynchronously, and a separate enrichment step aggregates the total. The
hard part was deduplication — when service A reports an aggregate cost and
service B reports the individual calls that compose it, you need to reconcile
or you double-count.
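
A minimal sketch of that reconciliation step, not the actual implementation. It assumes every span carries a `span_id` and an optional `parent_id` (hypothetical field names), and that an aggregate span's cost is fully covered by its children whenever any children were reported. Under those assumptions, summing only leaf spans avoids the double-count, and keying on `(trace_id, span_id)` drops redelivered spans.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class CostSpan:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for a top-level span
    service: str
    cost_micros: int  # integer micro-dollars, to avoid float drift


def trace_totals(spans: Iterable[CostSpan]) -> Dict[str, int]:
    """Total cost per trace, counting leaf spans only."""
    unique = {}
    for span in spans:
        # Async appends may deliver the same span twice; keep the first copy.
        unique.setdefault((span.trace_id, span.span_id), span)

    # Any span that another span names as its parent is treated as an aggregate.
    parents = {
        (s.trace_id, s.parent_id) for s in unique.values() if s.parent_id is not None
    }

    totals: Dict[str, int] = defaultdict(int)
    for key, span in unique.items():
        if key not in parents:
            totals[span.trace_id] += span.cost_micros
    return dict(totals)


spans = [
    CostSpan("t1", "a", None, "service-a", 700),  # aggregate of b1 + b2
    CostSpan("t1", "b1", "a", "service-b", 300),
    CostSpan("t1", "b2", "a", "service-b", 400),
    CostSpan("t1", "e", None, "embedder", 50),
    CostSpan("t1", "b1", "a", "service-b", 300),  # redelivered duplicate
]
print(trace_totals(spans))  # {'t1': 750}
```

If an aggregate also includes cost that no child reports, leaf-only summing undercounts, so the rule would have to add the difference rather than drop the parent outright.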
Your atomic disk writes for halt state are a nice pattern. I went with
fire-and-forget (never block the request path, accept eventual consistency on
cost data), but that means you can't do hard enforcement mid-request like
AgentBudget does.
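
For reference, the usual way to get an atomic write on a local disk is to write a temp file in the same directory and rename it over the target. The sketch below shows that generic pattern; it is not taken from any of the libraries mentioned here, and the function names and JSON layout are made up for illustration.

```python
import json
import os
import tempfile


def write_halt_state(path, state):
    """Replace the file at `path` with `state` so readers never see a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".halt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap of old file for new
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_halt_state(path):
    """Return the stored state, or a 'not halted' default if nothing was written yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"halted": False}
```

A reader sees either the old state or the new one, never a half-written file, which is what makes the halt flag safe to check from another process.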
The deduplication problem is the part I haven't worked out cleanly. The hierarchy in veronica-core sidesteps it as long as you declare parent-child relationships upfront — B's spend rolls directly into A's ceiling without a separate aggregation step. But in a dynamic pipeline where you don't know the call graph until runtime, that assumption breaks.
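
To make the upfront-declaration point concrete, here is a rough sketch of a parent-child budget tree. It is not veronica-core's API: `Budget`, `charge`, and `BudgetExceeded` are hypothetical names, amounts are integer micro-dollars, and the sketch is single-threaded.

```python
class BudgetExceeded(RuntimeError):
    pass


class Budget:
    def __init__(self, name, ceiling_micros, parent=None):
        self.name = name
        self.ceiling_micros = ceiling_micros
        self.parent = parent  # declared up front, before any spend happens
        self.spent_micros = 0

    def charge(self, amount_micros):
        # First pass: check this node and every ancestor, so a rejected
        # charge leaves no partial state anywhere in the tree.
        node = self
        while node is not None:
            if node.spent_micros + amount_micros > node.ceiling_micros:
                raise BudgetExceeded(f"{node.name} ceiling would be exceeded")
            node = node.parent
        # Second pass: record the spend at every level.
        node = self
        while node is not None:
            node.spent_micros += amount_micros
            node = node.parent


a = Budget("A", ceiling_micros=1000)
b = Budget("B", ceiling_micros=800, parent=a)
c = Budget("C", ceiling_micros=800, parent=a)

b.charge(600)  # counts against both B and A
try:
    c.charge(500)  # within C's own ceiling, but A would reach 1100
except BudgetExceeded as exc:
    print(exc)  # A ceiling would be exceeded
```

Because the spend is written to the parent at charge time, there is nothing to reconcile afterwards. The cost is the one mentioned above: the `parent=` link has to exist before the call is made.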
The fire-and-forget tradeoff makes sense. I went with blocking enforcement because the original use case was preventing runaway agents, not auditing after the fact. For RAG you're probably right that eventual consistency is the better fit — you care more about the trace than cutting off a half-finished response.
das-bikash-dev|5 days ago
tenpa0000|5 days ago