Ask HN: How do you prevent retry cascades in LLM systems?

2 points | amabito | 10 days ago

We ran into a retry amplification issue in one of our LLM agents recently.

The provider returned 429s for a short period. We had per-call retry limits in place. We did NOT have containment at the request-chain level.

Because calls were nested and spread across multiple workers, retries multiplied in ways we didn’t anticipate.

Per-call limits were not enough.
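To make the question concrete, here's roughly what I mean by a chain-level retry budget (a hypothetical sketch; `RetryBudget` and `call_with_budget` are made-up names, and in a real multi-worker setup the budget would live in shared state such as Redis rather than a local object):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider 429 response."""


class RetryBudget:
    """One budget shared by every call in a single request chain,
    so nested calls can't each burn their own per-call retry limit."""

    def __init__(self, max_retries: int):
        self.remaining = max_retries

    def consume(self) -> bool:
        """Take one retry from the budget; False if exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def call_with_budget(fn, budget: RetryBudget, base_delay: float = 0.5):
    """Retry fn() with jittered exponential backoff, but only while
    the chain-wide budget has retries left."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            if not budget.consume():
                raise  # budget exhausted: fail fast instead of amplifying
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
            attempt += 1
```

The point is that the budget object is threaded through the whole chain: ten nested calls with a budget of 3 can produce at most 3 retries total, not 30.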

For those running LLM systems in production:

– Do you implement chain-level retry budgets?
– Shared circuit breaker state?
– Per-minute cost ceilings?
– Cost-based limits (tokens/$) rather than retry count?
– Or is exponential backoff usually sufficient in practice?
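By shared circuit breaker state I mean something like the following (again a made-up sketch with hypothetical names; a production version would keep the failure counters in shared storage so every worker sees the same breaker, rather than each process tripping independently):

```python
import time


class CircuitBreaker:
    """Minimal breaker: trips open after `threshold` consecutive
    failures, rejects calls for `cooldown` seconds, then lets a
    trial call through (half-open)."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: permit one trial call
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

Callers check `allow()` before dispatching to the provider and report the outcome back, so once the provider starts returning 429s the whole fleet stops hammering it instead of each worker retrying on its own.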

I’m trying to understand what actually works at scale, beyond monitoring dashboards that tell you after the fact.
