Agree backoff+jitter is table stakes, and load shedding/backpressure is necessary under sustained overload. The tricky cases I’m digging into are shared rate limits (429s) and many concurrent clients/agents where local backoff isn’t coordinated and you still get herds after partial outages. Curious what patterns you’ve seen work well for coordinating retries/fairness across tenants or API keys?
No comments yet.