Sync vs. async vs. event-driven AI requests: what works in production

4 points | akarshc | 1 month ago | modelriver.com

10 comments

akarshc | 1 month ago

I’m one of the builders. Once AI requests moved beyond simple sync calls, we kept running into the same problems in production: retries hiding failures, async flows that were hard to reason about, frontend state drifting, and providers timing out mid-request.

This page breaks down the three request patterns we see teams actually using in production (sync, async, and event-driven async), how data flows in each case, and why we ended up favoring an event-driven approach for interactive, streaming apps.
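For anyone skimming, the three patterns can be sketched in a few lines. This is a minimal in-process illustration, not ModelRiver's actual API; every function name and event shape here is made up for the example.

```python
import uuid

# Minimal in-process sketch of the three request patterns; the provider
# call is faked and every name here is illustrative, not a real API.

def call_sync(prompt):
    # Sync: the caller blocks until the full response exists.
    return f"response to: {prompt}"

_jobs = {}

def submit_job(prompt):
    # Queue-based async: the caller gets a job id back immediately
    # and polls for the result later.
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "done", "result": f"response to: {prompt}"}
    return job_id

def poll_job(job_id):
    return _jobs[job_id]

def stream_events(prompt):
    # Event-driven async: the response is a sequence of typed events,
    # so the client can render partial output and observe a failure
    # mid-stream instead of after a timeout.
    yield {"type": "start", "request_id": str(uuid.uuid4())}
    for token in prompt.split():
        yield {"type": "token", "text": token}
    yield {"type": "done"}
```

The practical difference is what the client can observe: in the first two cases a mid-request failure looks like a timeout or a failed job, while in the event-driven case it shows up as a missing `done` event after partial output.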

Happy to answer questions or go deeper on any part of the architecture.

amalv | 1 month ago

If a team adopts this pattern and later decides to remove ModelRiver, how hard is it to unwind? Are the request and event models close to provider APIs or fairly opinionated?

akarshc | 1 month ago

This was something we were careful about. The request and event models are intentionally close to what most providers already expose, rather than introducing a completely new abstraction.

Teams usually integrate it incrementally in front of existing calls. If you remove it, you’re mostly deleting the orchestration layer and keeping your provider integrations and client logic. You lose centralized retries and observability, but you’re not stuck rewriting your entire request model.

If adopting it requires a full rewrite, that’s usually a sign it’s being applied too broadly.

GopikaDilip | 1 month ago

How do you reason about retries and correctness once a stream has already started? For example, how do you avoid duplicated or missing tokens if a provider fails mid-stream?

akarshc | 1 month ago

This is one of the harder problems, and there isn’t a perfect answer.

The main thing we try to avoid is pretending mid-stream retries are the same as pre-request retries. Once a stream has started, we treat it as a sequence of events with checkpoints rather than a single opaque response. Retries are scoped to known safe boundaries, and anything ambiguous is surfaced explicitly instead of silently re-emitting tokens.

In other words, correctness is prioritized over pretending the stream is seamless. If we can’t guarantee no duplication, we make that visible rather than hide it.
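To make "retries scoped to known safe boundaries" concrete, here's a rough sketch of the idea, assuming the provider can resume a stream after a sequence number. The `seq`, `gap`, and `resume_after` names are assumptions for the example, not our actual wire format.

```python
# Sketch of checkpoint-scoped retries for a token stream, assuming the
# provider supports resuming after a sequence number. The field names
# ("seq", "gap", "aborted") are illustrative, not a real API.

def consume_with_checkpoints(make_stream, max_retries=2):
    """Consume events carrying monotonically increasing 'seq' numbers.

    A retry restarts the stream after the last checkpoint; events with
    already-seen sequence numbers are dropped, and an unresolvable gap
    or abort is surfaced as an explicit event instead of silently
    re-emitting or losing tokens.
    """
    out = []
    checkpoint = -1   # last seq safely delivered to the caller
    attempts = 0
    while True:
        try:
            for event in make_stream(resume_after=checkpoint):
                if event["seq"] <= checkpoint:
                    continue  # duplicate inside the retry window: drop it
                if event["seq"] > checkpoint + 1:
                    # Provider could not replay the missing range:
                    # make the gap visible rather than hide it.
                    out.append({"type": "gap", "after": checkpoint})
                out.append(event)
                checkpoint = event["seq"]
            return out
        except RuntimeError:
            attempts += 1
            if attempts > max_retries:
                out.append({"type": "aborted", "after": checkpoint})
                return out

# Demo: a stream that fails mid-way on the first attempt, then resumes.
_calls = {"n": 0}

def flaky_stream(resume_after):
    _calls["n"] += 1
    if _calls["n"] == 1:
        yield {"seq": 0, "type": "token", "text": "a"}
        yield {"seq": 1, "type": "token", "text": "b"}
        raise RuntimeError("provider dropped the stream")
    for seq, text in [(0, "a"), (1, "b"), (2, "c")]:
        if seq > resume_after:
            yield {"seq": seq, "type": "token", "text": text}

result = consume_with_checkpoints(flaky_stream)
```

The key property is that the consumer never re-emits a token it already delivered, and anything it can't prove continuous shows up as an explicit `gap` or `aborted` event for the client to handle.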

aparnavalsan43 | 1 month ago

In practice, where does the event-driven approach break down? What kinds of workloads still fit better with simple sync or queue-based async?

vishaal_007 | 1 month ago

In practice, event-driven starts to feel like overkill when requests are short-lived and failures are cheap. If a call is fast, idempotent, and the user isn’t waiting on partial output, a simple sync request is usually the clearest solution.

Queue-based async still works well for batch jobs, offline processing, or anything where latency and ordering aren’t user-visible. The event-driven approach mainly pays off once you have long-lived or interactive requests where failures can happen mid-response and you care about what the user actually sees.

vishaal_007 | 1 month ago

I’m another founder on this. One thing that surprised us while building AI features was how often the hard problems weren’t about model choice, but about request lifecycle. Once you introduce streaming, retries, and multiple providers, a lot of implicit assumptions in typical request–response code stop holding.

We kept seeing teams reinvent similar patterns in slightly different ways, especially around correlating events, handling partial failures, and keeping the frontend in sync with what actually happened on the backend. The goal with this writeup was to make those tradeoffs explicit and show what’s actually happening on the wire in each approach.
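The event-correlation piece in particular is small but easy to get wrong. A minimal sketch of the frontend side, assuming every event carries the id of the request it belongs to (the class and field names are made up for illustration):

```python
# Sketch of client-side event correlation: every event carries the id
# of the request it belongs to, so the frontend can drop late events
# from a superseded request instead of mixing them into the new one.
# All names here are illustrative, not an actual wire format.

class RequestTracker:
    def __init__(self):
        self.active_id = None
        self.tokens = []

    def start(self, request_id):
        # Starting a new request supersedes any in-flight one.
        self.active_id = request_id
        self.tokens = []

    def on_event(self, event):
        if event["request_id"] != self.active_id:
            return False  # stale event from an abandoned request: drop it
        self.tokens.append(event["text"])
        return True

# Demo: the user resubmits while req-1 is still streaming.
tracker = RequestTracker()
tracker.start("req-1")
tracker.on_event({"request_id": "req-1", "text": "hello"})
tracker.start("req-2")
tracker.on_event({"request_id": "req-1", "text": "late token"})  # ignored
tracker.on_event({"request_id": "req-2", "text": "fresh"})
```

Without the id check, the late token from the abandoned request lands in the new response, which is exactly the "frontend state drifting" failure mentioned upthread.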

Curious to hear how others here are handling long-lived or streaming AI requests in production, especially once things start failing in non-obvious ways.