Ask HN: How do you handle duplicate side effects when jobs, workflows retry?
10 points| shineDaPoker | 2 days ago
1. Job calls external API (Stripe, SendGrid, AWS)
2. API call succeeds
3. Job crashes before recording success
4. Job retries → calls API again → duplicate
Example: process refund, send email notification, crash. Retry does both again. Customer gets duplicate refund email (or worse, duplicate refund).
I see a few approaches:
Option A: Store processed IDs in database
Problem: Race between "check DB" and "call API" can still duplicate

Option B: Use API idempotency keys (Stripe supports this)
Problem: Not all APIs support it (legacy systems, third-party)

Option C: Build deduplication layer that checks external system first
Problem: Extra latency, extra complexity
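
To make the Option A race concrete: one way to narrow it is to fold "check" and "mark" into a single atomic insert against a unique key, so that only one worker can ever claim a given operation. A minimal sketch with SQLite; the `processed` table and the `claim` helper are made-up names for illustration, not from any library.

```python
import sqlite3

def claim(conn: sqlite3.Connection, op_id: str) -> bool:
    """Return True only for the first caller that claims op_id."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO processed (op_id) VALUES (?)", (op_id,))
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: someone else claimed it first.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (op_id TEXT PRIMARY KEY)")

first = claim(conn, "refund-123")   # this worker may call the API
second = claim(conn, "refund-123")  # a retry or second worker must not
```

Note the trade-off: claiming before the call removes the check-then-call race, but a crash between the claim and the API call leaves the operation marked and never executed. That is at-most-once rather than exactly-once, so it only fits cases where a lost call is cheaper than a duplicate.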
What do you do in production? Accept some duplicates? Only use APIs with idempotency? Something else?
(I built something for Option C, but trying to understand if this is actually a common-enough problem or if I'm over-engineering.)
jnbridge|1 day ago
A few patterns that have worked well in practice:
1. Idempotency keys at the API boundary — every side-effecting call gets a client-generated UUID, and the receiver deduplicates. Simple, but think carefully about the TTL of your dedup window.
2. Outbox pattern — instead of directly calling the external service, write the intent to a local "outbox" table in the same transaction as your state change. A separate process polls the outbox and delivers. Debezium + CDC makes this quite clean.
3. For cross-system workflows: treat the saga orchestrator as the single source of truth for step completion. Each step checks its completion status before executing, so steps must be idempotent OR the orchestrator tracks state.
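
The outbox pattern from point 2 can be sketched roughly as follows. This uses SQLite and a polling relay instead of Debezium/CDC, and the table names, `record_refund`, and `relay_once` are all assumptions made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE refunds (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        delivered INTEGER NOT NULL DEFAULT 0
    );
""")

def record_refund(refund_id: str) -> None:
    # The state change and the intent to notify commit (or roll back) together.
    with conn:
        conn.execute("INSERT INTO refunds VALUES (?, 'approved')", (refund_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "refund_email", "refund_id": refund_id}),),
        )

def relay_once(send) -> int:
    """Deliver pending rows. A crash before the UPDATE means redelivery."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        send(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
    return len(rows)

sent = []
record_refund("re_1")
relay_once(sent.append)  # delivers the pending message
relay_once(sent.append)  # nothing left to deliver
```

The relay is still at-least-once: if it dies between `send` and the `UPDATE`, the row goes out again on the next poll. The outbox removes the "state changed but message never sent" failure, not the need for an idempotent receiver.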
In practice, designing for at-least-once delivery + idempotent receivers is more reliable than trying to achieve exactly-once through distributed coordination. Exactly-once across system boundaries is effectively a myth outside of systems that support two-phase commit (and even then it's fragile).
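
As a rough illustration of an idempotent receiver and why the dedup window's TTL matters, here is an in-memory sketch. `DedupReceiver` is a hypothetical class written for this example; a real receiver would keep the keys in a durable store, and the injectable clock is only there to make the expiry visible.

```python
import time

class DedupReceiver:
    """Runs an action at most once per key within a TTL window."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.seen = {}  # key -> (expiry time, cached result)

    def handle(self, key: str, action):
        now = self.clock()
        # Forget keys whose dedup window has passed.
        self.seen = {k: v for k, v in self.seen.items() if v[0] > now}
        if key in self.seen:
            return self.seen[key][1]  # duplicate: replay the stored result
        result = action()
        self.seen[key] = (now + self.ttl, result)
        return result

now = [0.0]  # fake clock so the example does not need to sleep
rx = DedupReceiver(ttl_seconds=60, clock=lambda: now[0])
calls = []

def send_email():
    calls.append("sent")
    return len(calls)

r1 = rx.handle("op-1", send_email)  # first delivery: runs the action
r2 = rx.handle("op-1", send_email)  # retry inside the window: deduplicated
now[0] = 61.0
r3 = rx.handle("op-1", send_email)  # retry after the window: runs again
```

The last call is the TTL caveat in action: a retry that arrives after the window has expired is treated as a brand-new request, so the window has to outlast the longest retry delay the sender can produce.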
dakiol|2 hours ago
fernando_campos|13 hours ago
What helped us was treating every job execution as replayable and attaching a unique operation key instead of relying on execution state alone.
Otherwise retries silently create data inconsistencies that only appear much later.
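
One possible way to get such an operation key is to derive it deterministically from the job's identity, so every retry of the same step produces the same key without having to persist it first. A sketch using `uuid.uuid5`; the namespace string and the `operation_key` helper are assumptions for illustration.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "jobs.example.com")

def operation_key(job_id: str, step: str) -> str:
    """Same job and step always map to the same key, across retries and hosts."""
    return str(uuid.uuid5(NAMESPACE, f"{job_id}:{step}"))

key_first_try = operation_key("job-42", "refund")
key_retry = operation_key("job-42", "refund")
key_other_step = operation_key("job-42", "email")
```

A random key generated inside the job body (for example `uuid.uuid4()` on each attempt) would defeat the purpose, since each retry would look like a new operation to the receiver.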
moomoo11|2 days ago
shineDaPoker|2 days ago
The atomic records part is critical - I learned the hard way that just checking a DB flag isn't enough (process can freeze between check and execute, lease expires, another process takes over, both execute).
How do you handle the case where:
1. Process acquires atomic lock
2. Calls external API successfully
3. Process freezes before releasing lock
4. Lock expires, new process acquires it
5. New process calls API again → duplicate
Do you just accept this edge case (rare but possible)? Or is there a mitigation I'm missing?
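
One mitigation that might apply here, sketched under assumptions: record a "started" marker before the call, and have whoever takes over treat "started" as in doubt and reconcile against the external system instead of calling blindly. The `ops` table, `run_step`, and the `already_done_remotely` callback are hypothetical names; the callback stands in for whatever lookup the external API offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ops (op_id TEXT PRIMARY KEY, state TEXT NOT NULL)")

def run_step(op_id, call_api, already_done_remotely):
    row = conn.execute("SELECT state FROM ops WHERE op_id = ?", (op_id,)).fetchone()
    state = row[0] if row else None
    if state == "done":
        return "skipped"
    if state == "started" and already_done_remotely(op_id):
        # A previous holder made the call but never recorded it.
        with conn:
            conn.execute("UPDATE ops SET state = 'done' WHERE op_id = ?", (op_id,))
        return "reconciled"
    with conn:
        conn.execute("INSERT OR REPLACE INTO ops VALUES (?, 'started')", (op_id,))
    call_api(op_id)
    with conn:
        conn.execute("UPDATE ops SET state = 'done' WHERE op_id = ?", (op_id,))
    return "executed"

remote = set()  # stand-in for the external system's own records

def crash_after_call(op_id):
    remote.add(op_id)            # the external call lands...
    raise RuntimeError("crash")  # ...but the worker dies before recording it

try:
    run_step("refund-123", crash_after_call, lambda op: op in remote)
except RuntimeError:
    pass

outcome = run_step("refund-123", remote.add, lambda op: op in remote)
```

This narrows the window rather than closing it. If the frozen process wakes up and its request reaches the API after the new holder has already looked and found nothing, a duplicate is still possible unless the API itself deduplicates.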
codebitdaily|2 days ago
stephenr|2 days ago
- If the external service supports idempotent operations, use that option.
- If the external service doesn't, but has a "retrieval" feature (i.e. a way to look up whether the thing already exists, e.g. fetching the refunds on a given payment), use that first.
- If the system has neither, assess how critical it is to avoid duplicates.
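
For the second case (no idempotency support, but retrieval is available), a lookup-first call might look like the sketch below. `FakePayments` is an in-memory stand-in invented for the example, not a real client, and it assumes the API lets you attach metadata to a refund so the lookup can match on an operation key rather than guess from the amount.

```python
class FakePayments:
    """In-memory stand-in for a payment API without idempotency keys."""

    def __init__(self):
        self.refunds = []

    def list_refunds(self, payment_id):
        return [r for r in self.refunds if r["payment_id"] == payment_id]

    def create_refund(self, payment_id, amount, metadata):
        refund = {"payment_id": payment_id, "amount": amount, "metadata": metadata}
        self.refunds.append(refund)
        return refund

def refund_once(api, payment_id, amount, op_key):
    # Look first: a refund tagged with our key means an earlier attempt succeeded.
    for existing in api.list_refunds(payment_id):
        if existing["metadata"].get("op_key") == op_key:
            return existing
    return api.create_refund(payment_id, amount, {"op_key": op_key})

api = FakePayments()
a = refund_once(api, "pay_1", 500, "job-42:refund")
b = refund_once(api, "pay_1", 500, "job-42:refund")  # retry finds the first one
```

Two concurrent workers can still both look, find nothing, and both create, so this covers sequential retries after a crash, not simultaneous execution.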
shineDaPoker|2 days ago
For APIs that support idempotency keys (Stripe, etc.), I use those. For ones that don't but have retrieval (most do), I check first before retrying.
The question I'm wrestling with: is the extra round-trip for the lookup worth it? Or should I just accept the edge cases where it duplicates?
What's your threshold for "critical enough to avoid duplicates"? Payments obviously yes, but what about notifications, reporting, analytics events?
babelfish|2 days ago
shineDaPoker|2 days ago
His advice was: Temporal solves orchestration, but making the external API calls idempotent is on you. For simple cases, write observe activities manually. For complex cases, build abstraction.
That's what led me down this path - trying to figure out if the abstraction is worth building or if manual is good enough.
Have you used Temporal for this? How do you handle the idempotency of external calls?