How do you handle lost webhooks in production?
14 points| everydaydev | 3 months ago
Every team ended up building the same solution: retry logic, dead letter queue, monitoring.
Curious how others handle this:
- Do you rely on the provider's retry policy?
- Built your own reliability layer?
- Use a service?
- Just manually reconcile when it happens?
(Context: Building https://relaehook.com to solve this, but genuinely curious what the norm is)
renewiltord|3 months ago
Trivial Go program, day’s work. Stick it in Postgres, run continuously.
Bizarrely there are vendors who are weird about webhooks. Lifefile, as an example, charges pharmacies a dollar per webhook firing. So the pharmacies are crappy about retry policy.
Tbh I wouldn’t buy any product in this space. It’s too simple: a dedicated HTTP server plus Postgres plus a processing loop. And with something already this delicate, I would rather not introduce more vendors.
No, not even if you converted it into event queue via websocket or zmq or what have you.
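For readers who want to see the shape of the "HTTP server plus Postgres plus processing loop" approach, here is a minimal sketch. It is not the commenter's code: it is written in Python rather than Go, uses the standard-library `sqlite3` module as a stand-in for Postgres, and leaves out the HTTP server itself. The table layout, the `MAX_ATTEMPTS` cap, the exponential backoff, and the function names `open_store`, `receive`, and `process_once` are all assumptions made for illustration.

```python
import sqlite3
import time

MAX_ATTEMPTS = 5  # assumed cap; after this many failures a row is parked as 'dead'


def open_store(path=":memory:"):
    """Open the store (sqlite3 here, standing in for Postgres)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS webhooks (
               id       INTEGER PRIMARY KEY,
               payload  TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0,
               next_try REAL NOT NULL DEFAULT 0
           )"""
    )
    return db


def receive(db, payload):
    """What the HTTP handler would call: persist first, acknowledge afterwards."""
    cur = db.execute("INSERT INTO webhooks (payload) VALUES (?)", (payload,))
    db.commit()
    return cur.lastrowid


def process_once(db, handler, now=None):
    """One pass of the processing loop; returns how many rows were attempted."""
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT id, payload, attempts FROM webhooks "
        "WHERE status = 'pending' AND next_try <= ? ORDER BY id",
        (now,),
    ).fetchall()
    for row_id, payload, attempts in rows:
        try:
            handler(payload)
            db.execute("UPDATE webhooks SET status = 'done' WHERE id = ?", (row_id,))
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                # Dead-letter: keep the row for inspection, stop retrying it.
                db.execute(
                    "UPDATE webhooks SET status = 'dead', attempts = ? WHERE id = ?",
                    (attempts, row_id),
                )
            else:
                # Exponential backoff: 2s, 4s, 8s, ...
                db.execute(
                    "UPDATE webhooks SET attempts = ?, next_try = ? WHERE id = ?",
                    (attempts, now + 2 ** attempts, row_id),
                )
    db.commit()
    return len(rows)
```

Running it continuously would amount to calling `process_once` in a loop with a short sleep. The 'dead' rows play the role of the dead letter queue mentioned in the original post.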
everydaydev|3 months ago
Relae exists for teams who’d rather outsource that operational surface, similar to why people use managed queues instead of running their own RabbitMQ. Not everyone needs it — but some prefer not to own that part of the stack.
super256|3 months ago
everydaydev|3 months ago
Where things get messy is when you have a mix of providers with wildly different retry behaviors, or internal services that have their own rate limits or downtime windows. A relay layer keeps the intake consistent even when the rest of the system isn’t.
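One way to read "keeps the intake consistent" is idempotent intake: however aggressively a provider retries, each event is stored once. The sketch below is an illustration under that assumption and says nothing about how Relae is built. The `inbox` table, the `accept` function, and the provider names are hypothetical, and `sqlite3` again stands in for whatever database is actually used.

```python
import sqlite3


def open_inbox():
    """Create an in-memory inbox keyed by (provider, event_id)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE inbox (
               provider TEXT NOT NULL,
               event_id TEXT NOT NULL,
               body     TEXT NOT NULL,
               PRIMARY KEY (provider, event_id)
           )"""
    )
    return db


def accept(db, provider, event_id, body):
    """Store an event once; return True if it was new, False for a redelivery."""
    cur = db.execute(
        "INSERT OR IGNORE INTO inbox (provider, event_id, body) VALUES (?, ?, ?)",
        (provider, event_id, body),
    )
    db.commit()
    # rowcount is 0 when the primary key already existed and the insert was ignored.
    return cur.rowcount == 1
```

With this in place, a provider that retries five times and one that retries once look the same to downstream consumers, since both leave exactly one row per event.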
samarthr1|3 months ago
Plus, trust y'all with the contents of said webhook?
everydaydev|3 months ago
And on the data side, we don’t use webhook payloads for anything other than delivery. They’re encrypted at rest and in transit, and automatically purged based on retention settings.
nickphx|3 months ago
everydaydev|3 months ago
journal|3 months ago
phillipseamore|3 months ago
everydaydev|3 months ago