Which the author admits three quarters of the way through:
> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.
Honestly I don't get why this is "faking it" though. It seems like the author's definition of "exactly once" is so purist as to essentially be a strawman. This is "exactly once" in practice.
Like are there other people claiming that this purist version of exactly-once does exist?
> Like are there other people claiming that this purist version of exactly-once does exist?
In my experience, the purist version of "exactly-once" exists as a vague, wishy-washy mental model in the brains of developers who have never thought hard about this stuff[0]. Like, once you sketch out why idempotency is important and how to do it, folks seem to pick up on it pretty quickly, but not everyone has trained their intuition to where they automatically notice these sorts of failure modes.
[0] I don't mean this as a slight against those developers--the issues that arise from distributed systems are both myriad and subtle, and if you've spent your time learning how to make beautiful web pages or cool video games or efficient embedded systems, it seems reasonable to not know anything about the accursed problems of hypothetical Byzantine Generals. Or maybe you're fresh out of a bootcamp or an undergraduate program and haven't yet been trained to expect computers to always and constantly fail in every possible way.
Because both of these "solutions" are not part of the delivery mechanism but part of your problem space. So the delivery system is not guaranteeing even a fake exactly-once delivery; it's your usage that makes it a fake exactly-once.
What's more, both of these solutions are very hard in practice. Idempotency can be applied only in special circumstances, when you can design for it. A "Prepare an order" message, for example, can't be idempotent: it has side effects, and it will prepare a new order every time you receive the message. So you go the deduplication route by considering the OrderID, but if you have several workers that process these messages, how do you handle deduplication? If the first worker has never acked the processing, do you deliver the message to a new worker in the queue? How does the new worker know whether someone else is processing the same OrderID? A central database? You are only kicking the can down the road...
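To make the "central database" option concrete, here is a rough sketch of lease-based claiming, assuming SQLite and a hypothetical `claims` table (none of these names come from the article). It also shows where the can ends up: once a lease expires, a second worker takes over even though the first may still be busy preparing the order.

```python
import sqlite3

def make_db():
    # Hypothetical schema: one row per OrderID that some worker has claimed.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE claims ("
        " order_id TEXT PRIMARY KEY,"
        " worker TEXT NOT NULL,"
        " expires_at REAL NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db

def claim(db, order_id, worker, now, lease=30.0):
    """Try to claim order_id for worker. Returns True if this worker may proceed."""
    with db:  # one transaction: commit on success, roll back on error
        # Take over an unfinished claim whose lease has run out.
        cur = db.execute(
            "UPDATE claims SET worker = ?, expires_at = ? "
            "WHERE order_id = ? AND done = 0 AND expires_at <= ?",
            (worker, now + lease, order_id, now),
        )
        if cur.rowcount == 1:
            return True
        # Otherwise claim it only if nobody has claimed it yet.
        cur = db.execute(
            "INSERT OR IGNORE INTO claims (order_id, worker, expires_at) "
            "VALUES (?, ?, ?)",
            (order_id, worker, now + lease),
        )
        return cur.rowcount == 1

def finish(db, order_id, worker):
    """Mark the order done, but only if this worker still holds the claim."""
    with db:
        cur = db.execute(
            "UPDATE claims SET done = 1 "
            "WHERE order_id = ? AND worker = ? AND done = 0",
            (order_id, worker),
        )
        return cur.rowcount == 1
```

If `finish` returns False for the original worker, it has lost its lease and whatever side effects it already performed may be repeated by the new owner, so the side effects themselves still need to tolerate duplicates.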
It can get way harder when your initial design made incorrect assumptions about the delivery semantics you were using, so you didn't know you'd need it.
Edit for example:
Someone could have a low-latency problem that seems like it could be a fit for a streaming application. They could look at docs and see "ooh, with Flink I can do exactly-once writes to Kafka" in one place, and choose to use that. But if they don't dig deeply into what that means, they may miss the latency impacts of having to checkpoint every time to commit a set of writes to Kafka. And by the time they figure this out, managing both "low latency" and "exactly once" in the code they wrote might be a really hairy problem.
The distinction is how you design. You don't need idempotence with a mythical "exactly once" system. Conversely, when you're debugging a system built on top of "at least once", you need to keep that property in mind in case the bug you're tracking down is lost idempotence.
Because idempotence can be very hard to achieve. You usually can't just write the message ID to a DB and ignore messages with a matching ID because if you crash while processing then you need to start over again. But you can't just write it at the end because then all of your processing steps need to be idempotent (so why are you bothering to write the ID?).
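One way around that dilemma, in the narrow case where the effect of processing lives in the same database as the ID table, is to write the ID and the effect in a single transaction. A minimal sketch, assuming SQLite and hypothetical `processed` and `balances` tables; it does not help once processing touches anything outside that database.

```python
import sqlite3

def make_db():
    # Hypothetical schema: a dedup table plus the state the messages modify.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO balances VALUES ('alice', 0)")
    db.commit()
    return db

def handle(db, msg_id, account, delta):
    """Apply a deposit at most once per msg_id. Returns True if it was applied."""
    try:
        with db:  # commits both statements together, or rolls both back
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected the ID: this message was already processed.
        return False

def balance(db, account):
    row = db.execute(
        "SELECT amount FROM balances WHERE account = ?", (account,)
    ).fetchone()
    return row[0]
```

A crash before the commit leaves neither the ID nor the deposit behind, so a redelivery starts over cleanly; a crash after the commit leaves both, so the redelivery is ignored.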
I've seen very few systems that have general idempotency baked in. Often it ends up being specific to the application. In some cases you can have simple solutions like upon crashing reload all of the state from an authoritative source. In some cases your messages result in simple idempotent operations such as "insert message with a unique ID" or "mark a message with a unique ID as read" but even then these are becoming quite related to business logic.
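For illustration, the two "simple idempotent operations" mentioned above might look like this. The `Inbox` class and its method names are made up for the sketch.

```python
class Inbox:
    """Toy store where both operations are idempotent by construction."""

    def __init__(self):
        self.messages = {}  # message ID -> body
        self.read = set()   # IDs of messages marked as read

    def insert(self, msg_id, body):
        # Inserting the same ID again keeps the first copy and changes nothing.
        self.messages.setdefault(msg_id, body)

    def mark_read(self, msg_id):
        # Adding to a set twice is the same as adding once.
        self.read.add(msg_id)
```

Compare that with something like "increment the unread counter", where every redelivery changes the result; which shape your operations take is exactly the business-logic question.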
Basically idempotency is a powerful tool to create a solution but it is no silver bullet. That is why it is important to understand the underlying problem.
I think we need to keep the concepts separate because otherwise people get confused. You cannot receive a message exactly once. Yes, it's not that hard, if you know this is an issue, to build a system where receiving the same message more than once won't cause a bad thing to happen. There are a few principled ways to do this, and some less principled ways that will still mostly work.
But that's not because you built a system that successfully delivers messages exactly once... you built a system that successfully processes messages exactly once, even if delivery occurs multiple times. The delivery still occurred multiple times. Even if your processing layer handled it, that may have other consequences worth understanding. Wrapping that up in a library may present a nice API for some programmer, but it doesn't solve the Byzantine Generals problem.
Whenever someone insists they can build Exactly Once with [mumble mumble mumble great tech here] I guarantee you there's a non-empty set of human readers coming away with the idea they can successfully create systems based on exactly-once delivery. After all, I built some code based on exactly-once delivery last night and it's working fine on my home ethernet even after I push billions of messages through it.
We're really better off pushing "There is no such thing as Exactly Once, and the way you deal with it is [idempotence/id tracking/whatever]", not "Yes there is such a thing as Exactly Once delivery (see fine print about how I'm redefining this term)". The former produces more accurate models in human brains about what is going on and is more likely to be understood as a set of engineering tradeoffs. The latter seems to produce a lot of confusion and people not understanding that their "Exactly Once" solution isn't a magic total solution to the problem, but is in fact a particular point on the engineering tradeoff spectrum. In particular, the "exactly once" solutions can be the wrong choice for certain problems, like multiplayer game state updates, where it may be a lot more viable to think 1-or-0 and some timestamping and the ability to miss messages entirely and recover, rather than building an "exactly once" system.
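A rough sketch of the "1-or-0 plus timestamping" idea for game state, assuming each update carries a sender-assigned sequence number (my assumption; any monotonic timestamp would do) and that each update carries the full position, so a missed one is repaired by the next.

```python
from dataclasses import dataclass

@dataclass
class PlayerState:
    seq: int = -1   # sequence number of the last update applied
    x: float = 0.0
    y: float = 0.0

def apply_update(state, seq, x, y):
    """Apply an update only if it is newer than what we have.

    Duplicates and late arrivals are dropped; lost updates are never
    retransmitted, because the next update replaces the whole position.
    """
    if seq <= state.seq:
        return False
    state.seq, state.x, state.y = seq, x, y
    return True
```

There is no ack, retry, or dedup table here at all, which is the point: for this kind of data, the cheaper spot on the tradeoff spectrum is the right one.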
> But that's not because you built a system that successfully delivers messages exactly once... you build a system that successfully processes messages exactly once, even if delivery occurs multiple times.
I think the difference might be partly semantic. If processing at the messaging level is idempotent + at least once, then message delivery to the application level is exactly once. People mostly only care about the application level not the lower levels where they might just build on a library or system that handles that logic for them.
AFAIK the point of exactly once delivery, in the context of message passing, is to abstract delivery concerns away from the application layer and into the messaging layer, so that the application can depend on the exactly-once semantics without having to write logic for it.
The problem with this is similar to the problems with two-phase commit in distributed databases: there are unavoidable failure cases. Most of the time it works just fine, but if you write your application to depend on this impossible feature, and it fails - which, given enough time, will certainly happen - then cleaning up the mess can be much more effort (and have much wider business implications) than simply dealing with the undesirable behaviour of reality in the first place.
Or to put it another way: exactly once semantics can never be reliably extracted away from the application, so if you need it, it needs to be part of your application.
Theoretically true, and easy to say. But the hard part is actually implementing this in the context of business problems. What if you need to call external services that you don't control, and they don't provide idempotence? Like sending emails. Or worse: you send a message to a warehouse to deliver an item, and they deliver duplicates...
Yeah, the duplicate email thing is a classic problem, but I’m not sure it’s one of “idempotence”. This can happen in any (intended to be) transactional operation that creates a side effect.
Hit an error, roll back, side effect can’t be rolled back. Retry - side effect happens again.
Wouldn’t the general approach be to have unique message identifiers and queue side effects? Maybe I’m missing lots of subtleties.
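A sketch of what "unique message identifiers and queued side effects" could look like, assuming SQLite, a hypothetical `outbox` table, and a caller-supplied `send` function. One of the subtleties shows up in `drain`: a crash between sending and marking the row as sent still produces a duplicate email, unless the email provider itself deduplicates on the ID.

```python
import sqlite3

def make_db():
    # Hypothetical outbox: one pending side effect per unique message ID.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE outbox ("
        " msg_id TEXT PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db

def enqueue_email(db, msg_id, recipient):
    """Record the intent to send; retries with the same msg_id are no-ops."""
    with db:
        db.execute(
            "INSERT OR IGNORE INTO outbox (msg_id, recipient) VALUES (?, ?)",
            (msg_id, recipient),
        )

def drain(db, send):
    """Send every pending email and return how many sends were attempted."""
    rows = db.execute(
        "SELECT msg_id, recipient FROM outbox WHERE sent = 0 ORDER BY msg_id"
    ).fetchall()
    for msg_id, recipient in rows:
        send(msg_id, recipient)
        # A crash right here means the next drain sends this email again.
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE msg_id = ?", (msg_id,))
    return len(rows)
```

The retried handler can call `enqueue_email` as often as it likes; the duplicate window shrinks to the sender loop, but it does not disappear.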
If you guarantee "exactly once", you design your systems differently than "at least once with idempotence". A system designed for exactly once will be less complicated than a system designed for at least once + idempotence, which is why it is ideal but impossible.
With idempotence, you shift the problem from "deliver X exactly once" to "make it seem like X was delivered exactly once". In most systems, exactly-once is really "effectively exactly once".
It can be exactly once at the application level, just not exactly once at the more fine-grained message level. The fact that it's not exactly once at that lower level doesn't really matter; the semantics at the application level are what we care about.
Not exactly. If you have a business problem where you’re thinking “But I really, really need the effect of exactly-once; what can I do?”, GP’s post has the answer.
crazygringo|3 years ago
nimih|3 years ago
cowl|3 years ago
majormajor|3 years ago
hn_go_brrrrr|3 years ago
kevincox|3 years ago
pksebben|3 years ago
1. buy plane ticket
2. bring box to recipient
3. plug in Ethernet & send message
keep an eye out for our IPO
jerf|3 years ago
naasking|3 years ago
doctor_eval|3 years ago
tunesmith|3 years ago
tunesmith|3 years ago
FooBarWidget|3 years ago
lll-o-lll|3 years ago
purpleblue|3 years ago
stonemetal12|3 years ago
paxys|3 years ago
fizwhiz|3 years ago
naasking|3 years ago
sokoloff|3 years ago
echelon|3 years ago
In active / active setups, there are other strategies such as partitioning and consensus.
dilyevsky|3 years ago
hackerdad|3 years ago