top | item 41400871

mprime1 | 1 year ago

Leader election and distributed locking reduce to the same problem… which is proven to be impossible. That means in some edge case it will fail on you; is your system handling those cases?

I didn’t read past this:

> Systems like Apache ZooKeeper or Postgres (via Advisory Locks) provide the required building blocks for this

ZooKeeper is the original sin: convincing a whole generation of programmers that distributed locks are a feasible solution.

This is my biggest pet peeve in distributed systems.

——

And if you don’t believe me, maybe you’ll trust Kyle K. of Jepsen fame:

> However, perfect failure detectors are impossible in asynchronous networks.

Links to: https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-...

https://jepsen.io/analyses/datomic-pro-1.0.7075

p1necone|1 year ago

> which is proven to be impossible.

'Technically' intractable problems can be solved in a way that is almost as useful as solving them outright if you can achieve one of two things:

* Reliably identify when you've encountered an unsolvable case (usefulness of this approach depends on the exact problem you're solving).

or

* Reduce the probability of unsolvable cases/incorrect solutions to a level low enough to not actually happen in practice.

'Technically' collision-free GUIDs are impossible, reliable network communication (TCP) is impossible, and O(n^2) algorithms grow to unusably long running times - but in practice all of these things are used constantly to solve real problems.

mprime1|1 year ago

"Distributed locks" are at best a contention-reduction mechanism. They cannot be used to implement mutual exclusion that is _guaranteed_ to work.

I've seen way too many systems where people assume TCP == perfectly reliable and distributed locks == mutual exclusion. Which of course is not the case.

everforward|1 year ago

> Convincing a whole generation of programmers that distributed locks are a feasible solution.

I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.

Questions that should be simple become complicated. What happens when the distributed locking system is down? What happens when we reboot all the nodes at once? What if they don't come down at exactly the same time and there's leader churn for like 2 minutes? Etc, etc.

Those questions should be fairly simple, but become something where a senior dev is having to trace codepaths and draw on a whiteboard to figure it out. It's not even enough to understand how a single node works in-depth, they have to figure out how this node works but also how this node's state might impact another node's.

All of this is much simpler in leaderless systems (where the leader system is replaced with idempotency or a scheduler or something else).

I very strongly prefer avoiding leader systems; it's a method of last resort when literally nothing else will work. I would much rather scale a SQL database to support the queries for idempotency than deal with a leader system.

I've never seen an idempotent system switch to a leader system, but I've sure seen the reverse a few times.
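The idempotency approach mentioned above can be sketched in a few lines: instead of electing a leader to decide who runs a job, every worker may try it, and a unique key in the database arbitrates. This is a toy illustration (table and function names are hypothetical, and a real system would claim the key before doing the work, not after):

```python
import sqlite3

# Leaderless coordination via an idempotency key: the database's unique
# constraint, not a leader, guarantees each job is recorded exactly once.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY, result TEXT)")

def process_once(key: str, work) -> bool:
    """Return True if this call did the work, False if it was already done."""
    try:
        # The INSERT is the claim; concurrent duplicates hit the unique index.
        db.execute("INSERT INTO processed (idempotency_key, result) VALUES (?, ?)",
                   (key, work()))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False  # another worker already processed this key

assert process_once("job-42", lambda: "done") is True
assert process_once("job-42", lambda: "done") is False  # duplicate is a no-op
```

The questions from above become easy here: if the database is down, nothing runs; if every node reboots, nothing is lost; there is no leader to churn.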

anothername12|1 year ago

>> Convincing a whole generation of programmers that distributed locks are a feasible solution.

> I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.

I think this is a huge problem with the way we're developing software now. Distributed systems are extremely difficult for a lot of reasons, yet they're often our first choice even when developing small systems!

At $COMPANY we have hundreds of lambdas, DocumentDB (btw, that is hell in case you're considering it) and other cloud storage and queuing components. On-call and bug hunts are basically quests to find some corner-case race condition, timing problem, read-after-write assumption, etc.

I’m ashamed to say, we have reads wrapped in retry loops everywhere.

The whole thing could have been a Rails app with a fraction of the team size, a massive increase in reliability, a system that's easier to reason about, and a better time delivering features.

You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this.

silasdavis|1 year ago

This is rather misleading: the FLP theorem talks about fully asynchronous networks with unbounded delay. Partial synchrony is a perfectly reasonable assumption, and it allows atomic broadcast and locking to work perfectly well as long as there is an unknown but finite bound on network delay.

mprime1|1 year ago

Notice I did not mention FLP.

Atomic Broadcast (via Paxos or Raft) does not depend on partial-synchrony assumptions to maintain its safety properties.

Your internet or intranet networks are definitely asynchronous, and assuming delays are bounded is a recipe for building crappy systems that will inevitably fail on you in hard-to-debug ways.

gunnarmorling|1 year ago

Had you read on, you'd have seen that I am discussing this very point:

> leader election will only ever be eventually correct... So you’ll always need to be prepared to detect and fence off work done by a previous leader.

mprime1|1 year ago

I'm sorry, I didn't mean to bash your work. I am not familiar with S3 and maybe what you describe is a perfectly safe solution for S3 and certain classes of usage.

I could not get past the point where you promulgate the idea that ZK can be used to implement locks.

Traditionally a 'lock' guarantees mutual exclusion between threads or processes.

"Distributed locks" are not locks at all. They look the same from API perspective, but they have much weaker properties. They cannot be used to guarantee mutual exclusion.

I think any mention of distributed locks / leader election should come with a giant warning: THESE LOCKS ARE NOT AS STRONG AS THE ONES YOU ARE USED TO. Skipping this warning is doing a disservice to your readers.

realaleris149|1 year ago

Reliable network communication is also proven to be impossible [1], yet it happens all the time. Yes, sometimes it fails, but it still “works”.

[1] https://en.wikipedia.org/wiki/Two_Generals%27_Problem

mprime1|1 year ago

There are some serious flaws in your reasoning.

TCP guarantees order [as long as the connection is active] but it is far from being 'perfectly reliable'.

Example: sender sends, connection drops, sender has no idea whether the receiver received.

In other words, it works until it doesn't. The fact that sometimes it doesn't means it's not perfect.

TCP is a great tool but it doesn't violate the laws of physics. The Two Generals problem is and will always be impossible.
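The sender's dilemma can be shown with a tiny simulation of one message-and-ack round. This is a toy model I'm adding for illustration, not anything from the thread:

```python
# Two Generals in miniature: the sender sees "no ack" both when the message
# was lost AND when the message arrived but the ack was lost. The two cases
# are indistinguishable from the sender's side.
def exchange(msg_delivered: bool, ack_delivered: bool):
    receiver_got_it = msg_delivered
    sender_saw_ack = msg_delivered and ack_delivered
    return receiver_got_it, sender_saw_ack

# Case 1: the message itself was lost.
got1, ack1 = exchange(msg_delivered=False, ack_delivered=True)
# Case 2: the message arrived, but the acknowledgement was lost.
got2, ack2 = exchange(msg_delivered=True, ack_delivered=False)

# The sender observes the same thing (no ack) in both cases...
assert ack1 == ack2 == False
# ...yet the receiver's state differs, so the sender cannot know which happened.
assert got1 != got2
```

No number of additional acknowledgement rounds removes the ambiguity; it only moves it to the last message sent.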

akira2501|1 year ago

> sometime it fails

That failures are a possibility does not concern me. That the failures are not fully characterized does.

EGreg|1 year ago

I literally had this argument with David Schwartz of Ripple about our lack of a consensus protocol.

woooooo|1 year ago

They both reduce to Paxos-style atomic broadcast, which is in fact possible, although legend has it that Leslie Lamport was trying to prove it impossible and accidentally found a way.

mprime1|1 year ago

> They both reduce to a paxos style atomic broadcast

Atomic Broadcast guarantees order of delivery. It does not (and cannot) guarantee timing of delivery, which is what people want and expect when using a distributed lock / leader election.

setheron|1 year ago

I remember at Oracle they built systems to shut down the previous presumed leader, to know definitively that it wasn't ghosting.

butterisgood|1 year ago

Indeed a slow node looks like a dead node for a while until it isn’t.

At some point, the difference between distributed systems that work well and those that do not comes down to an “art of tuning timeouts and retries”.

Also, nothing in production is perfect, so when writing code for distributed systems we should always consider failures and their impacts.

And we will still make mistakes…

orf|1 year ago

> …which is proven to be impossible

For some definition of impossible, given that many systems utilise them effectively. Not all corner cases or theoretical failure modes are relevant to everyone.

lijok|1 year ago

Many systems are buggy, periodically running into corner cases that are borderline impossible to debug.

someguy4242|1 year ago

Can you point me in the right direction to understand ZooKeeper's fatal flaw?

I know you linked the first paper, but yeah, candidly I don't wish to read the full paper.

Don’t mean this sarcastically whatsoever. Genuine interest.

mprime1|1 year ago

The fatal flaw is calling them "locks".

When programmers think of locks, they think of something that can be used to guarantee mutual exclusion.

Distributed locks have edge cases where mutual exclusion is violated.

Implementation does not matter.

e.g. imagine someone shows you a design for a perpetual motion machine. You don't need to know the details to know it doesn't work! It would violate the laws of physics!

Similarly, anyone telling you they created a safe implementation of a distributed lock is claiming their system breaks the laws of information theory.

"Distributed locks" are at best contention-reduction mechanisms. i.e. they can keep multiple processes from piling up and slowing each other down.

[Some] Paxos implementations, for example, use leader election to streamline the protocol and achieve high throughput. But Paxos safety does NOT depend on it. If there are multiple leaders active (which will inevitably happen), the protocol still guarantees its safety properties.
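The edge case where mutual exclusion breaks is easy to simulate with a lease-based lock and a simulated clock (the class and timings below are mine, purely illustrative): the holder stalls past its lease expiry (GC pause, scheduler hiccup, network partition), a second client acquires the lock, and both then believe they hold it.

```python
# Lease-based "distributed lock" sketch: acquire() succeeds whenever the
# previous lease has expired, which is exactly how mutual exclusion breaks.
class LeaseLock:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.expires_at = -1.0
        self.holder = None

    def acquire(self, client: str, now: float) -> bool:
        if now >= self.expires_at:  # previous lease expired (or never held)
            self.holder = client
            self.expires_at = now + self.ttl
            return True
        return False

lock = LeaseLock(ttl=10.0)
assert lock.acquire("A", now=0.0)   # A holds the lease until t=10
# A stalls for 15 simulated seconds; the lock service expires its lease.
assert lock.acquire("B", now=15.0)  # B now holds the lock...
# ...but A has no way to learn its lease lapsed during the stall, so from
# A's point of view it still "holds the lock" and both enter the critical
# section. The lock alone cannot prevent this.
assert lock.holder == "B"
```

This is why these constructs are contention reducers, not safety mechanisms: the safety, if you need it, has to be enforced downstream.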

withinboredom|1 year ago

For me, I almost stopped reading at the assertion that clock drift doesn't matter. They clearly didn't think through the constant fight that would occur over who the leader actually was and just hand-wave it away as 'not an issue.' They need to remove time from their equation completely if they want clock drift to not matter.

pluto_modadic|1 year ago

Part of me thinks that clock drift would be reliably biased toward a particular node, not leapfrogging between two nodes.