Distributed Locks Are Dead; Long Live Distributed Locks

mjb|6 years ago

This is good work, and the lock semantics and fencing token (epoch) make a lot of sense. I can't help but think that implementing java.util.concurrent.locks.Lock will turn out to be a liability. The problem here is that the code looks like a Java lock, but the semantics are entirely different with regard to failure. Specifically:

> While we’re on this subject, the same logic applies even to the primary FencedLock.lock() call: at the very next line of code in your program, you may no longer be holding the lock.

That's not, in most programmers' experience, how locks work. This behavior is necessary (at some level) to deal with partial failures and stalls of clients, but it means that if you use this like a Lock, your code will be very wrong.

> Note the key message here: all external services must participate in the fencing-token protocol, with guaranteed linearizability, for the whole setup to uphold its invariants.

So this isn't really like a Java lock at all; instead, it's a nice, convenient way to build part of an epoch/view-change implementation. That's useful, but in my mind the API they chose makes it less likely that non-experts will use this correctly.
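To make the fencing point concrete, here is a minimal sketch of an external service that participates in the protocol, assuming the lock hands out a fencing token (epoch) that increases with every acquisition. `FencedStorage` and `FencingDemo` are hypothetical names, not part of Hazelcast's API, and the token values 33 and 34 are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

class FencingDemo {
    public static void main(String[] args) {
        FencedStorage storage = new FencedStorage();
        long tokenA = 33; // client A acquired the lock first, then stalled...
        long tokenB = 34; // ...its session expired and client B acquired the lock
        System.out.println(storage.write(tokenB, "k", "from B")); // true
        System.out.println(storage.write(tokenA, "k", "from A")); // false: stale token
        System.out.println(storage.read("k"));                    // from B
    }
}

// Hypothetical storage service that takes part in the fencing-token protocol.
// It remembers the highest token it has accepted and rejects writes carrying
// an older one, so a client that silently lost the lock cannot clobber data
// written by the newer holder.
class FencedStorage {
    private long highestToken = Long.MIN_VALUE;
    private final Map<String, String> data = new HashMap<>();

    // synchronized makes check-then-write atomic within this single process
    synchronized boolean write(long fenceToken, String key, String value) {
        if (fenceToken < highestToken) {
            return false; // stale lock holder: reject
        }
        highestToken = fenceToken;
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }
}
```

The `synchronized` check-and-write only stands in for the linearizability requirement quoted above; a real service would need the same guarantee across its own replicas.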

roro159|6 years ago

I think it's worth adding a few previous discussions about distributed locks.

Redlock is a distributed lock using Redis: https://redis.io/topics/distlock

Martin Kleppmann criticized Redlock and mentioned the fencing solution: http://martin.kleppmann.com/2016/02/08/how-to-do-distributed...

Antirez disagrees with the analysis, and the HN post has a good discussion: https://news.ycombinator.com/item?id=11065933

sriram_malhar|6 years ago

More accurately, these are leases, not locks in the traditional sense. The lease expires when the corresponding client session ends, which is detected by the absence of a heartbeat from the client.
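A minimal sketch of the server-side rule described here, assuming a fixed time-to-live and a single server clock: a session counts as alive only while heartbeats keep arriving within the TTL, and once it has expired it stays dead. `LeaseTable`, the 10-second TTL, and passing the current time in as a parameter are illustrative choices, not how any particular product implements sessions.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseDemo {
    public static void main(String[] args) {
        LeaseTable table = new LeaseTable(10_000); // hypothetical 10 s TTL
        table.heartbeat("client-1", 0);
        System.out.println(table.isAlive("client-1", 9_999));  // true
        System.out.println(table.isAlive("client-1", 10_000)); // false: no heartbeat within TTL
        System.out.println(table.heartbeat("client-1", 10_000)); // false: too late to renew
    }
}

// Hypothetical server-side session table: a session (and any lock it holds)
// stays alive only while heartbeats arrive within ttlMillis of each other.
// Time is passed in explicitly so the logic is deterministic and testable.
class LeaseTable {
    private final long ttlMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    LeaseTable(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Creates the session on first heartbeat; refuses to revive an expired one.
    boolean heartbeat(String sessionId, long nowMillis) {
        Long last = lastHeartbeat.get(sessionId);
        if (last != null && nowMillis - last >= ttlMillis) {
            return false; // already expired: the client must open a new session
        }
        lastHeartbeat.put(sessionId, nowMillis);
        return true;
    }

    boolean isAlive(String sessionId, long nowMillis) {
        Long last = lastHeartbeat.get(sessionId);
        return last != null && nowMillis - last < ttlMillis;
    }
}
```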

hcnews|6 years ago

Locks are always leases in a distributed system, at least in my experience (and for a sane system).

hinkley|6 years ago

Plus ça change, plus c'est la même chose. (The more things change, the more they stay the same.)

I think the first time I encountered the idea of leases was while dabbling in some CORBA code, back when enough people knew what that acronym meant that Sun Microsystems thought they should include an ORB in the dev kit. I ran into it again in their RPC mechanism for Java.

Since then I've encountered the concept from others only a handful of times. Object leases aren't that important if your distributed state is straightforward. Platonic REST with no state doesn't need them, nor really do the fully stateful servers we had for about 20 years. And then there are consensus protocols like Raft, which fill in some of the gaps in between.

hinkley|6 years ago

Having read the article, I'm not sure I agree.

First, leases aren't mentioned at all. Second, committing changes looks a lot like optimistic locking, due to the version number they're assigning to the lock.
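For comparison, a minimal sketch of plain optimistic locking with a version number, assuming a single in-process cell: a writer commits only if the version it read is still current, and otherwise must re-read and retry. `VersionedCell` is a hypothetical name; this illustrates the general technique, not the article's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticDemo {
    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell("v0");
        VersionedCell.Snapshot a = cell.read(); // two writers both read version 0
        VersionedCell.Snapshot b = cell.read();
        System.out.println(cell.commit(a, "from A")); // true, version becomes 1
        System.out.println(cell.commit(b, "from B")); // false: B read a stale version
    }
}

// Hypothetical optimistic-locking cell: each commit produces a new snapshot
// with version + 1, and succeeds only if nobody committed since the read.
class VersionedCell {
    record Snapshot(long version, String value) {}

    private final AtomicReference<Snapshot> current;

    VersionedCell(String initial) {
        current = new AtomicReference<>(new Snapshot(0, initial));
    }

    Snapshot read() {
        return current.get();
    }

    boolean commit(Snapshot readSnapshot, String newValue) {
        Snapshot next = new Snapshot(readSnapshot.version() + 1, newValue);
        // compareAndSet compares references, so it succeeds only if the
        // snapshot this writer read is still the current one.
        return current.compareAndSet(readSnapshot, next);
    }
}
```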

grogers|6 years ago

It's not a lease: anyone can force-close your session (which releases your locks), and that takes effect immediately rather than after a lease grace period runs out.
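A minimal sketch of the distinction drawn here, assuming locks are owned by sessions rather than by time-limited leases: force-closing a session frees all of its locks at once, with no TTL to wait out. `SessionLockRegistry` and its methods are hypothetical and single-process; they are not Hazelcast's session API.

```java
import java.util.HashMap;
import java.util.Map;

class SessionLockDemo {
    public static void main(String[] args) {
        SessionLockRegistry registry = new SessionLockRegistry();
        System.out.println(registry.tryLock("orders", "session-A")); // true
        System.out.println(registry.tryLock("orders", "session-B")); // false: held by A
        registry.forceCloseSession("session-A");                     // e.g. an admin action
        System.out.println(registry.tryLock("orders", "session-B")); // true, immediately
    }
}

// Hypothetical registry mapping each lock name to the session that owns it.
class SessionLockRegistry {
    private final Map<String, String> ownerByLock = new HashMap<>();

    // Succeeds if the lock is free or already owned by this session (reentrant).
    synchronized boolean tryLock(String lockName, String sessionId) {
        String owner = ownerByLock.putIfAbsent(lockName, sessionId);
        return owner == null || owner.equals(sessionId);
    }

    // Releases every lock owned by the session in one step.
    synchronized void forceCloseSession(String sessionId) {
        ownerByLock.values().removeIf(sessionId::equals);
    }
}
```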

cangencer|6 years ago

There was also a follow-up blog post, again by Basri, on how it's tested using Jepsen, for those interested: https://hazelcast.com/blog/testing-the-cp-subsystem-with-jep...

heavenlyblue|6 years ago

What happens when some of the quorum members are not available?

Not allowing split brain implies giving up some liveness to prevent it.

More importantly, what can I do if I need to acquire several locks at the same time?

Distributed lock frameworks usually assume some sort of transaction-reversal (rollback) mechanism in the surrounding architecture.

hinkley|6 years ago

I see they mention Raft in the article. If they are updating a directory of ownership data via Raft then your split brain problem is addressed.

Generally, distributed locks are a fiat-based system. I claim ownership of something and I have indisputable rights to that thing until lease renewal time. If the lease renewal fails for any reason, I have to give up my claim on the object.
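A minimal sketch of the client-side half of this rule, assuming a fixed lease length and a locally chosen safety margin: the holder stops acting slightly before the lease nominally expires, and immediately if a renewal fails. `LeaseHolder`, the 10-second lease, and the 2-second margin are illustrative assumptions.

```java
class LeaseHolderDemo {
    public static void main(String[] args) {
        LeaseHolder holder = new LeaseHolder(10_000, 2_000); // 10 s lease, 2 s margin
        holder.onRenewed(0);                      // renewal request sent at t = 0 ms
        System.out.println(holder.mayAct(7_999)); // true
        System.out.println(holder.mayAct(8_000)); // false: inside the safety margin
        holder.onRenewed(5_000);
        holder.onRenewalFailed();                 // e.g. the lock service was unreachable
        System.out.println(holder.mayAct(5_001)); // false: claim given up at once
    }
}

// Hypothetical client-side view of a lease.
class LeaseHolder {
    private final long leaseMillis;
    private final long safetyMarginMillis;
    private long validUntilMillis = Long.MIN_VALUE; // no lease held yet

    LeaseHolder(long leaseMillis, long safetyMarginMillis) {
        this.leaseMillis = leaseMillis;
        this.safetyMarginMillis = safetyMarginMillis;
    }

    // Pass the time the renewal request was *sent*, not when the reply arrived,
    // so the local deadline is not later than the server's; the margin absorbs clock drift.
    void onRenewed(long requestSentAtMillis) {
        validUntilMillis = requestSentAtMillis + leaseMillis - safetyMarginMillis;
    }

    // Any failed renewal means giving up the claim immediately.
    void onRenewalFailed() {
        validUntilMillis = Long.MIN_VALUE;
    }

    boolean mayAct(long nowMillis) {
        return nowMillis < validUntilMillis;
    }
}
```

Even with the margin, a pause between `mayAct()` and the actual write can outlast the lease, which is why the fencing tokens discussed at the top of the thread are still needed.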

I might have an architecture that lets me make forward progress in a split brain scenario because I owned a lease before the split happened. If the recovery is fast enough then everything will be fine.

However, my instincts tell me that it would take a pretty special problem domain and a very assertive dev team to maintain this invariant over a long period of time. Business people see all this data we have and they want to connect it more and more over time. They are not above selling a feature and then cajoling us into implementing it, even if it reduces long-term viability.

In the end, what you are left with is a distributed system with lower overhead per transaction. But that's nothing to sneeze at.

mey|6 years ago

Each lock has an associated name. If you have locks for separate purposes, you have separate names. If you need to grab multiple locks before processing a single task, I would suggest looking for a design that doesn't need it. If there is no other way, then acquiring the locks in a defined global order (for example, sorted by name) still works, because no two tasks can then end up waiting on each other in a cycle.
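A minimal sketch of the ordered-acquisition approach using the standard `java.util.concurrent.locks.Lock` interface: de-duplicate and sort the lock names, acquire in that order, release in reverse. `OrderedLocks` is a hypothetical helper; the demo uses local `ReentrantLock`s, and a distributed lock behind the same interface would still carry the failure caveats discussed at the top of the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class OrderedLocksDemo {
    public static void main(String[] args) {
        Map<String, Lock> registry = Map.of(
                "accounts", new ReentrantLock(),
                "orders", new ReentrantLock());
        // Callers may list names in any order; acquisition is always by sorted name.
        List<Lock> held = OrderedLocks.lockAll(registry, List.of("orders", "accounts"));
        try {
            System.out.println("holding " + held.size() + " locks"); // holding 2 locks
        } finally {
            OrderedLocks.unlockAll(held);
        }
    }
}

// Hypothetical helper: acquire several named locks in one global order.
final class OrderedLocks {
    private OrderedLocks() {}

    // Sorts and de-duplicates the names, then locks them in that order.
    // If any acquisition throws, the locks taken so far are released.
    static List<Lock> lockAll(Map<String, Lock> registry, Collection<String> names) {
        List<Lock> acquired = new ArrayList<>();
        try {
            for (String name : new TreeSet<>(names)) {
                Lock lock = registry.get(name); // unknown name -> null -> NPE on lock()
                lock.lock();
                acquired.add(lock);
            }
        } catch (RuntimeException e) {
            unlockAll(acquired);
            throw e;
        }
        return acquired;
    }

    // Releases in reverse acquisition order.
    static void unlockAll(List<Lock> acquired) {
        for (int i = acquired.size() - 1; i >= 0; i--) {
            acquired.get(i).unlock();
        }
    }
}
```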

PaulHoule|6 years ago

I love Hazelcast. It's the best thing since the Coupling Facility on IBM z Systems mainframes.

PaulHoule|6 years ago

I don't understand why people think comparing something to the 360 is ever an insult.
