winsletts | 10 years ago

The code relies on etcd to prevent a race condition. The leader key is set with `prevExist=false`, so the write fails if another node has already won the race.

The functionality in the code is here: https://github.com/compose/governor/blob/master/helpers/etcd...

The documentation for etcd is here: https://coreos.com/etcd/docs/latest/api.html#atomic-compare-...
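To illustrate the idea, here is a minimal sketch of create-only semantics using an in-memory stand-in. `ToyKeyStore`, `create_if_absent`, and `race_for_leader` are made-up names for this example, not the etcd or governor API; against a real etcd v2 server the equivalent would be a `PUT` on the key with `prevExist=false`.

```python
import threading


class ToyKeyStore:
    """In-memory stand-in for etcd (hypothetical; NOT the etcd client API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def create_if_absent(self, key, value):
        """Mimic a set with prevExist=false: succeed only if the key is missing."""
        with self._lock:
            if key in self._data:
                return False  # another node already created the key
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def race_for_leader(store, node_names, key="/leader"):
    """Let every node try to take the leader key at once; return the winners."""
    winners = []
    winners_lock = threading.Lock()

    def attempt(name):
        if store.create_if_absent(key, name):
            with winners_lock:
                winners.append(name)

    threads = [threading.Thread(target=attempt, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

However the threads interleave, only one call to `create_if_absent` can succeed, so `race_for_leader` returns a list with exactly one name. That is the property the leader election depends on.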

Someone|10 years ago

But then, shouldn't the description read not

"If no one has the leader key it runs health checks and takes over as leader."

but

"If no one has the leader key it takes over as leader, runs health checks, and starts functioning as leader."

? If so, I would do the health checks and then try to become the leader. Or do the 'health checks' involve other nodes?
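A rough sketch of the "health checks first, then try to become leader" ordering. The dict stands in for etcd and `is_healthy` is a hypothetical callback; none of this is governor's actual code. In real etcd the existence check and the write would be a single atomic request, not two steps.

```python
def try_become_leader(leader_keys, node, is_healthy):
    """Check health first, then attempt a create-only write of the leader key."""
    if not is_healthy(node):
        return False  # an unhealthy node never touches the key
    if "leader" in leader_keys:
        return False  # someone else already holds the key
    leader_keys["leader"] = node
    return True
```

With this ordering, a node that fails its health check never holds the leader key, even briefly.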

merb|10 years ago

It simply relies on the voting feature of etcd (Raft). Locking with etcd is really simple to use, and etcd is really, really stable. However, it would be easier to install etcd on every Postgres node and write a small Go library that makes the Postgres master follow the etcd leader (etcd also has a leader). systemd would also keep the overall system healthy. That's what we do at envisia: each machine repeatedly checks whether it is the leader and, if so, writes its own URL to an etcd key. So overall we need three Postgres machines; one can fail and we still have a voting majority. That's just for a single master where we don't need to read from the slaves, but it's easily extendable.
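Roughly, one iteration of that check could look like the sketch below. `is_local_leader` is a hypothetical callback that reports whether the local etcd member is currently the leader, the dict stands in for etcd, and the key name `/postgres/master` is made up for the example. `tolerated_failures` is just the usual majority arithmetic for a Raft cluster.

```python
def publish_if_leader(is_local_leader, keys, my_url, key="/postgres/master"):
    """One pass of the periodic check: the leader writes its own URL to the key."""
    if is_local_leader():
        keys[key] = my_url
        return True
    return False


def tolerated_failures(cluster_size):
    """Members that can fail while a majority (quorum) is still reachable."""
    quorum = cluster_size // 2 + 1
    return cluster_size - quorum
```

With three machines the quorum is two, so `tolerated_failures(3)` is 1, which matches the "3 machines, 1 could fail" setup.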

Oh and here is the Compare and Swap (Atomic) functionality of etcd that he described: https://github.com/coreos/etcd/blob/master/Documentation/api...