(no title)
mpercy
|
7 years ago
FWIW, we implemented dynamic consensus membership change in Kudu way back in 2015 (https://github.com/apache/kudu/commit/535dae) but presumably that was after the fork. We still haven't implemented leader leases or distributed transactions in Kudu though due to prioritizing other features. It's very cool that you have implemented those consistency features.
kmuthukk|7 years ago
Thanks for correcting me on the dynamic consensus membership change. Looks like the basic support was indeed there, but several important enhancements were needed (for correctness and usability).
- To make the "online" piece of the membership change work correctly we added support for LEARNER (PRE VOTER) role (where the new member enters in a non-voting mode till it's caught up). https://github.com/YugaByte/yugabyte-db/commit/909d26e31ecd0....
- Load Balancing (which uses the membership changes) is automatic. (https://github.com/YugaByte/yugabyte-db/commit/e4667eb7ec0e6...)
- Remote bootstrap (due to membership changes) also has undergone substantial changes given that YugaByte DB uses a customize/extended version of RocksDB as the storage engine and does a tighter coupling of Raft with RocksDB storage engine. (https://github.com/YugaByte/yugabyte-db/blob/master/docs/ext...)
- Dynamic Leader Balancing is also new-- it causes leadership to be proactively altered in a running system to ensure each node is the leader for a similar number of tablets.
regards, Kannan
mpercy|7 years ago
I'm curious if you did anything to prevent automatic rebalancing from being triggered at a "bad time" or have throttled it in some way, or whether moving large amounts of data between servers at arbitrary times was not a concern.
I am also curious if you added some type of API using the LEARNER role to support a CDC-type of listener interface using consensus.
By the way, we also recently added support for rack/location awareness in a series of patches including https://github.com/apache/kudu/commit/ebb285
We should really start some threads on the dev lists to periodically share this type of information and merge things back and forth to avoid duplicating work where possible. I know the systems are pretty different at the catalog and storage layers but there are still many similarities.