top | item 37674967

ClickHouse Keeper: A ZooKeeper alternative written in C++

218 points | eatonphil | 2 years ago | clickhouse.com | reply

126 comments

[+] tbragin|2 years ago|reply
Coincidentally, as someone who worked on this blog, I was surprised (and pleased!) to see that we are not the only ones who felt the need to build a Zookeeper alternative.

Looks like folks at StreamNative did as well, with their Oxia project: https://github.com/streamnative/oxia. They were just talking about this yesterday at Confluent Current ("Introducing Oxia: A Scalable Zookeeper Alternative" was the title of their talk). https://streamnative.io/blog/introducing-oxia-scalable-metad...

Seems to be a trend :)

[+] coding123|2 years ago|reply
Is the trend mainly due to ZK being written in Java?
[+] jzelinskie|2 years ago|reply
It's been a few years since I've checked in with distributed lock services. Why would someone adopt ZooKeeper after etcd gained maturity? I recall seeing benchmarks more than 5 years ago where a naive proxy like zetcd[0] outperforms ZooKeeper itself in many ways and offers more consistent latencies. etcd has gotten lots of battle-testing as Kubernetes' datastore, but I can also see how that has shaped its design in a way that might not fit other projects.

I think there are plenty of other projects (e.g. FoundationDB, Kafka) that also replaced their usage of ZooKeeper as their systems matured. I guess I'm confused why anyone has been picking up new installations of ZooKeeper.

[0]: https://github.com/etcd-io/zetcd

[+] zX41ZdbW|2 years ago|reply
There is no specific reason to start with ZooKeeper, nor with ClickHouse Keeper, if you want to use another distributed consensus system.

But: every such system is slightly different in the data model and the set of available primitives.

It's very hard to build a distributed system correctly, even relying on ZooKeeper/etcd/FoundationDB. For example, when I hear "distributed lock," I know there is a 90% chance there is a bug (a distributed lock can be safely used only if every transaction made under the lock also atomically tests that the lock still holds).

So, if there is an existing system heavily relying on one distributed consensus implementation, it's very hard to switch to another. The main value of ClickHouse Keeper is its compatibility with ZooKeeper - it uses the same data model and wire protocol.
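
The parenthetical caveat above (commit only together with an atomic check that the lock is still held) can be sketched with a minimal in-memory model. All names here are hypothetical; this is not a real ZooKeeper/Keeper client, just an illustration of the fencing-token idea:

```python
# Minimal model of "every transaction under a lock atomically tests that
# the lock still holds." The lock's version acts as a fencing token; a
# stale holder's write is rejected at commit time.

class Store:
    def __init__(self):
        self.lock_version = 0   # bumped each time the lock changes hands
        self.data = {}

    def acquire(self):
        self.lock_version += 1
        return self.lock_version  # the new holder's fencing token

    def commit(self, token, key, value):
        # Atomic "check lock still held, then write" — the moral
        # equivalent of a ZooKeeper multi-op with a version check.
        if token != self.lock_version:
            return False  # a later holder took over; reject the stale write
        self.data[key] = value
        return True

store = Store()
t1 = store.acquire()                        # client 1 holds the lock
assert store.commit(t1, "k", "v1")          # succeeds
t2 = store.acquire()                        # client 1 stalls (GC/network); client 2 takes over
assert not store.commit(t1, "k", "stale")   # client 1's delayed write is rejected
assert store.commit(t2, "k", "v2")
assert store.data["k"] == "v2"
```

Without the version check in `commit`, client 1's delayed write would silently clobber client 2's — the 90%-chance bug the comment is describing.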

[+] klysm|2 years ago|reply
The term “distributed lock” is a bit of a mental red flag to me.
[+] antonio2368|2 years ago|reply
As one of the contributors, I'm always happy to see interest and people using it.

Keeper is a really interesting challenge and we're really open to any kind of feedback and thoughts.

If you tried it out and have some feedback for it, I encourage you to create an issue (https://github.com/ClickHouse/ClickHouse), ask on our Slack, ping me directly on Slack... (just don't call me on my phone)

And don't forget that it's completely open-source like ClickHouse so contributors are more than welcome.

[+] Redsquare|2 years ago|reply
SharedMergeTree is not open!
[+] xyzelement|2 years ago|reply
I usually scoff at the "written in..." part of such announcements, because it is a sign that the author is focused on the input ("I wrote this in X"), not the output (the value the user gets).

In this case though, the blog outlines specific reasons why this had to be in C++ (interoperability with their C++ codebase) as well as benefits that are separate from the language.

[+] lsofzz|2 years ago|reply
> author is focused on the input ("I wrote this in X") not the output (value the user gets)

Possibly, a lot of us enjoy developing X in Y for the sake of doing so. Not everyone may end up caring about the value that the user gets.

[+] The_Colonel|2 years ago|reply
It's a huge turn-off for me as well, because I read it as the main value the project is supposed to deliver (which, for me, is mostly zero). Not talking about this specific project, just generally.
[+] davideberdin|2 years ago|reply
I'm always impressed by the quality of the blog posts coming out from clickhouse.com! Super well written!
[+] pram|2 years ago|reply
Looks nice, I will definitely be trying this out.

Built-in S3 storage immediately sold me. I've used something called Exhibitor to manage ZK clusters in the past, but it's totally dead. Working with ZK is probably one of my least favorite things to do.

[+] randomtb|2 years ago|reply
I used Exhibitor in the past too. It was especially useful when a ZooKeeper cluster needed to expand/shrink or move host nodes. ZooKeeper's dynamic configuration solved that problem, which seems to also be supported by ClickHouse Keeper. Pretty impressive! Would definitely give it a try.
[+] tylerhannan|2 years ago|reply
Thanks for sharing!

If anyone has any questions, I'll do my best to get them answered.

(Disclaimer: I work at ClickHouse)

[+] abronan|2 years ago|reply
Thanks for this excellent article! Enjoyed it from start to finish. It brought back good memories of the work we did at Docker, embedding our own replicated and consistent metadata storage using etcd's Raft library.

Looking at the initial pull request, is it correct that ClickHouse Keeper is based on eBay's NuRaft library? Or did the ClickHouse team fork and modify the library to accommodate ClickHouse's usage and performance needs?

[+] pdeva1|2 years ago|reply
1. Can this be used without ClickHouse, as just a ZooKeeper replacement?
2. Am I correct that it uses S3 as its disk? If so, can it be run as stateless pods in k8s?
3. If it uses S3, how are PUT latency and costs affected? Does every write result in a PUT call to S3?
[+] alchemist1e9|2 years ago|reply
Is there a python client library you can recommend?
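
Since Keeper speaks the ZooKeeper wire protocol, any ZooKeeper client should work; a hedged sketch with the third-party `kazoo` library (the host/port are assumptions — ClickHouse Keeper's default client port is 9181, stock ZooKeeper's is 2181):

```python
# Requires: pip install kazoo, and a running ClickHouse Keeper or
# ZooKeeper endpoint. Adjust hosts to your deployment.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:9181")
zk.start()

zk.ensure_path("/demo")
zk.create("/demo/node", b"hello", ephemeral=True)
value, stat = zk.get("/demo/node")
print(value)

zk.stop()
```

This is a sketch, not a tested recipe against Keeper specifically, but ZK-protocol compatibility is the feature the blog post is advertising.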
[+] secondcoming|2 years ago|reply
What do you use for network stuff in C++, ASIO?
[+] LispSporks22|2 years ago|reply
Man, I would love to work at one of these companies that let their engineers go off and implement alternatives to widely used and tested open-source projects like ZooKeeper merely for speculative performance gains and "… C++".

edit: I'm not even saying this facetiously, it would be freaking awesome.

[+] mr-karan|2 years ago|reply
I've been using ClickHouse Keeper in a production environment since its first release and have been really happy. It makes setting up distributed tables in ClickHouse (replicas/shards) quite easy. The only issues I've seen are when inserting at a super high throughput (>1000 writes/sec), which is actually more of an issue with ClickHouse MergeTree table settings than with Keeper itself.

I've also written about it here: https://mrkaran.dev/posts/clickhouse-replication/

[+] klysm|2 years ago|reply
I hate running services in Java but this will have to earn a lot of trust before it’s a viable replacement in prod
[+] tylerhannan|2 years ago|reply
++ agreed.

ClickHouse Keeper was released as feature complete in December of 2021.

It runs thousands of clusters, daily, both in CSP hosted offerings (including our own ClickHouse Cloud) and at customers running the OSS release.

Never accept any claims at face value and always test. But, in this case, it is quite battle-hardened (i.e. the Jepsen tests run 3x daily https://github.com/ClickHouse/ClickHouse/tree/master/tests/j...).

[+] snotrockets|2 years ago|reply
Especially as it doesn't have memory safety, which is table stakes in 2023.
[+] Dowwie|2 years ago|reply
Any thoughts here on Fly's Corrosion? https://github.com/superfly/corrosion
[+] mdaniel|2 years ago|reply
At least two comments spring to mind: this is at least _blogged_ as a drop-in ZK replacement, which is certainly not true of Corrosion; and ClickHouse has Jepsen tests for their distributed KV store, while I don't see any reference to such a thing for Corrosion.

Maybe neither of those two things matter for one's use case, but it's similar to someone rolling up on this blog post and saying "but what about etcd" -- they're just different, with wholly different operational and consumer concerns

[+] miljen|2 years ago|reply
Do you provide a decent C++ client library? ZooKeeper only provides a C library that has certain... disadvantages.
[+] dathinab|2 years ago|reply
Is it just me or does it look like an "alternative for use-cases ZooKeeper was not intended for"?

E.g. if we quote ZooKeeper:

> ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

and ClickHouse

> ClickHouse is the fastest and most resource-efficient open-source database for real-time applications and analytics.

These are completely different use cases with only a small overlap.

[+] advisedwang|2 years ago|reply
And then further down:

> ClickHouse Keeper is a drop-in replacement for ZooKeeper

That opening was about ClickHouse in general, but the article is about one particular application using the database.

[+] cvccvroomvroom|2 years ago|reply
Compatible possibly but unproven in production at scale like ZK.
[+] tylerhannan|2 years ago|reply
Definitely used in production ;) and at considerable scale.

It runs thousands of clusters, daily, both in CSP hosted offerings (including our own ClickHouse Cloud) and at customers running the OSS release.

Never accept any claims at face value and always test. But, in this case, it is quite battle-hardened (i.e. the Jepsen tests run 3x daily https://github.com/ClickHouse/ClickHouse/tree/master/tests/j...)

But yes, ZooKeeper is pretty amazing. We are building on the backs of giants.

I'd also argue that Raft vs. ZAB is an important production-scale conversation. But, as the blog says, ZooKeeper is a better option when you require scalability with a read-heavy workload.

[+] spullara|2 years ago|reply
Yet another thing I have just used FoundationDB for in the past.
[+] insanitybit|2 years ago|reply
So could I just point my Kafka at this thing and use it?
[+] qoega|2 years ago|reply
You can even migrate your ZooKeeper to ClickHouse Keeper. It requires a small downtime, but you will have all your ZooKeeper data inside, and your clients will just work once Keeper is back up.
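
For reference, the migration described above is typically done with the `clickhouse-keeper-converter` tool. A hedged sketch — the paths are assumptions and flag spellings may vary by version, so check the docs for your release:

```shell
# Stop ZooKeeper first so its logs/snapshots are consistent, then convert
# the on-disk ZooKeeper state into a ClickHouse Keeper snapshot.
clickhouse-keeper-converter \
    --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
    --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
    --output-dir /var/lib/clickhouse/coordination/snapshots

# Then start Keeper pointing at that coordination directory; clients
# reconnect over the same ZooKeeper wire protocol.
```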
[+] wdb|2 years ago|reply
Nice to see an alternative for Zookeeper that doesn't depend on the Java runtime

I thought stuff was supposed to be rewritten in Rust /s

[+] kiratp|2 years ago|reply
Written in C++ is not a positive in my book. New things created this decade in unsafe languages (where safe options would have worked fine) should be frowned upon and criticized as bad engineering.
[+] krvajal|2 years ago|reply
> where safe options would have worked fine

That's a very short-sighted view IMO; software engineering is not just about technology choices.