ClickHouse has some weird quirks when you think of it as a SQL database, but it's astounding to use. It's faster than one would think, it can do some really cool data modeling, and it provides a wealth of features for the average user out of the box.
The most important thing, and the thing that makes it attractive to me, is that it is almost stupidly simple to set up and get running. It's quite simple (once you wrap your head around it) to do sharding or replication and scale up. The ZooKeeper stuff takes a bit more effort, but most of that is due to ZooKeeper and not ClickHouse.
Second the “stupidly simple to set up and get running”. My company works with billion-row datasets on client sites where we get super locked-down accounts. ClickHouse is a single binary that you can run with no actual “install” needed.
Also echo the comments in the rest of the discussion about it being blazing fast. On our beefier machines we get query speeds in the hundreds of millions of rows per second when the data is not in cache.
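To put "hundreds of millions of rows per second" next to "billion-row datasets", here is a rough back-of-the-envelope sketch. The function name and the three throughput figures are assumptions picked for illustration, not measurements from this thread.

```python
def scan_seconds(total_rows: int, rows_per_second: float) -> float:
    """Seconds needed to scan every row once at a steady throughput."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


ONE_BILLION = 1_000_000_000

# Assumed throughputs inside the "hundreds of millions of rows per second" range.
for rate in (100e6, 300e6, 900e6):
    seconds = scan_seconds(ONE_BILLION, rate)
    print(f"{rate / 1e6:.0f}M rows/s: {seconds:.1f} s to scan one billion rows")
```

So at those assumed rates, a full pass over a billion rows is a matter of seconds rather than minutes.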
I remember playing with it a while ago and here are my 2c:
- It's ridiculously fast, like, I didn't know where the performance was coming from.
- Getting it up and running was a bit clunky (Docker saved me), but I hear it's better now.
- It has non-standard (I mean, everyone is non-standard, but this is way off ANSI) and case-sensitive (???) SQL syntax. This annoyed me again and again.
- It seemed (and still seems) like a project that lives and dies with one developer - and no matter how brilliant he may be, I'm not willing to invest in a technology that has this risk and it's so hard to migrate off of (because of the non-standard SQL).
I'm sad about the last two points, because the database is rather brilliant otherwise.
If you need advanced SQL support, ClickHouse is not there (yet), but if you need high performance for relatively basic queries, ClickHouse is great.
It is developed mostly by ClickHouse staff, but there is at least one company, https://www.altinity.com/, which offers commercial support, consulting, and training for ClickHouse.
By "one developer", do you mean Yandex, or that most commits are made by a couple of users? Being backed by a large company (the Russian Google, apparently) that has an independent revenue stream seems like a large plus, but maybe not enough to cancel out the risk.
I'm wary of investing effort into a potentially unsupported project as well, but I wonder if ClickHouse only seems "out there" because we're not aware of the Russian tech ecosystem (at least I'm not).
People don't seem concerned about building anything with Firebase, but Google doesn't have a good track record when it comes to changing its mind about priorities or service pricing.
What would you recommend instead for a column-oriented db that you can self-host (commercial or open source)?
The basic techniques for implementing a fast column-store data warehouse have been well-known for 10 years. There are several excellent commercial and open-source implementations of these techniques:
- BigQuery
- Snowflake
- Redshift
- Presto
ClickHouse is not one of them. It doesn't have:
- Transactions
- Distributed joins
- Separate compute from storage
- UPDATE
- User management
I don't mean to be a jerk, I'm just trying to save people some time. Columnar DBs are well-trodden territory and ClickHouse is way behind.
I wouldn't call Redshift "excellent", nor would I call ClickHouse "way behind". ClickHouse was the best choice for my last employer's use case (https://twitter.com/zeeg/status/987009550501928960), after many other solutions were tested and benchmarked.
Just because a tool doesn't have a specific feature checklist doesn't mean you should categorically rule it out, particularly if you don't have experience using/running/deploying it.
Redshift doesn't separate compute from storage either unless you're using Spectrum. Presto isn't a database at all and can read from many data stores. The rest are all cloud-hosted with lots of moving parts. MemSQL, Vertica, Actian, Greenplum, and SQL Server are better comparisons.
ClickHouse is a column-oriented db and actually one of the most advanced, focusing on performance at all costs with lots of table storage engines that provide flexibility for your exact use-case. It also supports distributed joins and deletes but has some limitations they are working on.
It can definitely use better tooling and compatibility though, but that's the tradeoff the core team made, and it seems to be working well for the companies that can afford the time and talent.
There is quite a difference between theoretical technologies and a stable, high-performance implementation. The majority of things ClickHouse does are very well known; it just does them
ClickHouse indeed does not separate compute from storage, yet that is an architectural decision, not a feature gap. Running ClickHouse with directly attached storage and built-in replication can be super fast and cost-efficient. It works best for stable workloads.
You can run it on a cluster or a single server. It's pretty easy to set up either way.
No updates. Fast inserts.
You can only join two tables at a time, but the joins can be chained to deal with this limitation.
I tried Monet. It wasn't very stable for me. I didn't stick with it long enough to judge it. ClickHouse has the backing of Yandex. I think that makes a huge difference.
I have used ClickHouse for the past year. I've thrown 3,000-column by 120-million-row tables at it. It worked where PostgreSQL came to a halt. Different use cases, really.
It fits my use case perfectly. Large amounts of data with no updates and tons of aggregations. It's lightning fast.
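For anyone wondering what "chaining" two-table joins looks like in practice, here is a minimal sketch that builds the nested query text. The helper name, the table names, the key column and the plain `INNER JOIN ... USING` spelling are all assumptions for illustration; the exact JOIN keywords your ClickHouse version accepts may differ.

```python
from functools import reduce


def chain_joins(tables, key, join_kind="INNER JOIN"):
    """Build one query in which every JOIN has exactly two inputs.

    Each earlier two-table join is wrapped in a subquery and becomes the
    left-hand side of the next join. All names are hypothetical.
    """
    if len(tables) < 2:
        raise ValueError("need at least two tables to join")

    def join_two(left, right):
        # 'left' is either a plain table name or an already nested subquery.
        return f"(SELECT * FROM {left} {join_kind} {right} USING ({key}))"

    nested = reduce(join_two, tables)
    # Drop the outermost parentheses so the result is a standalone statement.
    return nested[1:-1]


# Hypothetical tables sharing a user_id column.
print(chain_joins(["visits", "users", "regions"], "user_id"))
```

The point is only the shape: the first two tables are joined inside a subquery, and that subquery is then joined to the third table, so no single JOIN ever touches more than two inputs.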
MonetDB is a sort of drop-in replacement for a regular database, with all the expected features and good compatibility.
On the other hand, ClickHouse will be incompatible with most existing tools, and it's better to learn its limitations and workaround techniques well in advance. But once you dump substantial amounts of time-series data into it, you'll find it 10+ times faster and 2-3 times smaller than Monet.
[+] [-] lykr0n|7 years ago|reply
[+] [-] gary__|7 years ago|reply
https://www.slideshare.net/Altinity/migration-to-clickhouse-...
A year on now, perhaps things have changed.
[+] [-] tadkar|7 years ago|reply
[+] [-] drej|7 years ago|reply
[+] [-] PeterZaitsev|7 years ago|reply
[+] [-] dschuler|7 years ago|reply
[+] [-] bsg75|7 years ago|reply
One major contributor (who may be project lead at Yandex?) and a lot of active contributors: https://github.com/yandex/ClickHouse/graphs/contributors
[+] [-] Grue3|7 years ago|reply
This project is developed by Yandex and they have a team working on it.
[+] [-] dang|7 years ago|reply
[+] [-] georgewfraser|7 years ago|reply
[+] [-] ehfeng|7 years ago|reply
[+] [-] manigandham|7 years ago|reply
[+] [-] PeterZaitsev|7 years ago|reply
Here is an example performance comparison we did at Percona: https://www.percona.com/blog/2017/02/13/clickhouse-new-opens...
[+] [-] bretthoerner|7 years ago|reply
[+] [-] theshadowknows|7 years ago|reply
[+] [-] dikei|7 years ago|reply
[+] [-] mamcx|7 years ago|reply
Pointers to what they are?
[+] [-] PeterZaitsev|7 years ago|reply
[+] [-] tuananh|7 years ago|reply
https://blog.cloudflare.com/http-analytics-for-6m-requests-p...
[+] [-] dorfsmay|7 years ago|reply
Can you update specific rows? How fast are updates?
How does it compare to MonetDB?
[+] [-] sin7|7 years ago|reply
[+] [-] eldargab|7 years ago|reply