top | item 11908254

ClickHouse – high-performance open-source distributed column-oriented DBMS

243 points| mechmind | 9 years ago |clickhouse.yandex | reply

70 comments

[+] buremba|9 years ago|reply
This is huge. It seems to me that it's similar to BigQuery but has many other features that I didn't see in other databases.

AggregatingMergeTree in particular is one of them: it allows incremental aggregation, which is a huge gain for analytics services.
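A minimal sketch of what incremental aggregation with AggregatingMergeTree might look like (table and column names are hypothetical; this uses the old-style engine syntax from the docs of that era):

```sql
-- Raw events land here first (hypothetical schema).
CREATE TABLE events
(
    EventDate Date,
    UserID    UInt64,
    Value     UInt64
) ENGINE = MergeTree(EventDate, (EventDate, UserID), 8192);

-- Pre-aggregated partial states, merged incrementally in the background.
CREATE TABLE events_agg
(
    EventDate Date,
    UserID    UInt64,
    ValueSum  AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree(EventDate, (EventDate, UserID), 8192);

-- The -State combinator stores a partial aggregation state, not a value.
INSERT INTO events_agg
SELECT EventDate, UserID, sumState(Value)
FROM events
GROUP BY EventDate, UserID;

-- The -Merge combinator finalizes the partial states at query time.
SELECT UserID, sumMerge(ValueSum) AS total
FROM events_agg
GROUP BY UserID;
```

Each new batch inserted into events_agg is folded into the existing states by background merges, so the aggregate never has to be recomputed from scratch.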

Also, it provides many table engines for different use cases. You don't even need a commit log such as Apache Kafka in front of ClickHouse; you can just push the data to a TinyLog table and move it in micro-batches to a more efficient column-oriented table that uses a different table engine.
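The micro-batch pattern described above might look roughly like this (table names and schema are invented for illustration):

```sql
-- Cheap append-only landing table: no indexes, no background merges.
CREATE TABLE hits_buffer
(
    EventDate Date,
    UserID    UInt64,
    URL       String
) ENGINE = TinyLog;

-- Efficient column-oriented destination table.
CREATE TABLE hits
(
    EventDate Date,
    UserID    UInt64,
    URL       String
) ENGINE = MergeTree(EventDate, (EventDate, UserID), 8192);

-- Periodically move a micro-batch into the columnar table,
-- then drop and recreate the buffer.
INSERT INTO hits SELECT * FROM hits_buffer;
```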

[+] gopalv|9 years ago|reply
This looked interesting at first sight (considering I work on ORC & Hive) - the single machine configuration with TinyLog and the aggregate indexes look good[1].

But the more I look into the distributed storage section, the more corners I see cut around data consistency.

"There are no quorum writes. You can't write data with confirmation that it was received by more than one replica. If you write a batch of data to one replica and the server with this data ceases to exist before the data has time to get to the other replicas, this data will be lost."

"The reason for this is in case of network failures when the client application doesn't know if the data was written to the DB, so the INSERT query can simply be repeated. It doesn't matter which replica INSERTs were sent to with identical data - INSERTs are idempotent. This only works for the last 100 blocks inserted in a table."

"Failover is automatic (for small differences in data) or semi-automatic (when data differs too much, which may indicate a configuration error)."

"When the server starts (or establishes a new session with ZooKeeper), it only checks the quantity and sizes of all files. If the file sizes match but bytes have been changed somewhere in the middle, this is not detected immediately, but only when attempting to read the data for a SELECT query."
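In practice, the idempotence described here means a client can blindly retry an unacknowledged batch: as long as the resent block is byte-identical and falls within the last 100 deduplicated blocks, the replicated engine drops the duplicate by checksum. A rough sketch (ZooKeeper path, replica name, and schema are hypothetical; old-style engine syntax):

```sql
CREATE TABLE hits_replicated
(
    EventDate Date,
    UserID    UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/hits', 'replica_1',
                               EventDate, (EventDate, UserID), 8192);

-- If the client never got an acknowledgement (e.g. a network blip),
-- it can simply resend the identical batch; the server deduplicates
-- the block rather than inserting it twice.
INSERT INTO hits_replicated VALUES ('2016-06-15', 1), ('2016-06-15', 2);
INSERT INTO hits_replicated VALUES ('2016-06-15', 1), ('2016-06-15', 2);  -- retry, ignored
```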

These sorts of network blips and bad-disk problems happen at least once a week on a big cluster, and something like Hadoop wastes a lot of IO doing bit-rot checks on cold data all the time.

The nature of clickstream data makes it somewhat okay to lose a few chunks in transit - I can imagine at least a few of the beacons will get dropped purely over the HTTP mechanism which pumps data into the system.

At some point, data consistency costs money, slows down inserts, and creates all sorts of limitations on how recovery of data would play out.

But as a general purpose replicated DBMS which serves as a system of record against fraud allegations (for instance), I can't see this comparing well.

[1] - 200 million rows/sec for join+aggregation is ~2x as good as Hive 2.0 single node LLAP on a hot run (https://people.apache.org/~gopalv/LLAP.gif)

[+] grogenaut|9 years ago|reply
For those who are wondering: it doesn't build under OS X, though the issues initially don't seem insurmountable; it's more that it hasn't been tried with Clang.

[ 3%] Building CXX object contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/src/AbstractObserver.cpp.o
error: unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option]
error: unknown warning option '-Wno-for-loop-analysis'; did you mean '-Wno-loop-analysis'? [-Werror,-Wunknown-warning-option]
make[2]: *** [contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/src/AbstractObserver.cpp.o] Error 1
make[1]: *** [contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/all] Error 2

A bit of googling shows this is likely because Clang's -Wall overrides any other flags set earlier, which is apparently different from GCC. This makes me think they aren't lying when they say it supports Linux, in that they likely haven't tried building it on a Mac much.

That said, it doesn't seem to be using any crazy dependencies that don't support multiple platforms. Poco, mentioned above, is fully cross-platform.

Note, per the comments below: lots of people develop on or use Macs, so they'd be interested to know whether they'd need a VM or some other option to use this. Since the README is super thin and just says only Linux xxx, I felt they didn't have much info. I'm used to the days where people built projects that compiled everywhere but didn't build packages for them for some reason.

[+] fsaintjacques|9 years ago|reply
It supports Linux x86-64 with gcc >= 5.0.
[+] threeseed|9 years ago|reply
This looked really interesting and then saw this: "This is not a cross-platform system. It requires Linux Ubuntu Precise (12.04) or newer, x86_64 architecture with SSE 4.2 instruction set."

That pretty much rules out most enterprise deployments or big data appliances.

Anyone know the difficulty of getting something like this running on Red Hat distros? Surely it's nothing major.

[+] negus|9 years ago|reply
Not sure that it is a problem. If you're considering such a scalable system, you will certainly build a completely separate cluster for it. In that case, there is no real difference which distro you use, even if your ops team are Red Hat/whatever fans.
[+] code_research|9 years ago|reply
"Enterprise" is a code word for "we have knowledge inhouse and can do anything" and not "the products we buy restrict us to the limitations set by the company we made ourselves depend on". Free and universal thinking, however, is not found very often in companies, so not every company will grow into an "enterprise".
[+] cven714|9 years ago|reply
Interested in how this differs from kdb+; I see it isn't part of the performance comparison chart, but it seems to have some feature overlap.
[+] nickpeterson|9 years ago|reply
Knowing the kdb community, someone will post a benchmark full of nonsensical values in a plain text file available on an ftp site that shows kdb winning by 100000000 of some unknown unit...
[+] nfa_backward|9 years ago|reply
Looks really interesting and not another SQL on Hadoop solution. The benchmarks look impressive, but all of the queries were aggregations of a single table. I did not see any joins. I wonder how mature the optimizer is.
[+] Lethalman|9 years ago|reply
I think it's because the docs say "If there isn't enough memory, you can't run a JOIN." SQL-on-Hadoop solutions, by contrast, work even without enough RAM by spilling to disk. I don't think a comparison with JOINs would be fair in this case.
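For context, ClickHouse at the time buffered the entire right-hand table of a JOIN in RAM, which is why the docs warn about memory. A hedged example (table names and columns are invented; the ANY/ALL strictness keyword was mandatory in that era's syntax):

```sql
-- The right-hand side (users) is loaded fully into memory,
-- so it must fit in RAM; the left side (hits) is streamed.
SELECT domain(URL) AS d, count() AS c
FROM hits
ANY LEFT JOIN users USING UserID
GROUP BY d
ORDER BY c DESC
LIMIT 10;
```

If the users table were larger than available memory, this query would simply fail rather than spill to disk.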
[+] moondowner|9 years ago|reply
I guess the community will step in and for starters will: add support for other Linux distros, and add comments in English alongside Russian in the source code.
[+] biokoda|9 years ago|reply
This looks impressive as hell.
[+] etatoby|9 years ago|reply
Wow, just wow.

I guess it's time to rewrite some backends...

[+] betimsl|9 years ago|reply
Are they planning to release any libraries? The HTTP interface is quite good, but I think it'd be better to avoid the HTTP overhead and have a direct interface.
[+] StreamBright|9 years ago|reply
Without support for other operating systems, it is hard to consider this an alternative to anything. We have several clusters running an operating system other than the one ClickHouse supports. Unfortunately, few customers are going to invest in a different platform to try out something new like this. The lower the entry bar for new tech, the better.
[+] pnathan|9 years ago|reply
> ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, world’s second largest web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on-the-fly, directly from non-aggregated data. This system was successfully implemented at CERN’s LHCb experiment to store and process metadata on 10bn events with over 1000 attributes per event registered in 2011.

Might want to rethink that consideration. :-)

[+] betimsl|9 years ago|reply
If it runs on Ubuntu it'll run on almost any Linux distribution ;) -- Though, not sure if sse4_2 is enabled on every virtual machine.
[+] vasilia|9 years ago|reply
> Without supporting other operating systems it is hard to consider it as an alternative to anything.

They just published the source code to GitHub. Yandex is a Russian search engine. This project is not commercial. You can clone the source code from GitHub and add support for your operating system.

[+] tkyjonathan|9 years ago|reply
Mind blown. /puts Hadoop down and reads this.
[+] k__|9 years ago|reply
When to use such a DB?
[+] RyanHamilton|9 years ago|reply
Imagine the layout on disk for both a row based and a column based database...or see this link:

http://www.timestored.com/time-series-data/what-is-a-column-...

Now imagine which areas need to be read when you perform a query like "average price" for all dates. In row-oriented databases we have to read over large areas; in column-oriented databases the prices are stored as one sequential region, and we can read just that region. Column-oriented databases are therefore extremely quick at aggregate queries (sum, average, min, max, etc.).

Why are most databases row-oriented, I hear you ask? Imagine we want to add one row somewhere in the middle of our data, for 2011-02-26: in the row-oriented database, no problem; in the column-oriented one, we would have to move almost all the data! Luckily, since we mostly deal with time series, new data only appends to the end of our table.
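The "average price" example translates directly to SQL: in a column store, the query touches only the one column's file on disk (schema invented for illustration):

```sql
CREATE TABLE trades
(
    TradeDate Date,
    Symbol    String,
    Price     Float64,
    Volume    UInt32
) ENGINE = MergeTree(TradeDate, (Symbol, TradeDate), 8192);

-- Reads only the Price column file sequentially,
-- never touching TradeDate, Symbol, or Volume on disk.
SELECT avg(Price) FROM trades;
```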

[+] vegabook|9 years ago|reply
So this is basically a Russian Cassandra? Written in C, not Java. That's enough for me; I will test it.
[+] flr_null|9 years ago|reply
No, it's actually not.

Cassandra is not a columnar DB. That means if you need all the user_id values from your table, Cassandra will scan all your data (e.g. 10 PB) on disk, while ClickHouse will only scan one column's file (e.g. 10 GB).

ClickHouse is Russian Kudu.