top | item 21412596

Building columnar compression in a row-oriented database

147 points| craigkerstiens | 6 years ago |blog.timescale.com | reply

55 comments

[+] georgewfraser|6 years ago|reply
Reading this post is frustrating. What they are describing is where column store databases were 20 years ago. Perhaps at some point the folks at TimescaleDB will read Daniel Abadi’s 2008 paper, which describes the key elements of how all modern column stores work:

http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

The key takeaway is that columnar compression only accounts for a small minority of the speed up that you get for scan-oriented workloads; the real big win comes when you implement a block-oriented query processor and pipelined execution. Of course you can’t do this by building inside the Postgres codebase, which is why every good column store is built more or less from scratch.

Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

[+] mfreed|6 years ago|reply
Hi @georgewfraser: We understand this point, and you aren't the first to mention it on HN =)

But our users, and the users of most so-called "time series databases", do not typically generate the classic scan-oriented workloads of OLAP systems that motivated C-Store and classic columnar data warehouses.

I talk about this in the post: sure, there are deep and narrow queries, but there are also a /lot/ of shallow and wide queries. Most real-time dashboarding, which is a huge use-case for IT/devops monitoring and IoT sensor monitoring, is about "what happened very recently".

But you do occasionally want to look back, so you want to keep old data sitting around, just in a cost-effective manner. And there is a lot of data: time-series and monitoring applications are typically write (not read) heavy, and ingest high volumes of data continuously.

Enter compression and data tiering, both of which appeared in today's v1.5 release.

As such, the primary motivation behind our work on compression was actually saving storage overhead, not improving query performance. Although we'll never pass up on performance improvements when we get them! =)

(And we are familiar with the citation -- I've known Daniel and Sam for many years, and one of the engineers who led this work at Timescale was actually Daniel's former PhD advisee Josh Lockerman.)

[+] grumpydba|6 years ago|reply
> Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

Or you can set up a clickhouse instance. It's a seriously promising and underrated product.

[+] mamcx|6 years ago|reply
I think this is fair. The whole selling point is to have time series abilities inside of PG.

It won't be as fast as a tailored engine? Sure. But the advantage of staying inside PG is huge (not only the ecosystem, but having SQL, OLTP + OLAP, etc.).

It's like how a dedicated document store could be better than adding JSON to PG...

In practice? I think this hybrid approach is very good.

P.S.: And what about the opposite? Which columnar store can I use for OLTP?

[+] manigandham|6 years ago|reply
True, but this is a major step up from what Timescale offered before. This was always my biggest contention since performance lagged behind many other datastores but they have done well in taking advantage of the Postgres integration and ecosystem.

There are plenty of scenarios that benefit from having automatic time-based partitioning and querying within the same operational PG datastore instead of running a separate analytical system. The improved performance only helps this case.

[+] m0zg|6 years ago|reply
Um, no. Columnar compression can be huge if your data can be sorted favorably. You basically get to filter (both compare and skip) and aggregate without fully decompressing. If you have to scan a column (or a few), this can be huge. It's great if you can skip blocks within the column or entire shards, but that's unlikely to happen unless you're doing a highly selective query. OLAP queries aggregate a ton of data. Judiciously chosen column compression scheme coupled with a good sort order can be a huge, huge advantage.
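A sketch of the point about sort order (illustrative only, not any particular engine's implementation): a favorably sorted, low-cardinality column collapses into a few runs, and a filter or aggregate can then compare once per run instead of once per row.

```python
def rle_encode(values):
    """Run-length encode a sequence into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def count_where(runs, predicate):
    """Count matching rows by testing once per run, not once per row."""
    return sum(count for value, count in runs if predicate(value))

# Sorted by region, one million rows compress to three runs,
# and the filter below performs only three comparisons.
column = ["ap"] * 100_000 + ["eu"] * 500_000 + ["us"] * 400_000
runs = rle_encode(column)
matches = count_where(runs, lambda v: v == "us")
```

The same idea extends to skipping whole blocks: if per-block min/max metadata rules a predicate out, the block is never decompressed at all.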
[+] GordonS|6 years ago|reply
If anyone from TimescaleDB is here: I run a (very) small business using TimescaleDB in production, but I don't need any enterprise features and so haven't purchased a commercial license. But I love it!

Do you have a Patreon, PayPal account, or any other means to receive money as a donation, gift or token of appreciation?

[+] mfreed|6 years ago|reply
No, but we always love user stories! Shoot me email at mike (at) timescale.com
[+] akulkarni|6 years ago|reply
Thank you for the <3

And yes as mfreed already mentioned we love hearing user stories :)

[+] dgudkov|6 years ago|reply
For anyone interested in columnar compression and columnar databases I highly recommend reading Vertica's Technical Overview White Paper [1]. I don't know how Vertica is doing now, but it's a beautifully designed columnar database with many interesting concepts which influenced a lot the product I'm currently working on.

[1] https://pdfs.semanticscholar.org/0cb4/d21ca5198819a85adae8ea...

[+] cevian|6 years ago|reply
Hey, Timescale engineer here. Happy to answer any questions.
[+] benwilson-512|6 years ago|reply
Hey! We're looking to evaluate TimescaleDB for a logistics IoT scenario. Some of the data that enters our system comes from connected devices where recorded_at and inserted_at columns are basically the same. Some data however is sourced from dataloggers that may record for months before the data arrives at our system.

With TimescaleDB, would I use the recorded_at or inserted_at column for the hypertable?

Does this change if data for an individual sensor can sometimes arrive out of order? If the sensor malfunctions and the data contains timestamps in the far past or the far future does this cause issues with TimescaleDB?

What we've done in Postgres so far is structure the tables generally around the recorded_at column, because most analysis wants to look at the data "in order" to generate reports, graphs, etc. Each data row also contains a "payload_id" relating it to a "payloads" table, which helps group data by when it actually hit the system. Data processing has generally been built around the payloads, and we then query any additional data in recorded_at order on the main data tables if we need to look back or forward in time.

[+] mamcx|6 years ago|reply
I'm building a relational language (that could feel like an in-memory DB) and explored the idea of using columnar-backed structures.

How well would the same ideas apply to in-memory processing?

If I understand correctly, you have something like:

- Store each column in an array of N=1000
- Store the group of columns in pages, with metadata on key ranges to locate rows in the right page
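The two bullets above can be sketched as a toy paged column store (hypothetical names and layout, just to make the idea concrete): each page holds fixed-length per-column arrays plus min/max key metadata, so a lookup can skip pages whose key range cannot match.

```python
from bisect import bisect_right

PAGE_SIZE = 1000  # the N=1000 chunk length from the description above

class ColumnStore:
    """Toy paged column store: per-page column arrays plus key-range
    metadata that lets lookups skip non-matching pages."""

    def __init__(self):
        self.pages = []

    def append_batch(self, keys, cols):
        """Keys must arrive sorted for the range metadata to be useful."""
        for start in range(0, len(keys), PAGE_SIZE):
            chunk = slice(start, start + PAGE_SIZE)
            page_keys = keys[chunk]
            self.pages.append({
                "min": page_keys[0],
                "max": page_keys[-1],
                "keys": page_keys,
                "cols": {name: vals[chunk] for name, vals in cols.items()},
            })

    def lookup(self, key, column):
        for page in self.pages:
            if page["min"] <= key <= page["max"]:  # metadata skip test
                i = bisect_right(page["keys"], key) - 1
                if i >= 0 and page["keys"][i] == key:
                    return page["cols"][column][i]
        return None

store = ColumnStore()
keys = list(range(5000))
store.append_batch(keys, {"value": [k * 2 for k in keys]})
# 5000 rows land in 5 pages; a point lookup touches one page's arrays.
```

For in-memory processing the same layout helps for a different reason: scanning one column as a contiguous array is far friendlier to CPU caches and SIMD than walking heterogeneous row tuples.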

[+] jeltz|6 years ago|reply
Are there any plans to collaborate with the PostgreSQL project? Especially on Andres' work on speeding up the executor? My apologies if you already are and I have just not seen your names on the mailing list.
[+] fabian2k|6 years ago|reply
I really like Postgres as an allround database, and putting monitoring data into Postgres is an attractive idea to me. One thing I'm wondering is how well you can mix a regular Postgres database with timeseries data using TimescaleDB? If I have a regular application using Postgres, can I just add timeseries in there with TimescaleDB in the same database? Or is that either not possible, or simply a bad idea?
[+] RobAtticus|6 years ago|reply
You definitely can mix the two, and we actually find it a very useful thing, so you can have your metadata available to query alongside the time series data. TimescaleDB will only make a table a hypertable if you tell it to; otherwise regular tables work as you'd expect. And since we appear as a table in the contexts where it matters (JOINs, querying, inserting, etc.), you shouldn't have any issues mixing the two.
[+] GordonS|6 years ago|reply
I do this, and it works just fine - no caveats, it's actually one of the selling points of TimescaleDB.
[+] jeltz|6 years ago|reply
Yes you can, and I think this is one of the main selling points of TimescaleDB.
[+] dedsm|6 years ago|reply
it's perfectly fine to use TimescaleDB alongside any regular tables.
[+] valyala|6 years ago|reply
This is great news! We at VictoriaMetrics [1] are happy that TimescaleDB eventually addressed the high disk usage issue and implemented type-aware column-based compression. This should improve its position and performance in the TSBS benchmark [2].

One question: why do you use Gorilla compression for floating-point values? It works well for integer values, but is pretty useless for floating-point values [3].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics

[2] https://medium.com/@valyala/measuring-vertical-scalability-f...

[3] https://medium.com/faun/victoriametrics-achieving-better-com...

[+] mfreed|6 years ago|reply
Gorilla is one of the few simple, efficient, _lossless_ algorithms that compress binary data.

Tricks such as renormalizing floats in base-10 (which I believe VictoriaMetrics implements), while great when they work, can truncate lower-order digits.

Lossiness is not a decision we're willing to make for our users unilaterally. And, as Gorilla is one of the default compression algorithms we employ, it needs to be one that is correct in all cases.

Incidentally, we experimented with the binary equivalent of the renormalization algorithm, and it performed on-par with Gorilla in our tests.
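For readers unfamiliar with Gorilla: the core trick is XORing consecutive IEEE-754 bit patterns, which is exactly reversible. The sketch below shows only that XOR step; the real algorithm then variable-length encodes the leading/trailing zero bits of each XOR, which is where the actual space savings come from.

```python
import struct

def float_to_bits(x):
    """Reinterpret a double as its 64-bit IEEE-754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(b):
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def xor_encode(values):
    out, prev = [], 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)  # slowly varying values -> mostly-zero XORs
        prev = bits
    return out

def xor_decode(encoded):
    values, prev = [], 0
    for delta in encoded:
        prev ^= delta
        values.append(bits_to_float(prev))
    return values

series = [22.0, 22.0, 22.1, 22.1, 22.2]
encoded = xor_encode(series)
assert xor_decode(encoded) == series  # lossless round trip
assert encoded[1] == 0                # a repeated value costs almost nothing
```

Because only bit patterns are stored, every float round-trips exactly; the worst case for Gorilla is a series whose neighboring bit patterns share few bits, where each XOR needs close to the full 64 bits.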

[+] 1996|6 years ago|reply
Too bad pipelinedb died. It was a nice complement to timescaledb -- with small oddities that sometimes took a production database down if you had no experience with advanced vacuum / repack features.

Any plan to bring back something like continuous views in timescaledb? (or to integrate pipelinedb work)

[+] mfreed|6 years ago|reply
Continuous aggregates were first introduced in v1.3. Not identical to PipelineDB features, but address some of the very same use cases. We have some cool new features for continuous aggs planned too.

Tutorial: https://docs.timescale.com/latest/using-timescaledb/continuo...

Example with Grafana dashboarding: https://blog.timescale.com/blog/how-to-quickly-build-dashboa...

Technical details of correctly handling backfilled data: https://blog.timescale.com/blog/continuous-aggregates-faster...

[+] powturbo|6 years ago|reply
- Timestamps are very easy to compress down to 0.01%

- Floats are hard to compress without quantization or lossy compression. The Gorilla algorithm is not helping so much.
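The timestamp point is easy to see: most series are sampled at a fixed interval, so second-order deltas are almost all zero, and any entropy coder shrinks a run of zeros to near nothing. A minimal sketch:

```python
def delta_of_delta(timestamps):
    """Second-order deltas: a fixed sampling interval makes
    every output value zero."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Epoch seconds at a 10-second scrape interval.
ts = list(range(1_573_000_000, 1_573_100_000, 10))
dod = delta_of_delta(ts)
assert all(d == 0 for d in dod)
```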

You can do your own benchmarks with your data with TurboPFor : https://github.com/powturbo/TurboPFor

or download icapp, which includes more powerful algorithms.

[+] m0zg|6 years ago|reply
Nonrandom time series floats are often significantly helped by delta-compression before applying the actual block compression algorithm. I used to do this quite successfully in the late '90s when sending FOREX quotes to clients over analog modem connections.
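A sketch of that delta-before-block-compression trick, using zlib as a stand-in for the block compressor and a contrived series whose steps are exactly representable: the raw doubles all differ, but the first differences are a constant, so the compressor finds long byte repeats.

```python
import struct
import zlib

def pack_doubles(values):
    """Serialize a float list as big-endian IEEE-754 doubles."""
    return struct.pack(f">{len(values)}d", *values)

# Steps of 0.25 are exact in binary floating point, so every
# first difference is the same 8-byte pattern.
prices = [100.0 + i * 0.25 for i in range(10_000)]
deltas = [prices[0]] + [b - a for a, b in zip(prices, prices[1:])]

raw_size = len(zlib.compress(pack_doubles(prices)))
delta_size = len(zlib.compress(pack_doubles(deltas)))
assert delta_size < raw_size
```

Real quote streams are noisier than this, but as long as consecutive values are correlated, the delta stream has far less entropy than the raw one.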
[+] jnordwick|6 years ago|reply
Instead of trying to hack column like features and performance onto a row oriented structure, why not just develop a column storage for hypertables? All this work seems like a dead end performance-wise.
[+] cevian|6 years ago|reply
The article covers this in more depth, but the short answer is: the hybrid approach has benefits when your workload has both shallow-and-wide queries (fetch all data for user X) and deep-and-narrow queries (calculate the average number of logins for all users). Row stores excel on shallow queries, column stores excel on deep queries. For time-series data you want row-oriented structures while the data is coming in, converting to a column store after it ages.
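That aging policy can be sketched in a few lines (hypothetical names, nothing from the actual codebase): recent rows stay as tuples for cheap appends and row fetches, and a conversion step regroups aged rows into per-column arrays.

```python
class HybridTable:
    """Toy hybrid layout: row-oriented recent data, columnar aged chunks."""

    def __init__(self):
        self.recent = []   # row-oriented: list of (ts, device, value)
        self.chunks = []   # columnar: {"ts": [...], "device": [...], "value": [...]}

    def insert(self, ts, device, value):
        self.recent.append((ts, device, value))  # cheap row append

    def age_out(self, cutoff):
        """Convert rows older than cutoff into one columnar chunk."""
        old = [r for r in self.recent if r[0] < cutoff]
        self.recent = [r for r in self.recent if r[0] >= cutoff]
        if old:
            ts, device, value = zip(*old)
            self.chunks.append({"ts": list(ts), "device": list(device),
                                "value": list(value)})

    def avg_value(self):
        """Deep-and-narrow query: in aged chunks only the value
        column is touched."""
        total = sum(r[2] for r in self.recent)
        n = len(self.recent)
        for chunk in self.chunks:
            total += sum(chunk["value"])
            n += len(chunk["value"])
        return total / n if n else None

t = HybridTable()
for i in range(100):
    t.insert(i, "dev-1", float(i))
t.age_out(cutoff=90)  # rows 0..89 become one columnar chunk
```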
[+] mfreed|6 years ago|reply
It's also the case that, at least for Postgres, building a new low-level storage format isn't feasible until PG13.

And this approach also allows us to leverage a lot of the battle-tested stability built into the current storage layout with TOASTed pages, without starting from scratch.

[+] vkaku|6 years ago|reply
+10 for moar compression!