top | item 39245381

mfreed | 2 years ago

Check out how TimescaleDB adds columnar compression to PostgreSQL, typically reducing storage by about 95%:

https://www.timescale.com/blog/building-columnar-compression...
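The linked post covers techniques like delta-of-delta encoding for timestamps. A minimal Python sketch of that idea (an illustration only, not TimescaleDB's actual implementation):

```python
# Sketch of delta-of-delta encoding, one of the tricks behind
# columnar time-series compression. Not TimescaleDB's real code.

def delta_of_delta(timestamps):
    """Encode as: first value, first delta, then delta-of-deltas."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]] + dods

def decode(encoded):
    """Invert delta_of_delta."""
    if len(encoded) < 2:
        return list(encoded)
    out = [encoded[0], encoded[0] + encoded[1]]
    d = encoded[1]
    for dod in encoded[2:]:
        d += dod
        out.append(out[-1] + d)
    return out

# Regularly spaced timestamps collapse to mostly zeros, which then
# compress extremely well with run-length or varint coding.
ts = list(range(1_700_000_000, 1_700_000_600, 60))  # one reading/minute
enc = delta_of_delta(ts)
assert decode(enc) == ts
assert enc[2:] == [0] * (len(ts) - 2)
```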

tbragin | 2 years ago

However, if you really want to optimize data currently residing in Postgres for analytical workloads, as the original comment suggests, consider moving to a dedicated OLAP database like ClickHouse.

See results from GitLab benchmarking ClickHouse vs TimescaleDB: https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...

Key findings:

* ClickHouse has a much smaller data volume footprint in all cases, by almost a factor of 10.

* There are very few ClickHouse queries that have >1s latency at q95. TimescaleDB has multiple >1s latencies, including a few in the range of 15-25s.

Disclaimer: I work at ClickHouse

osigurdson | 2 years ago

What we ended up doing is maintaining metadata in Postgres while storing time-series data in ClickHouse. Thanks for making / working on ClickHouse; I appreciate it very much.
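A hedged sketch of this split, with in-memory dicts and lists standing in for the two databases (all names here are illustrative, not the commenter's actual schema): small, mutable series metadata suits a relational store like Postgres, while append-only, bulk-scanned points suit a columnar store like ClickHouse.

```python
# Illustration of the metadata/points split: a relational store
# (Postgres) for series metadata, a columnar store (ClickHouse)
# for raw points. In-memory structures stand in for both here.

from dataclasses import dataclass, field

@dataclass
class SeriesMeta:
    series_id: int
    name: str
    unit: str

@dataclass
class SplitStore:
    metadata: dict = field(default_factory=dict)   # stands in for Postgres
    points: list = field(default_factory=list)     # stands in for ClickHouse

    def register_series(self, meta: SeriesMeta):
        # Small, mutable, relational: a natural fit for Postgres.
        self.metadata[meta.series_id] = meta

    def append_points(self, series_id: int, rows):
        # Append-only and huge: a natural fit for ClickHouse.
        self.points.extend((series_id, t, v) for t, v in rows)

    def query_range(self, series_id, t0, t1):
        return [(t, v) for sid, t, v in self.points
                if sid == series_id and t0 <= t <= t1]

store = SplitStore()
store.register_series(SeriesMeta(1, "cpu_temp", "celsius"))
store.append_points(1, [(100, 41.5), (160, 42.0)])
assert store.query_range(1, 0, 120) == [(100, 41.5)]
```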

mfreed | 2 years ago

That PoC benchmark didn't turn on Timescale's columnar compression, which every real deployment uses, so it's misleading at best.

(Timescaler)

osigurdson | 2 years ago

Timescale was certainly the first choice, as we were already using Postgres. However, we could not get it to perform well because our timestamps are simulated / non-monotonic. We also ultimately need to be able to manage low trillions of points in the long run. InfluxDB was also evaluated but faced a number of issues as well (though I am certain both it and Timescale would work fine for some use cases).

I think perhaps because ClickHouse is a little more general purpose, it was easier to map our use case to it. Also, one thing I appreciate about ClickHouse is that it doesn't feel like a black box: once you understand the data model, it is very easy to reason about what will work and what will not.
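The "easy to reason about" point can be sketched concretely: a ClickHouse MergeTree table stores rows sorted by its ORDER BY key, so a query filtering on a key prefix is just a contiguous range scan. A minimal Python illustration of that mental model (a sorted list stands in for one sorted table part; the sensor names are made up):

```python
# Mental model for a MergeTree primary-key filter: binary-search
# to the start of the key range, then read forward until the key
# no longer matches. A sorted list stands in for a table part.

import bisect

rows = sorted([
    ("sensor_a", 100, 1.0),
    ("sensor_a", 200, 1.5),
    ("sensor_b", 100, 7.2),
    ("sensor_b", 300, 7.9),
])  # conceptually: ORDER BY (series, time)

def scan(series, t0, t1):
    # Seek to the first row >= (series, t0), then scan forward.
    lo = bisect.bisect_left(rows, (series, t0, float("-inf")))
    out = []
    for r in rows[lo:]:
        if r[0] != series or r[1] > t1:
            break
        out.append(r)
    return out

assert scan("sensor_a", 0, 250) == [("sensor_a", 100, 1.0),
                                    ("sensor_a", 200, 1.5)]
```

Because the layout is just "sorted by key", predicting whether a query is a cheap range scan or an expensive full scan becomes straightforward.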

out_of_protocol | 2 years ago

Did you look at something Parquet-based? It's a different approach, but it could work on very large time-series-like datasets, e.g. Snowflake or Apache Iceberg.