PipelineDB v0.9.9 – One More Release Until PipelineDB Is a PostgreSQL Extension

[+] craigkerstiens|8 years ago|reply

PipelineDB is pretty interesting for time-series data. It processes the data as it comes in and stores aggregates or pre-aggregates over the time series. I haven't followed the latest, but as of a few years ago much of the approach was similar to some research out of UC Berkeley from about 10 years ago. You can find the paper that talks about that work (TelegraphCQ, where CQ stands for continuous query) at http://db.csail.mit.edu/madden/html/TCQcidr03.pdf. Definitely an interesting read if you're into technical papers and databases.

[+] manigandham|8 years ago|reply

Druid also does this, with pre-aggregation of streaming data along predefined dimensions for very fast cube-based analytics. It's not a relational database though and is just now getting a SQL interface through Apache Calcite. http://druid.io/

Imply is a startup with a modern cloud/on-prem distribution of Druid with a built-in visualization and querying tool: https://imply.io/

[+] isoprophlex|8 years ago|reply

I didn't know the product at all, but at a glance this looks amazing for BI/alerting on streaming time series data.

Does anyone want to chime in on whether this has fit your requirements for time series data processing? Thanks!

[+] iaabtpbtpnn|8 years ago|reply

If it's a Postgres extension for time-series data, I wonder how it compares to TimescaleDB, which I recently discovered and have been evaluating.

[+] Fergi|8 years ago|reply

Powering real-time reporting dashboards is definitely the #1 use case we see for PipelineDB from open source users and customers of our new SaaS product powered by PipelineDB, called Stride (stride.io).

[+] merb|8 years ago|reply

Well, the continuous view also looks useful as a materialization technique for some kind of searchable "view"/table, i.e. it could probably be used to build a "cheap" Elasticsearch without needing to import data into another system, since you could just use triggers to update the continuous view.

[+] brightball|8 years ago|reply

Had never heard of this either but it does look really interesting.

[+] airstrike|8 years ago|reply

This would be absolutely perfect for the job I had in Sales Intelligence a few years ago... except we were locked into SQL Server and there was no way the powers that be would ever let us switch over to PostgreSQL.

[+] manigandham|8 years ago|reply

SQL Server 2017 has an in-memory (Hekaton) storage engine and columnstore indexes. Combine the two and you can do the same thing with real-time queries over the entire dataset.

[+] crudbug|8 years ago|reply

What is the storage model compared to TimescaleDB [0]?

[0] https://github.com/timescale/timescaledb

[+] Fergi|8 years ago|reply

The storage engine for PipelineDB is PostgreSQL, and the output of continuous SQL queries (continuous views in PipelineDB) is stored in what are essentially incrementally updated, realtime tables. You can also think of PipelineDB as very high-throughput, incrementally updated materialized views.

see: http://docs.pipelinedb.com/continuous-views.html
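
To make the "incrementally updated materialized view" idea concrete, here is a rough Python sketch. It is not PipelineDB's API or implementation: the `ContinuousView` class, its `insert` and `select` methods, and the sample URL/latency events are all hypothetical names invented for illustration. The sketch also assumes raw events are dropped once they are folded into the aggregate.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContinuousView:
    """Toy stand-in for an incrementally updated aggregate table (hypothetical)."""

    # group key -> [count, sum]; only aggregate state is kept, never raw events
    rows: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0.0]))

    def insert(self, key, value):
        # Fold one incoming event into the aggregate for its group.
        # Assumption of this sketch: the event itself is not stored afterwards.
        state = self.rows[key]
        state[0] += 1
        state[1] += value

    def select(self):
        # Read the current aggregates, much like querying an ordinary table.
        return {k: {"count": c, "avg": s / c} for k, (c, s) in self.rows.items()}


view = ContinuousView()
for url, latency_ms in [("/home", 10.0), ("/home", 30.0), ("/about", 5.0)]:
    view.insert(url, latency_ms)

print(view.select())
```

The point of the design is that each write costs a small constant amount of work per group, so reads never have to rescan history. That is roughly the trade a materialized view makes, except here the refresh happens on every event rather than on demand.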

[+] Rapzid|8 years ago|reply

Correct me if I'm wrong, but PipelineDB gives effective access to data in commit order, right?

[+] skunkwerk|8 years ago|reply

Can't wait for support on RDS!

[+] tejasmanohar|8 years ago|reply

FWIW, AWS has a whitelist of Postgres extensions you can use in RDS so that'll probably take more time, if it ever happens.
