PipelineDB v0.9.9 – One More Release Until PipelineDB Is a PostgreSQL Extension

122 points| Fergi | 8 years ago |pipelinedb.com | reply

23 comments

[+] craigkerstiens|8 years ago|reply
PipelineDB is pretty interesting for time-series data. It processes the data as it comes in and stores aggregates or pre-aggregates over the time series. I haven't followed the latest, but as of a few years ago much of the approach was similar to some research out of UC Berkeley from about 10 years ago. You can find the paper that describes that work (TelegraphCQ, where CQ stands for continuous query) at http://db.csail.mit.edu/madden/html/TCQcidr03.pdf. Definitely an interesting read if you're into technical papers and databases.
[+] manigandham|8 years ago|reply
Druid also does this, with pre-aggregation of streaming data along predefined dimensions for very fast cube-based analytics. It's not a relational database though and is just now getting a SQL interface through Apache Calcite. http://druid.io/

Imply is a startup with a modern cloud/on-prem distribution of Druid with a built-in visualization and querying tool: https://imply.io/
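
To make "pre-aggregation along predefined dimensions" concrete, here is a minimal Python sketch of the general rollup idea. It is not Druid's implementation or API; the `rollup` function, the event fields, and the sample data are all made up for illustration.

```python
from collections import Counter
from datetime import datetime


def rollup(events, dimensions):
    """Pre-aggregate events per (hour, dimension values).

    Hypothetical helper for illustration only, not Druid code.
    """
    cube = Counter()
    for event in events:
        # Truncate the timestamp to the hour so events share a time bucket.
        hour = event["ts"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(event[d] for d in dimensions)
        cube[key] += event["clicks"]
    return cube


# Made-up sample events.
events = [
    {"ts": datetime(2018, 4, 25, 10, 5), "country": "US", "device": "mobile", "clicks": 3},
    {"ts": datetime(2018, 4, 25, 10, 40), "country": "US", "device": "mobile", "clicks": 2},
    {"ts": datetime(2018, 4, 25, 11, 15), "country": "DE", "device": "desktop", "clicks": 1},
]

cube = rollup(events, dimensions=("country", "device"))

# Queries only touch the pre-aggregated rows, never the raw events.
us_clicks = sum(v for (hour, country, device), v in cube.items() if country == "US")
print(len(cube), us_clicks)
```

The trade-off is that anything not captured by the chosen dimensions or the time bucket is gone once the raw events are dropped.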

[+] isoprophlex|8 years ago|reply
I didn't know the product at all, but at a glance this looks amazing for BI/alerting on streaming time-series data.

Does anyone want to chime in on whether this has fit your requirements for time-series data processing? Thanks!

[+] iaabtpbtpnn|8 years ago|reply
If it's a Postgres extension for time-series data, I wonder how it compares to TimescaleDB, which I recently discovered and have been evaluating.
[+] Fergi|8 years ago|reply
Powering real-time reporting dashboards is definitely the #1 use case we see for PipelineDB from open source users and customers of our new SaaS product powered by PipelineDB, called Stride (stride.io).
[+] merb|8 years ago|reply
Well, the continuous view also looks useful as a materialization technique for some kind of searchable "view"/table, i.e. it could probably be used to build a "cheap" Elasticsearch without needing to import data into another system, since you could just use triggers to update the continuous view.
[+] brightball|8 years ago|reply
Had never heard of this either but it does look really interesting.
[+] airstrike|8 years ago|reply
This would be absolutely perfect for the job I had in Sales Intelligence a few years ago... except we were locked into SQL Server and there was no way the powers that be would ever let us switch over to PostgreSQL.
[+] manigandham|8 years ago|reply
SQL Server 2017 has an in-memory (Hekaton) storage engine and columnstore indexes. Combine the two and you can do the same thing with real-time queries over the entire dataset.
[+] crudbug|8 years ago|reply
What is the storage model compared to TimescaleDB? [0]

[0] https://github.com/timescale/timescaledb

[+] Fergi|8 years ago|reply
The storage engine for PipelineDB is PostgreSQL, and the output of continuous SQL queries (continuous views in PipelineDB) is stored in what are essentially incrementally updated, realtime tables. You can also think of PipelineDB as very high-throughput, incrementally updated materialized views.

see: http://docs.pipelinedb.com/continuous-views.html
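
For anyone who wants a mental model of "incrementally updated", here is a toy Python sketch. It is not PipelineDB code and does not use PipelineDB's SQL syntax; the `ContinuousView` class, its methods, and the sample events are hypothetical and only illustrate the idea of folding each incoming event into stored aggregates instead of re-scanning raw data.

```python
from collections import defaultdict


class ContinuousView:
    """Toy model of an incrementally updated aggregate.

    Hypothetical class for illustration only, not PipelineDB internals.
    """

    def __init__(self):
        # key -> [count, total]; only the running aggregates are kept.
        self._state = defaultdict(lambda: [0, 0.0])

    def insert(self, key, value):
        """Fold one incoming event into the aggregate; the event itself is not stored."""
        entry = self._state[key]
        entry[0] += 1
        entry[1] += value

    def rows(self):
        """Return the current result, similar to selecting from the view."""
        return {
            key: {"count": count, "avg": total / count}
            for key, (count, total) in self._state.items()
        }


view = ContinuousView()
# Made-up events: (url, latency in ms).
for url, latency_ms in [("/home", 120.0), ("/home", 80.0), ("/login", 200.0)]:
    view.insert(url, latency_ms)

print(view.rows())
```

Reading the result costs the same no matter how many events have been inserted, which is the property that makes this style a good fit for dashboards.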

[+] Rapzid|8 years ago|reply
Correct me if I'm wrong, but PipelineDB effectively gives access to data in commit order, right?
[+] skunkwerk|8 years ago|reply
Can't wait for support on RDS!
[+] tejasmanohar|8 years ago|reply
FWIW, AWS has a whitelist of Postgres extensions you can use in RDS so that'll probably take more time, if it ever happens.