top | item 41950029

(no title)

ryzhyk | 1 year ago

The correct way to think about the problem is in terms of evaluating joins (or any other queries) over changing datasets. And for that you need an engine designed for *incremental* processing from the ground up: algorithms, data structures, the storage layer, and of course the underlying theory. If you don't have such an engine, you're doomed to build layers of hacks, and still fail to do it well.

We've been building such an engine at Feldera (https://www.feldera.com/), and it can compute joins, aggregates, window queries, and much more fully incrementally. All you have to do is write your queries in SQL, attach your data sources (stream or batch), and watch results get incrementally updated in real time.
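
To make "incremental join over a changing dataset" concrete, here is a minimal Python sketch. It is not Feldera's implementation or API; the names `add`, `join`, and `IncrementalJoin` are hypothetical, and rows are assumed to be `(key, value)` tuples mapped to integer weights (+1 for an insert, -1 for a delete). It relies only on the standard identity that the change to a join is dL ⋈ R + L ⋈ dR + dL ⋈ dR, so each step touches the deltas rather than recomputing the full result.

```python
def add(x, y):
    """Pointwise sum of two weighted collections, dropping zero weights."""
    out = dict(x)
    for row, w in y.items():
        out[row] = out.get(row, 0) + w
    return {row: w for row, w in out.items() if w != 0}


def join(left, right):
    """Join (key, value) rows on key; the output weight is the product."""
    out = {}
    for (k1, a), w1 in left.items():
        for (k2, b), w2 in right.items():
            if k1 == k2:
                row = (k1, a, b)
                out[row] = out.get(row, 0) + w1 * w2
    return {row: w for row, w in out.items() if w != 0}


class IncrementalJoin:
    """Keeps both inputs as state and emits only the change to the join."""

    def __init__(self):
        self.left = {}
        self.right = {}

    def step(self, d_left, d_right):
        # delta(L join R) = dL join R + L join dR + dL join dR,
        # where L and R are the inputs *before* this step.
        d_out = add(
            add(join(d_left, self.right), join(self.left, d_right)),
            join(d_left, d_right),
        )
        self.left = add(self.left, d_left)
        self.right = add(self.right, d_right)
        return d_out


j = IncrementalJoin()
# Step 1: insert one customer and one order.
d1 = j.step({(1, "alice"): 1}, {(1, "order-a"): 1})
# Step 2: delete order-a and insert order-b; the customer table is unchanged.
d2 = j.step({}, {(1, "order-a"): -1, (1, "order-b"): 1})
# Summing the emitted deltas gives the current contents of the join.
view = add(d1, d2)
```

The nested loop in `join` is only there to keep the sketch short; a real engine would index both sides by key so that each delta is matched without scanning the other input.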

d0mine|1 year ago

Is it related to Differential Dataflow / Timely Dataflow? https://github.com/TimelyDataflow/differential-dataflow

ryzhyk|1 year ago

We have our own formal model called DBSP: https://docs.feldera.com/papers

It is indeed inspired by timely/differential, but is not exactly comparable to it. One nice property of DBSP is that the theory is very modular and allows adding new incremental operators with strong correctness guarantees, kind of like LEGO bricks for incremental computation. For example, we have a fully incremental implementation of rolling aggregates (https://www.feldera.com/blog/rolling-aggregates), which I don't think any other system can do today.

majormajor|1 year ago

Does this offer any non-SQL programmatic interfaces or ways to do Complex Event Processing (e.g. https://www.databricks.com/glossary/complex-event-processing )? A lot of those scenarios would be tough to express in SQL.

lsuresh|1 year ago

Yes, you can write Rust UDFs with Feldera and even use the dbsp crate directly if you'd like.

qazxcvbnm|1 year ago

Hi, I’ve read the DBSP paper and it’s a really well-thought-out framework; all the magic seemed so simple with the way the paper laid things out. However, the paper dealt with abelian Z-sets only, and mentioned that in your implementation, you also handle the non-abelian aspect of ordering. I was wondering if you guys have published anything about how you did that?

ryzhyk|1 year ago

Apologies about the confusion. We indeed only solve incremental computation for Abelian groups, and the paper is making a case that database tables can be modeled as Abelian groups using Z-sets, and all relational operators (plus aggregation, recursion, and more) can be modeled as operations on Z-sets.
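
For readers who have not seen Z-sets, here is a minimal Python sketch of the idea, under the assumption that a Z-set is just a map from rows to nonzero integer weights. The `ZSet` class and its methods are hypothetical names for illustration, not the `dbsp` crate's API. Pointwise addition gives the Abelian group structure (commutative, with the empty Z-set as identity and negation as inverse), and a filter is linear over it, which is why applying it to a delta yields the delta of the output.

```python
class ZSet:
    """A table as a map from rows to integer weights (zero weights dropped)."""

    def __init__(self, weights=None):
        self.weights = {r: w for r, w in (weights or {}).items() if w != 0}

    def __add__(self, other):
        # Group operation: add weights row by row.
        out = dict(self.weights)
        for r, w in other.weights.items():
            out[r] = out.get(r, 0) + w
        return ZSet(out)

    def __neg__(self):
        # Group inverse: flip the sign of every weight.
        return ZSet({r: -w for r, w in self.weights.items()})

    def __eq__(self, other):
        return isinstance(other, ZSet) and self.weights == other.weights

    def filter(self, pred):
        # Keeps matching rows with their weights unchanged (a linear operator).
        return ZSet({r: w for r, w in self.weights.items() if pred(r)})


table = ZSet({"apple": 1, "kiwi": 1})
delta = ZSet({"kiwi": -1, "banana": 1})  # delete "kiwi", insert "banana"


def is_long(row):
    return len(row) > 4


# Linearity: filtering the updated table equals the old filtered output
# plus the filtered delta, so only the delta needs to be processed.
updated = (table + delta).filter(is_long)
incremental = table.filter(is_long) + delta.filter(is_long)
```

Here `updated` and `incremental` are the same Z-set, and `table + (-table)` is the empty Z-set.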

benreesman|1 year ago

I was with you and thinking Postgres over and over until the second paragraph. Which isn’t to say anything bad about your product, it sounds very cool.

But I’d work in “just like Postgres”.

ryzhyk|1 year ago

Good point. The goal is indeed to be a Postgres of incremental computing: any SQL query should "just work" out of the box with good performance and standard SQL semantics. You shouldn't need a team of experts to use the tool effectively.