top | item 22820751


throwaway8291 | 5 years ago

I looked at the dataflow paradigm a couple of years ago. Back then I thought that the difference from "ordinary" functions was not that big, and for performance (which is important for my data work) you do not want to deviate from the traditional way too much.

Has anyone felt the same, or can you provide a real-world problem where dataflow actually works better than other solutions?
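To make the comparison in the question concrete, here is a minimal, illustrative sketch (not from the thread; all names are made up) of the same computation written as an ordinary function and as an explicit dataflow graph, where data arriving at a node drives execution:

```python
# Ordinary style: control flow drives the computation.
def pipeline(xs):
    doubled = [x * 2 for x in xs]
    return sum(doubled)

# Dataflow style: the computation is a graph of nodes; pushing data
# through the graph drives execution.
class Node:
    def __init__(self, fn, downstream=None):
        self.fn = fn
        self.downstream = downstream

    def push(self, value):
        out = self.fn(value)
        if self.downstream:
            self.downstream.push(out)

results = []
sink = Node(results.append)
double = Node(lambda x: x * 2, downstream=sink)

for x in [1, 2, 3]:
    double.push(x)

print(pipeline([1, 2, 3]))  # 12
print(results)              # [2, 4, 6]
```

For a single batch the two styles compute the same thing; the dataflow form only starts to pay off when the graph is reused, shared, or updated incrementally, which is what the replies below are about.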

discuss

order

FridgeSeal | 5 years ago

There’s this: https://github.com/mit-pdos/noria

It’s like a cache, except it keeps itself in sync with the database automatically. It generates “materialised views” using dataflows built from the queries that get asked of it, and it will automatically generate a new one if someone makes a query it doesn’t already have a dataflow for. Parts of dataflows can also be shared across views.

The paper linked in the github goes into detail about the performance gains, but it easily outperforms straight database calls and caching setups.
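A toy model of the idea behind Noria-style materialized views (this is a simplified sketch for illustration, not Noria's actual API): instead of re-running a `COUNT(*) ... GROUP BY` query on every read, keep the view's state and apply each write as a small delta.

```python
from collections import defaultdict

class CountView:
    """Maintains the result of `SELECT key, COUNT(*) ... GROUP BY key`
    incrementally, so reads are cache-hit cheap."""
    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, key):
        self.counts[key] += 1      # apply delta: +1 for this key

    def on_delete(self, key):
        self.counts[key] -= 1      # apply delta: -1 for this key
        if self.counts[key] == 0:
            del self.counts[key]

    def read(self, key):
        return self.counts.get(key, 0)   # O(1) read, no query re-execution

view = CountView()
for author in ["alice", "bob", "alice"]:
    view.on_insert(author)
view.on_delete("bob")
print(view.read("alice"))  # 2
print(view.read("bob"))    # 0
```

The point of the dataflow framing is that operators like this can be chained and shared between views, so one write propagates deltas through every view that depends on it.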

j-pb | 5 years ago

You should watch the video. It's not really about the dataflow programming paradigm itself.

This is about timely dataflow, the foundation of differential dataflow. It allows for the efficient incremental computation of results.

It basically solves the entire view maintenance problem from databases in a very elegant and efficient way.
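A rough sketch of the core data model behind differential dataflow (just to illustrate the representation; the real library is in Rust and also handles iteration and ordering via timely dataflow's timestamps): collections are stored as `(record, time, diff)` updates, and the collection's contents at any time are recovered by accumulating diffs.

```python
from collections import defaultdict

updates = [
    # (record, time, diff)
    ("alice", 0, +1),
    ("bob",   0, +1),
    ("alice", 1, +1),   # alice appears again at time 1
    ("bob",   2, -1),   # bob is retracted at time 2
]

def collection_at(updates, t):
    """Accumulate diffs with time <= t to get the collection at time t."""
    acc = defaultdict(int)
    for record, time, diff in updates:
        if time <= t:
            acc[record] += diff
    return {r: n for r, n in acc.items() if n != 0}

print(collection_at(updates, 0))  # {'alice': 1, 'bob': 1}
print(collection_at(updates, 2))  # {'alice': 2}
```

Because every change is an explicit diff, a downstream view only has to process the diffs rather than recompute from scratch, which is exactly the view-maintenance property the comment describes.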

BubRoss | 5 years ago

If your functions are transforming chunks of data into other formats/types, you are already doing what dataflow graphs do. Generalizing that can give you much more structured concurrency.
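One way to read that generalization (an illustrative sketch, not any particular library): take the same chunk-transforming functions and run each stage on its own thread, with explicit queues between stages, so the graph structure is also the concurrency structure.

```python
import queue
import threading

DONE = object()  # sentinel to shut a stage down

def stage(fn, inbox, outbox):
    """Run fn over every chunk arriving on inbox, forwarding results."""
    while True:
        chunk = inbox.get()
        if chunk is DONE:
            outbox.put(DONE)
            return
        outbox.put(fn(chunk))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda c: [x * 2 for x in c], q1, q2)),
    threading.Thread(target=stage, args=(sum, q2, q3)),
]
for t in threads:
    t.start()

for chunk in ([1, 2], [3, 4]):
    q1.put(chunk)
q1.put(DONE)

results = []
while (item := q3.get()) is not DONE:
    results.append(item)
for t in threads:
    t.join()

print(results)  # [6, 14]
```

The functions themselves are unchanged from the sequential version; only the wiring between them is new, which is the "more structured concurrency" the comment is pointing at.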