I am one of the contributors of Fugue. Fugue is an open-source abstraction layer that ports Python/Pandas/SQL code to Spark or Dask. This article covers the programming interface and benefits Fugue provides, specifically:
* Handling inconsistent behavior between different compute frameworks (Pandas, Spark, and Dask)
* Allowing reusability of code across Pandas-sized and Spark-sized data
* Dramatically speeding up testing and and iteration cycles
* Enabling new users to be productive with Spark much faster
* Providing a SQL interface capable of handling end-to-end workflows
There was a previous post on here about our SQL interface that lets you use SQL on top of Pandas, Spark and Dask. This post talks about the broader project.
https://news.ycombinator.com/item?id=28830243
kvnkho|4 years ago
I am one of the contributors of Fugue. Fugue is an open-source abstraction layer that ports Python/Pandas/SQL code to Spark or Dask. This article covers the programming interface and benefits Fugue provides, specifically:
* Handling inconsistent behavior between different compute frameworks (Pandas, Spark, and Dask) * Allowing reusability of code across Pandas-sized and Spark-sized data * Dramatically speeding up testing and and iteration cycles * Enabling new users to be productive with Spark much faster * Providing a SQL interface capable of handling end-to-end workflows
There was a previous post on here about our SQL interface that lets you use SQL on top of Pandas, Spark and Dask. This post talks about the broader project. https://news.ycombinator.com/item?id=28830243
Our repo can be found here: https://github.com/fugue-project/fugue
Happy to answer any questions!