kvnkho | 2 years ago
The Ibis integration is mostly about accessing data that already lives in various data stores. For example, we also use it under the hood for our recently released BigQuery integration: https://fugue-tutorials.readthedocs.io/tutorials/integration...
On to differences:
1. We guarantee consistency between backends. NULL handling can differ depending on the backend. For example, Pandas joins NULL with NULL while Spark doesn't. So if you prototype locally on Pandas and then scale to Spark, we guarantee the same results. Fugue is 100% unit tested and the backends go through the same test suite.
2. Ibis is a Pythonic interface to SQL backends. We embrace SQL, but understand its limitations. FugueSQL is an enhanced SQL dialect that can invoke Python code, so FugueSQL can be the first-class grammar instead of being sandwiched between Python code. Fugue's Python API and SQL API are 1:1 in capability.
3. Opinionated here, but we don't want users to learn any new language. Ibis is a new way to express things; we just want to extend the capabilities of what people already know (SQL, native Python, and Pandas). Fugue can also be incrementally adopted, meaning it can be used for just one portion of your workflow.
4. Roadmap-wise, we think the optimal solutions will be a mix of different tools. A clear one is pre-aggregating data with DuckDB, and then using Pandas for further processing. Similarly, can we preprocess in Snowflake and do machine learning in Spark? Fugue is working on connecting these different systems to enable cross-platform workloads.
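To make the NULL point in (1) concrete, here is a minimal pure-Python sketch of the two join semantics. It is not Fugue, Pandas, or Spark code; `inner_join` and its `nulls_match` flag are hypothetical names made up for illustration, and the rows are toy data.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], int]


def inner_join(
    left: List[Row], right: List[Row], nulls_match: bool
) -> List[Tuple[Optional[str], int, int]]:
    """Toy nested-loop inner join on the first field of each row.

    nulls_match=True  -> a NULL key matches another NULL key (Pandas-style).
    nulls_match=False -> a NULL key never matches anything (SQL/Spark-style).
    """
    out = []
    for lkey, lval in left:
        for rkey, rval in right:
            if lkey is None or rkey is None:
                # At least one key is NULL: match only if both are NULL
                # and the backend treats NULL keys as equal.
                matched = nulls_match and lkey is None and rkey is None
            else:
                matched = lkey == rkey
            if matched:
                out.append((lkey, lval, rval))
    return out


left = [("a", 1), (None, 2)]
right = [("a", 10), (None, 20)]

pandas_like = inner_join(left, right, nulls_match=True)
sql_like = inner_join(left, right, nulls_match=False)

print(pandas_like)  # the NULL-keyed rows are joined to each other
print(sql_like)     # the NULL-keyed rows are dropped
```

The same inputs give two rows under one rule and one row under the other, which is the kind of silent difference that shows up when moving a prototype from one backend to another.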
There may be more information for you here: https://fugue-tutorials.readthedocs.io/tutorials/integration...
chrisjc|2 years ago
goodwanghan|2 years ago