kvnkho | 2 years ago

Hi antman, thanks for the question. I'll list some differences below, but first let me answer your first question. The Fugue -> Ibis -> DuckDB example is a bit weird. Yes, it can be done, but it's not practical (as you can tell). There may be some overlap at times, but I do think the projects differ in scope (more below).

The Ibis integration is more about accessing data that already lives in various data stores. For example, we also use it under the hood for our recently released BigQuery integration: https://fugue-tutorials.readthedocs.io/tutorials/integration...

On to differences:

1. We guarantee consistency between backends. NULL handling can differ depending on the backend. For example, Pandas joins NULL with NULL while Spark doesn't. So if you prototype locally on Pandas and then scale to Spark, we guarantee the same results. Fugue is 100% unit tested and the backends go through the same test suite.

2. Ibis is a Pythonic interface for SQL backends. We embrace SQL, but understand its limitations. FugueSQL is an enhanced SQL dialect that can invoke Python code. FugueSQL can be the first-class grammar instead of being sandwiched by Python code. Fugue's Python API and SQL API are 1:1 in capability.

3. Opinionated here, but we don't want users to learn any new language. Ibis is a new way to express things; we just want to extend the capabilities of what people already know (SQL, native Python, and Pandas). Fugue can also be incrementally adopted, meaning it can be used for just one portion of your workflow.

4. Roadmap-wise, we think the optimal solutions will be a mix of different tools. A clear one is pre-aggregating data with DuckDB, and then using Pandas for further processing. Similarly, can we preprocess in Snowflake and do machine learning in Spark? Fugue is working on connecting these different systems to enable cross-platform workloads.
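To make the NULL-handling difference in point 1 concrete, here is a minimal sketch. It does not use Fugue or Spark: SQLite (from the Python standard library) stands in for standard SQL join semantics, which Spark also follows for equality joins. The table and column names (`lhs`, `rhs`, `key`) are made up for illustration.

```python
import sqlite3

import pandas as pd

# Two small frames whose join key contains a NULL (None becomes NaN in pandas).
left = pd.DataFrame({"key": [1.0, None], "l_val": ["a", "b"]})
right = pd.DataFrame({"key": [1.0, None], "r_val": ["x", "y"]})

# pandas treats null keys as equal to each other in a merge,
# so the NULL row on the left matches the NULL row on the right.
pandas_rows = len(left.merge(right, on="key", how="inner"))

# Standard SQL semantics: NULL = NULL evaluates to NULL (not true),
# so the NULL rows never match. SQLite is used here only as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lhs (key REAL, l_val TEXT)")
conn.execute("CREATE TABLE rhs (key REAL, r_val TEXT)")
conn.executemany("INSERT INTO lhs VALUES (?, ?)", [(1.0, "a"), (None, "b")])
conn.executemany("INSERT INTO rhs VALUES (?, ?)", [(1.0, "x"), (None, "y")])
sql_rows = conn.execute(
    "SELECT COUNT(*) FROM lhs JOIN rhs ON lhs.key = rhs.key"
).fetchone()[0]
conn.close()

print(f"pandas inner join rows: {pandas_rows}")
print(f"SQL inner join rows:    {sql_rows}")
```

The same logical join returns a different row count depending on the engine, which is the kind of discrepancy a shared cross-backend test suite is meant to catch.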

There may be more information for you here: https://fugue-tutorials.readthedocs.io/tutorials/integration...

chrisjc|2 years ago

Are you suggesting that Snowflake compatibility/integration is coming in the future? If so, do you plan to integrate Snowpark (Snowflake's custom DataFrame APIs) into the mix?

goodwanghan|2 years ago

Yes, we plan to integrate with Snowflake and Snowpark.