eduren | 4 years ago
I spent some time evaluating Arc for my team's ETL purposes and I was really impressed. I hesitated to move forward with it because it seemed closely tied to the Spark ecosystem (for great reasons). We just weren't at all familiar with deploying and operating Spark, so we ended up rolling our own scripts on top of an existing Airflow cluster for now.
Besides performance, are there any other advantages to porting Arc to run on top of DataFusion? If the porting effort is shared somewhere, I'd love to dig in and see what the proof of concept looks like.
seddonm1 | 4 years ago
FridgeSeal | 4 years ago
We're currently using DataFusion from Rust, and because it is more resource efficient we can use smaller machines, which brings our costs down. Deploying services is also faster (smaller Docker images, faster startup times) and puts less extraneous load on our machines.
I imagine Arc, and thus downstream users, would see similar benefits.