top | item 34461724

solidangle | 3 years ago

But you don't have to write map-reduce jobs at all? You can just write SQL queries or Pandas programs, and they automatically get parallelized by Databricks. Databricks is a data warehouse (just like Snowflake).

https://www.databricks.com/product/databricks-sql

legerdemain | 3 years ago

In a twist, pandas programs don't get parallelized on Spark. Someone had to go and write a parallel layer that duplicated the pandas API, because otherwise you ended up with the entire pandas program executing on a single executor.

alexott | 3 years ago

There is pandas-on-Spark, included in Spark itself (originally Koalas). The switch to it is very easy, and you get parallelization.
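To illustrate the "easy switch" being described: the pandas-on-Spark API (`pyspark.pandas`, shipped with Spark 3.2+) mirrors the pandas API, so in many cases only the import line changes. A minimal sketch, shown with plain pandas so it runs without a Spark cluster; the commented import is the distributed variant:

```python
import pandas as pd
# On a Spark cluster you would instead write:
#   import pyspark.pandas as pd   # same API, execution distributed across executors

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

# Typical pandas code, unchanged under pandas-on-Spark.
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())  # {'a': 3, 'b': 3}
```

Not every pandas feature is covered by `pyspark.pandas`, and operations that assume a global row order can behave differently on a distributed backend, so "very easy" holds mostly for the common DataFrame workflows.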