top | item 37962098

(no title)

AdamProut | 2 years ago

Databricks has kept their Photon[1][2] query engine for Spark closed sourced thus far. Unless EMR has made equivalent changes to the Spark runtime they use Databricks should be much faster. Photon brings the standard vectorized execution techniques used in SQL data warehouses for many years to Spark.

[1] https://docs.databricks.com/en/clusters/photon.html [2] https://dl.acm.org/doi/10.1145/3514221.3526054

discuss

whinvik|2 years ago

I am a bit hazy about the exact details of how we did it since its been some time, but we definitely did not use Photon as it was too expensive.

One of the issues was that we started experimenting with Delta Tables and EMR was horrible in leveraging that.