After a quick look, I'm not sure I would call this “industrial strength”. In particular, the join optimizer (typically the heart of a large-scale SQL optimizer) looks very rudimentary, and the statistics it uses have no notion of correlation and no histograms beyond min/max…
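To make the statistics point concrete, here is a minimal Python sketch. It is not DataFusion's code, and all function names are hypothetical. It compares a range estimate that only knows min/max, and so must assume values are spread uniformly, with an equi-depth histogram on a skewed column. It also shows how the usual independence assumption for `AND` predicates misestimates two perfectly correlated columns.

```python
from math import prod


def minmax_selectivity(lo, hi, col_min, col_max):
    """Estimate P(lo <= x <= hi) from min/max only, assuming a uniform distribution."""
    if col_max == col_min:
        return 1.0 if lo <= col_min <= hi else 0.0
    overlap = min(hi, col_max) - max(lo, col_min)
    return max(0.0, overlap) / (col_max - col_min)


def build_equi_depth(values, buckets):
    """Return buckets+1 boundaries; each bucket holds roughly len(values)/buckets rows."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // buckets] for i in range(buckets)] + [s[-1]]


def histogram_selectivity(lo, hi, bounds):
    """Sum each bucket's share, assuming uniformity only *within* a bucket."""
    share = 1.0 / (len(bounds) - 1)
    total = 0.0
    for left, right in zip(bounds, bounds[1:]):
        if right == left:  # single-value bucket, i.e. a heavy hitter
            total += share if lo <= left <= hi else 0.0
        else:
            overlap = min(hi, right) - max(lo, left)
            total += share * max(0.0, overlap) / (right - left)
    return total


def conjunction_selectivity(*sels):
    """Independence assumption: P(A and B) = P(A) * P(B). Wrong for correlated columns."""
    return prod(sels)


# Skewed column: 900 rows equal to 1, then 2..101 once each (1000 rows total).
a = [1] * 900 + list(range(2, 102))
b = list(a)  # b is perfectly correlated with a

true_sel = sum(1 for x in a if x <= 10) / len(a)                         # 909/1000
true_both = sum(1 for x, y in zip(a, b) if x <= 10 and y <= 10) / len(a)  # also 909/1000
mm = minmax_selectivity(1, 10, min(a), max(a))                           # 9/100
hist = histogram_selectivity(1, 10, build_equi_depth(a, 10))             # about 0.908
both = conjunction_selectivity(hist, hist)                               # about 0.82
print(true_sel, mm, round(hist, 3), round(both, 3), true_both)
```

The histogram recovers the heavy hitter that min/max misses by a factor of ten, but neither estimate fixes the correlation error, which is why some systems add multi-column statistics on top of per-column histograms.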
I was wondering about the same claim. However, I believe JOINs are a common weakness among OLAP database engines, and DataFusion is built on top of a columnar in-memory format, Apache Arrow.
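For a sense of what a less rudimentary join optimizer does, here is a hypothetical Python sketch of classic dynamic-programming join enumeration over relation subsets. It uses a toy cost model: the sum of estimated intermediate result sizes. The table names, cardinalities, and selectivities are made up, and this is not how DataFusion plans joins.

```python
from itertools import combinations


def best_join_order(cards, sels):
    """
    Exhaustive DP over relation subsets (bushy plans), minimizing the sum of
    estimated intermediate result sizes.
    cards: {name: row_count}; sels: {frozenset({r1, r2}): join selectivity}.
    Pairs without an entry are treated as cross products (selectivity 1).
    """
    rels = sorted(cards)

    def card(subset):
        size = 1.0
        for r in subset:
            size *= cards[r]
        for pair, s in sels.items():
            if pair <= subset:  # predicate applies once both sides are joined
                size *= s
        return size

    # best[subset] = (cost, plan); single relations cost nothing to produce
    best = {frozenset([r]): (0.0, r) for r in rels}
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            subset = frozenset(combo)
            out = card(subset)
            members = sorted(subset)
            candidates = []
            # try every split of the subset into two non-empty halves
            for k in range(1, size):
                for left_t in combinations(members, k):
                    left = frozenset(left_t)
                    right = subset - left
                    lc, lp = best[left]
                    rc, rp = best[right]
                    candidates.append((lc + rc + out, (lp, rp)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(rels)]


cards = {"orders": 1_000_000, "customers": 10_000, "nation": 25}
sels = {
    frozenset({"orders", "customers"}): 1e-4,  # hypothetical FK join
    frozenset({"customers", "nation"}): 1 / 25,
}
cost, plan = best_join_order(cards, sels)
print(cost, plan)
```

Joining the small `customers`/`nation` pair first keeps the intermediate result small, whereas an order such as `orders`, `nation`, `customers` would materialize a 25-million-row cross product. Exhaustive enumeration grows exponentially with the number of tables, so production optimizers restrict it (for example to connected subgraphs of the join graph) or fall back to greedy heuristics for many-way joins.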
chrisjc|2 years ago
So, for example, using DuckDB with the Substrait extension, if you create a table and then query it as in the article, you can see something similar to what the article describes. The DuckDB extension doesn't seem to cover any DDL operations, though: https://duckdb.org/docs/extensions/substrait
Some other related discussions and links that I've collected over the years:
https://news.ycombinator.com/item?id=37415494
https://news.ycombinator.com/item?id=34233697
https://news.ycombinator.com/item?id=31981568
https://datastation.multiprocess.io/blog/2022-04-11-sql-pars...
https://tomassetti.me/parsing-sql/
Sesse__|2 years ago
menaerus|2 years ago
biggestdummy|2 years ago
https://medium.com/starrocks-engineering/starrocks-inside-sc...
muizelaar|2 years ago