top | item 39179062

(no title)

alamb | 2 years ago

BTW you can see a version of what an industrial strength query optimizer / execution engine looks like in Rust https://arrow.apache.org/datafusion/

(can also use it in your own projects)

It is quite similar to what is described in this post

discuss

order

chrisjc|2 years ago

Somewhat similar, see https://substrait.io/

So for example using DuckDB with the Substrait extension, if you create a table

    create table t(a int);
and then query it as in the article, you can see something similar to what is described in the article

    CALL get_substrait_json('select * from t');

    {"relations":[{"root":{"input":{"project":{"input":{"read":{"baseSchema":{"names":["a"],"struct":{"types":[{"i32":{"nullability":"NULLABILITY_NULLABLE"}}],...
DuckDB extension doesn't seem to cover any DDL operations though.

https://duckdb.org/docs/extensions/substrait

Some other related discussions and links that i've collected over the years

https://news.ycombinator.com/item?id=37415494

https://news.ycombinator.com/item?id=34233697

https://news.ycombinator.com/item?id=31981568

https://datastation.multiprocess.io/blog/2022-04-11-sql-pars...

https://tomassetti.me/parsing-sql/

Sesse__|2 years ago

After a quick look, I'm not sure if I would call this “industrial strength”. In particular, the join optimizer (typically the heart of a large-scale SQL optimizer) looks very rudimentary? And the statistics it uses have zero idea about correlation, no histograms beyond min/max…

menaerus|2 years ago

I was wondering about the same claim. However, I believe that JOIN's are a common weakness among OLAP database engines, and DataFusion is built on top of a columnar storage format - Apache Arrow.