top | item 39179062

(no title)

alamb | 2 years ago

BTW you can see a version of what an industrial strength query optimizer / execution engine looks like in Rust https://arrow.apache.org/datafusion/

(can also use it in your own projects)

It is quite similar to what is described in this post

discuss

chrisjc|2 years ago

Somewhat similar, see https://substrait.io/

So for example using DuckDB with the Substrait extension, if you create a table

    create table t(a int);

and then query it as in the article, you can see something similar to what is described in the article

    CALL get_substrait_json('select * from t');

    {"relations":[{"root":{"input":{"project":{"input":{"read":{"baseSchema":{"names":["a"],"struct":{"types":[{"i32":{"nullability":"NULLABILITY_NULLABLE"}}],...

DuckDB extension doesn't seem to cover any DDL operations though.

https://duckdb.org/docs/extensions/substrait

Some other related discussions and links that i've collected over the years

https://news.ycombinator.com/item?id=37415494

https://news.ycombinator.com/item?id=34233697

https://news.ycombinator.com/item?id=31981568

https://datastation.multiprocess.io/blog/2022-04-11-sql-pars...

https://tomassetti.me/parsing-sql/

Sesse__|2 years ago

After a quick look, I'm not sure if I would call this “industrial strength”. In particular, the join optimizer (typically the heart of a large-scale SQL optimizer) looks very rudimentary? And the statistics it uses have zero idea about correlation, no histograms beyond min/max…

menaerus|2 years ago

I was wondering about the same claim. However, I believe that JOIN's are a common weakness among OLAP database engines, and DataFusion is built on top of a columnar storage format - Apache Arrow.

biggestdummy|2 years ago

There's also this pretty detailed article on StarRocks' query optimizer. (StarRocks is open source - focused on OLAP)

https://medium.com/starrocks-engineering/starrocks-inside-sc...

muizelaar|2 years ago

Specifically: https://github.com/apache/arrow-datafusion/tree/main/datafus...