durner|1 year ago

The idea of comparing column and row storage actually inspired this whole post.

tomnipotent|1 year ago

Your read-performance test is biased toward a struct of arrays (SoA). An array of structs (AoS) should outperform it when you need random, non-contiguous lookups. For fixed-page databases this is an important distinction, since row-based and hybrid (PAX) storage will need to read fewer pages than a pure columnar store.
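A rough page-counting model makes the point concrete. This is a minimal sketch, assuming a 4 KiB page, fixed-width 8-byte attributes, and full-row point lookups; `PAGE_SIZE`, `VALUE_SIZE` and both helper functions are hypothetical names for illustration. Row storage and PAX keep all attributes of a row on the same page (PAX just arranges them column-wise inside it), so one lookup touches one page, while a pure columnar store touches one page per column.

```python
import random

PAGE_SIZE = 4096  # assumed fixed page size in bytes
VALUE_SIZE = 8    # assumed fixed-width 8-byte attributes


def pages_touched_row(row_ids, num_cols):
    """Row store or PAX: each page holds whole rows, so one page per distinct row group."""
    rows_per_page = PAGE_SIZE // (num_cols * VALUE_SIZE)
    return len({r // rows_per_page for r in row_ids})


def pages_touched_columnar(row_ids, num_cols):
    """Pure columnar store: every column lives in its own run of pages."""
    values_per_page = PAGE_SIZE // VALUE_SIZE
    return len({(c, r // values_per_page) for r in row_ids for c in range(num_cols)})


if __name__ == "__main__":
    random.seed(0)
    num_rows, num_cols = 10_000_000, 16
    lookups = random.sample(range(num_rows), 1000)  # random full-row point lookups
    print("row/PAX pages: ", pages_touched_row(lookups, num_cols))
    print("columnar pages:", pages_touched_columnar(lookups, num_cols))
```

With sparse random lookups nearly every row lands on a different page, so the columnar count grows roughly by a factor of `num_cols`. For a sequential scan of one or two columns, the picture flips in favor of the columnar layout.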

durner|1 year ago

Sure, a filtering scan or an index lookup works better on chunked SoA (or PAX, as we database people call it) than on unchunked SoA, because the per-chunk metadata lets you filter out whole chunks. We briefly cover that in the out-of-memory optimization section. Most column-based formats and databases are actually inspired by the ideas of PAX; they often just use a somewhat coarser granularity (e.g., Parquet's row groups).
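To illustrate the metadata-based filtering, here is a minimal sketch of chunked SoA where each chunk stores a min/max pair per column, similar in spirit to zone maps or Parquet's per-row-group statistics. `Chunk`, `build_chunks`, `filter_scan` and the tiny `CHUNK_ROWS` are hypothetical choices for illustration, not the post's implementation.

```python
from dataclasses import dataclass

CHUNK_ROWS = 4  # assumed tiny chunk size for the example; real row groups are far larger


@dataclass
class Chunk:
    columns: dict  # column name -> list of values (SoA inside the chunk)
    min_max: dict  # column name -> (min, max) metadata for this chunk


def build_chunks(rows, names, chunk_rows=CHUNK_ROWS):
    """Split row tuples into chunks, store each chunk column-wise, and record min/max."""
    chunks = []
    for start in range(0, len(rows), chunk_rows):
        part = rows[start:start + chunk_rows]
        cols = {n: [row[i] for row in part] for i, n in enumerate(names)}
        chunks.append(Chunk(cols, {n: (min(v), max(v)) for n, v in cols.items()}))
    return chunks


def filter_scan(chunks, col, lo, hi):
    """Return rows with lo <= col <= hi and how many chunks actually had to be scanned."""
    out, scanned = [], 0
    for ch in chunks:
        cmin, cmax = ch.min_max[col]
        if cmax < lo or cmin > hi:
            continue  # metadata proves no row in this chunk matches: skip it entirely
        scanned += 1
        for i, v in enumerate(ch.columns[col]):
            if lo <= v <= hi:
                out.append({n: vals[i] for n, vals in ch.columns.items()})
    return out, scanned


rows = [(i, i * 10) for i in range(12)]
chunks = build_chunks(rows, ["id", "price"])
matches, scanned = filter_scan(chunks, "id", 5, 6)
# matches == [{'id': 5, 'price': 50}, {'id': 6, 'price': 60}], scanned == 1
```

Skipping only pays off when the data is clustered on the filter column (here `id` is sorted); on randomly ordered values most chunks' min/max ranges overlap the predicate and nothing gets skipped. Coarser chunks mean less metadata but fewer skipping opportunities.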