top | item 41177331

(no title)

lor_louis | 1 year ago

I did not know about Eytzing-sorting, I'll look into it after my vacation, thanks! And yeah our current system is column oriented, and I already tried most of your recommendations (including tries) but the biggest limitation we face is that different kinds of queries will be better served by different kinds of data structures but you can only store a few of them on disk before the cost to index becomes too big.

discuss

order

o11c|1 year ago

Whoops, I dropped the "er" the second time. I've also seen it labeled "cache-friendly binary sort", and sometimes the Breadth-First-Search aspect is played up (but that's unsearchable).

Logical-order (in-order DFS) iteration is obvious and trivial (though a lot of tasks are fine with physical-order (BFS) iteration. Beware that some tasks that nominally don't care about input order perform much better on already-sorted input though).

Conversion between logical index and physical index (if you need it - chances are you don't) can still be done efficiently, but requires some (fun!) thought especially for the non-power-of-2 tail.

Explicit prefetching is trivial to arrange for k-great grandchildren if needed (though the width of course grows exponentially). Using an n-ary rather than binary tree is possible but I'm not sure of the cache effects in practice. Switching to linear search just at the last level may be worth it? All of this means more complex logic though.