(no title)
curiouscoding | 1 year ago
On the other hand, most of the latency is in the last few layers, and probably there isn't as much to be saved there.
The biggest problem might be that the bottom layer will anyway have to store full-width numbers, since we must be sure to have the low-order bits somewhere in case they weren't covered yet in earlier layers. Or we could have variable width encoding per node maybe (instead of per layer) but that does sound a bit iffy on the branch predictor.
In the end I guess I kinda like the simplicity and hence reliability of the 'just do the full tree and the first layers are cheap anyway' approach. Probably another factor 2 speedup is possible on specific datasets, but anything beyond this may not be reliably good on worst case inputs (like is the issue for the prefix partitioning methods).
ant6n|1 year ago
It kind of comes down to how to deal with bad distribution of values, like large differences between adjacent values. One could for example deal with this by deliberately leaving „gaps“ in the tree, like using a 59-ary node on places (make the last values in the array large, so they will never get visited). With dynamic programming, perhaps an optimal distribution of the leaf level could be had - although it requires more bookkeeping bits of one needs to have the index of the found value, not just whether the value is in the tree.
The question could also be whether it could be interesting to store delta valeues, so that 63 8-bit valeues gives a larger numeric range - this means computing some sort of cumulative sum on on the 63 values, not sure whether there is a simd instruction for that.
One more thought: if there is different ways different layers of the tree are stored, but they are always the same for each layer, the code be unrolled, with every layer having different code to deal with it.
Last thought: if the index is not important to know, just whether the value is in the set or not, one could use a bitmap as a data structure. It would require 256MB for the 32bit space, but large spans of zeros imply empty pages.