top | item 42565837

(no title)

gorset | 1 year ago

The number of unique int32 values is not that great, and a full bitset would only be 512MB. Hard to bit a dense bitset in performance.

As a general purpose data structure, I would look at roaring bitmaps for this purpose, which has good trade-offs with great performance for most use-cases.

If only simple lookups are needed over a static set, then it's worth looking at minimal perfect hash functions (https://sux.di.unimi.it).

discuss

order

curiouscoding|1 year ago

Hmm, bitmaps is an interesting idea! If the data is dense enough, then yeah I guess a quick linear scan would work.

There's also the extreme of simply storing the answer for each possible query as a u32 and just index the array, but there the overhead is much larger.

Rank-select is also interesting, but I doubt it comes anywhere close in performance.

I happen to also be working on a minimal perfect hashing project that has way higher throughput than other methods (mostly because batching), see https://curiouscoding.nl/posts/ptrhash-paper/ and the first post linked from there.

gorset|1 year ago

Nice work! I love the care and discussion around low level optimizations, as such things are often ignored.

There are a lot of interesting variants of rank/select and other succinct data structures which has interesting tradeoffs. Maybe most useful when compression is critical. At least my own data structures can’t compete with the query performance you are showing, but they are great when the memory footprint matters.