top | item 42563161


wolfgangK | 1 year ago

Amazingly thorough! I love how the author leaves no stone unturned. I had no idea you could do that kind of low-level efficiency shaving in Rust. I wonder how a C++ implementation with https://github.com/jfalcou/eve would compare.

curiouscoding | 1 year ago

Thanks! It's somewhat tiring to not have loose ends, but I agree it pays off :)

Doing this stuff in Rust is absolutely possible, and I'd do it again since my C++ days are now past me, but the endless transmuting between portable-simd types, plain Rust arrays, and intrinsic types is quite annoying.

Also, Rust is not made for convenient raw pointer arithmetic; there, plain C would be much more to the point.
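To illustrate the point about pointer arithmetic, here is a hypothetical sketch (not code from the article) of a linear scan written pointer-style: what C spells as `p++` and `p - base` becomes `unsafe`, `.add(1)`, and `.offset_from(base)` in Rust.

```rust
// Hypothetical sketch: stepping through an array with raw pointers.
// Returns the index of the first element >= needle (or the length).
fn find_ge(sorted: &[u32], needle: u32) -> usize {
    let base = sorted.as_ptr();
    let mut p = base;
    unsafe {
        // Each step is `p = p.add(1)`, the Rust spelling of C's `p++`.
        while p < base.add(sorted.len()) && *p < needle {
            p = p.add(1);
        }
        // Pointer difference back to an index, like `p - base` in C.
        p.offset_from(base) as usize
    }
}

fn main() {
    let v = [2u32, 3, 5, 7, 11];
    assert_eq!(find_ge(&v, 6), 3); // first element >= 6 is 7, at index 3
    println!("ok");
}
```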

ant6n | 1 year ago

You kind of lost me towards the end, so I’m not sure whether you attempted it, but I was wondering whether it would be possible for the internal nodes to store only the upper 8/16 bits (the ones that are nonzero for the current subtree). This implies that one 64-byte cache line stores 32 or 64 entries instead of 16 (or better: 31 or 62/63, since you may need some bookkeeping data).

The next level would be to keep track of how many prefix bits are implied by the parents, so leaf nodes could perhaps also use only 8/12/16 bits, if the higher bits are implied by the parent. Or, instead of bit masks, use offsets (i.e. leaves store a k-bit offset from the parent value).

That may screw around with the branch predictor, and may work much better with evenly distributed data than with uneven data.

rurban | 1 year ago

There are stones still unturned. For typical Unicode table lookups you'd need batches for all characters of a word. It would be much faster to look up e.g. 16 chars at once, to overlap the cache misses.
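One way to get that overlap (a hypothetical sketch, not anything the article or comment describes concretely) is to interleave the searches: instead of finishing one binary search before starting the next, all 16 queries advance one comparison per round, so their memory loads can be in flight at the same time.

```rust
// Hypothetical sketch: 16 interleaved lower-bound searches.
// Each outer round does one comparison per still-active query, so the
// 16 loads from `sorted` can overlap instead of serializing.
fn batched_lower_bound(sorted: &[u32], queries: &[u32; 16]) -> [usize; 16] {
    let mut lo = [0usize; 16];
    let mut len = [sorted.len(); 16];
    while len.iter().any(|&l| l > 0) {
        for i in 0..16 {
            if len[i] > 0 {
                let half = len[i] / 2;
                let mid = lo[i] + half;
                if sorted[mid] < queries[i] {
                    lo[i] = mid + 1;
                    len[i] -= half + 1;
                } else {
                    len[i] = half;
                }
            }
        }
    }
    lo // lo[i] = index of first element >= queries[i]
}

fn main() {
    let sorted: Vec<u32> = (0..100).map(|i| i * 2).collect(); // 0, 2, ..., 198
    let queries = [5u32; 16];
    let res = batched_lower_bound(&sorted, &queries);
    assert!(res.iter().all(|&r| r == 3)); // first value >= 5 is 6, at index 3
    println!("ok");
}
```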

But when I tested this even Eytzinger was too slow.
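For readers unfamiliar with the term: the Eytzinger layout reorders a sorted array in BFS order, so node k's children sit at 2k and 2k+1 and each search step is a predictable jump. A minimal sketch (my own, with hypothetical function names):

```rust
// Hypothetical sketch of Eytzinger layout: build by in-order traversal
// of the implicit tree, writing sorted values at BFS positions.
fn eytzinger(sorted: &[u32], out: &mut Vec<u32>, i: &mut usize, k: usize) {
    if k <= sorted.len() {
        eytzinger(sorted, out, i, 2 * k); // left subtree
        out[k] = sorted[*i];
        *i += 1;
        eytzinger(sorted, out, i, 2 * k + 1); // right subtree
    }
}

// Lower bound: smallest element >= x, or None if all are smaller.
fn search(tree: &[u32], x: u32) -> Option<u32> {
    let mut k = 1; // index 0 is unused; the root lives at index 1
    let mut ans = None;
    while k < tree.len() {
        if tree[k] >= x {
            ans = Some(tree[k]);
            k = 2 * k; // candidate found, keep looking left
        } else {
            k = 2 * k + 1; // too small, go right
        }
    }
    ans
}

fn main() {
    let sorted = [1u32, 4, 9, 16, 25, 36, 49];
    let mut tree = vec![0u32; sorted.len() + 1];
    eytzinger(&sorted, &mut tree, &mut 0, 1);
    assert_eq!(&tree[1..], &[16, 4, 36, 1, 9, 25, 49]);
    assert_eq!(search(&tree, 10), Some(16));
    assert_eq!(search(&tree, 50), None);
    println!("ok");
}
```

The appeal is that the hot top levels of the tree pack into a few cache lines, though as the comment notes, even this can lose to batched or SIMD-friendly layouts in practice.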