top | item 41046375

(no title)

jgraettinger1 | 1 year ago

The first place I'd look is using binary search over a sorted `Vec<(u32, u32)>`. The `superslice` crate is handy for this, or just using the built-in binary search methods.

The improved cache behavior alone can make an order-of-magnitude difference. The cost you pay, of course, is on inserts and removals, but perhaps these can be ammortized in a read-heavy workload (build up a list of changes, and then rebuild in one go).

discuss

order

grovesNL|1 year ago

Thanks, I had a similar idea and tried this with Meta's `sorted_vector_map` (https://github.com/facebookexperimental/rust-shed/tree/main/...) but found the performance to be slightly worse than a `BTreeMap<u32, u32>` for my test case. I did try to change it a bit to do fewer binary searches (`range` tried to find the end immediately) but the binary searches seemed to be slightly worse than finding starting intervals in the map.

dgb23|1 year ago

Honest questions (this might be over my head...):

My assumption (is it correct?) that sorted Vec<(u32, u32)> represents start/end times in a tuple?

What comes to mind at first is always storing less information. Do you _need_ 32bits? What is the granularity of your time intervals?

Then, it might make sense to encode intervals differently if you have a lot of adjacent slots:

Instead of using start/end times, you might be able to use just a single time and tag it. The tag would tell you whether you're looking at the next (adjacent) start time, or at an end time (rest of the slots are free until the next time).

That could be encoded as an enum like so: Start(u32) | End(u32). I assume that this would take up a little bit of memory for the tag and then 32bits + padding for the rest. I'm not familiar enough with Rust, but I assume you end up with at least 40 bits unfortunately because of alignment.

Another approach could be to use a i32, you only get half of the granularity and would have to do a little bit of bit twiddling but you have a much more compressed data structure.

(Aside: In Zig this might be a bit easier because you can have arbitrarily sized integers).

You said it's a read heavy use case so it might be useful to only having to search for an end tag (or "negative" number) in order to get free slots.

Another, slightly more radical, approach would be to only concern yourself with availability, flipping the problem on its head, or at least have a separate availability data structure.

This could make some write operations a bit more involved, but you'd have a data structure that specifically cares about finding free slots.

wormlord|1 year ago

I implemented a toy market ledger in Rust. I initially thought to use a B-Tree because that's the conventional wisdom right? It was sooooo slow. Running valgrind showed that 99.5% of the time was spent doing memory allocation and traversing pointers.

A hashmap of vectors was so stupidly fast. If you keep them sorted each insertion is dirt cheap, and binary search is dirt cheap.

grovesNL|1 year ago

Do you mean time was used as the key for the hashmap? What was the vector used for in that setup?