top | item 46539182

(no title)

OskarS | 1 month ago

> HashMap implements Extend, so just h0.extend(h1) and you're done, the people who made your HashMap type are much better equipped to optimize this common operation.

Are you sure? I'm not very used to reading Rust stdlib, but this seems to be the implementation of the default HashMap extend [1]. It just calls self.base.extend. self.base seems to be hashbrown::hash_map, and this is the source for it's extend [2]. In other words, does exactly the same thing, just iterates through hash map and inserts it.

Maybe I'm misreading something going through the online docs, or Rust does the "random seed" thing that abseil does, but just blinding assuming something doesn't happen "because Rust" is a bit silly.

[1]: https://doc.rust-lang.org/src/std/collections/hash/map.rs.ht...

[2]: https://docs.rs/hashbrown/latest/src/hashbrown/map.rs.html#4...

discuss

order

tialaramex|1 month ago

Yes, HashMap will by default be randomly seeded in Rust, but also the code you linked intelligently reserves capacity. If h0 is empty, it reserves enough space for all of h1, and if it isn't then it reserves enough extra space for half of h1, which turns out to be a good compromise.

Note that the worst case is we ate a single unneeded growth, while the best case is that we avoided N - 1 grows where N may be quite large.

attractivechaos|1 month ago

First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.

I tried extend() anyway. It didn't work well. Based on your description, extend() implements a variation of preallocation (i.e. Solution II). However, because it doesn't always reserve enough space to hold the merged hash table, clustering still happens depending on N. I have updated the rust implementation (with the help of LLM as I am not a good rust programmer). You can try it yourself with "ht-merge-rust 1 -e -n14m" or point out if I made mistakes.

> HashMap will by default be randomly seeded in Rust

Yes, so it is with Abseil. The default rust hash functions, siphash in the standard library and foldhash in hashbrown, are ~3X as slow in comparison to simple hash functions on pure insertion load. When performance matters, we will use faster hash functions at least for small keys and will need a solution from my post.

> In a new enough C++ in theory you might find the same functionality supported, but Quality of Implementation tends to be pretty frightful.

This is not necessary. The rust libraries are a port of Abseil, a C++ library. Boost is as fast as Rust. Languages/libraries should learn from each other, not fight each other.