(no title)
OskarS | 1 month ago
Are you sure? I'm not very used to reading Rust stdlib, but this seems to be the implementation of the default HashMap extend [1]. It just calls self.base.extend. self.base seems to be hashbrown::hash_map, and this is the source for it's extend [2]. In other words, does exactly the same thing, just iterates through hash map and inserts it.
Maybe I'm misreading something going through the online docs, or Rust does the "random seed" thing that abseil does, but just blinding assuming something doesn't happen "because Rust" is a bit silly.
[1]: https://doc.rust-lang.org/src/std/collections/hash/map.rs.ht...
[2]: https://docs.rs/hashbrown/latest/src/hashbrown/map.rs.html#4...
tialaramex|1 month ago
Note that the worst case is we ate a single unneeded growth, while the best case is that we avoided N - 1 grows where N may be quite large.
attractivechaos|1 month ago
I tried extend() anyway. It didn't work well. Based on your description, extend() implements a variation of preallocation (i.e. Solution II). However, because it doesn't always reserve enough space to hold the merged hash table, clustering still happens depending on N. I have updated the rust implementation (with the help of LLM as I am not a good rust programmer). You can try it yourself with "ht-merge-rust 1 -e -n14m" or point out if I made mistakes.
> HashMap will by default be randomly seeded in Rust
Yes, so it is with Abseil. The default rust hash functions, siphash in the standard library and foldhash in hashbrown, are ~3X as slow in comparison to simple hash functions on pure insertion load. When performance matters, we will use faster hash functions at least for small keys and will need a solution from my post.
> In a new enough C++ in theory you might find the same functionality supported, but Quality of Implementation tends to be pretty frightful.
This is not necessary. The rust libraries are a port of Abseil, a C++ library. Boost is as fast as Rust. Languages/libraries should learn from each other, not fight each other.