top | item 35003008

(no title)

themulticaster | 3 years ago

Well, using the identity function as a hash does not provide any avalanche effect at all. And as parent comments point out, certain hash map implementations require a strong hash function. I'm guessing that the identity function probably isn't a "good" hash function as required by abseil's swiss tables and similar hash maps, but to be honest I haven't looked into this any further. Put another way, I'd have expected std::hash to be at least a very basic hash function (like FNV-1 for example).

discuss

order

jcelerier|3 years ago

Hmm, I know that at least some hash map implementations look for hash functions with an is_avalanching trait and otherwise add an explicit combination step; I'd assume that std::unordered_map implementations do the explicit combination all the time as std::hash was pretty much introduced just to work with such containers

tialaramex|3 years ago

As you presumably know std::unordered_map is essentially obliged to be a particular flavour of hash map which I'd say values simplicity of implementation over any performance metrics but which also provides certain iterator invalidation semantics which C++ programs could in principle come to rely on (and hence it cannot be swapped for a new structure).

Turns out for this structure hash quality just doesn't matter very much. The identity function is all you need for mediocre results, and a really great hash produces only slightly improved results with this data structure, yet running the hash costs performance so why bother ?

But a modern hash map structure actually relies on that quality for reasonable performance.

https://martin.ankerl.com/2022/08/27/hashmap-bench-01/#bench...

Ask that table about std::unordered_map - notice the fancier hashes make it worse (bigger numbers are worse, relative to 100 is the best) but not that much worse. It's always poor, it just varies as to whether it's very poor.

Now, ask it about absl::flat_hash_map - for Abseil's own hashes the results range from very good to mediocre, and likewise other modern high quality hashes, but for std::hash it's so awful most benchmarks don't finish, shown as - instead of a number.

You can try to detect this, but this is a fundamentally a QoI problem. Why even provide this rubbish hash function in the standard library? This is a lesson the C++ standard library refuses to learn, if you can't do it well then don't become an attractive nuisance by doing it badly.

People are actually less unhappy with Microsoft's stdlib which gives you a poor FNV hash instead of the identity function. Like, OK, this isn't a great hash, but it does basically work (under non-adversarial conditions, FNV has big problems for tailored input) something like Swiss Tables will be substantially slower than it ought to be, but not crazy benchmark failed due to timeout slower. Nobody is cheering, but it doesn't cause open weeping.