
Abnormal String Hashing

52 points | r4um | 6 years ago | pzemtsov.github.io

15 comments

[+] hinkley|6 years ago|reply
Do all of those lookup tables for powers of 31 work once integer overflow happens?
[+] tom_mellior|6 years ago|reply
What lookup tables for powers of 31 do you mean?

The HashMap implementation internally uses lookup tables (arrays) with sizes that are powers of 2, which might be what you mean? The incoming hash code can happily use all 32 bits of the int type, and its calculation is allowed to overflow and wrap. When indexing into the hash bucket array, HashMap masks off the extra bits that would lead to an index out of bounds exception otherwise.
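If it's the bucket indexing you mean, a minimal sketch (an assumption that mirrors HashMap's well-known `(n - 1) & hash` idiom, not the actual JDK source) of how a power-of-two table size turns any 32-bit hash, including ones that overflowed to negative, into a valid index:

```java
// Sketch of power-of-two bucket indexing: tableSize - 1 is a low-bit mask,
// so the result is always in [0, tableSize) regardless of the hash's sign.
public class BucketIndex {
    static int bucketIndex(int hash, int tableSize) {
        // tableSize is assumed to be a power of two (e.g. 16);
        // tableSize - 1 is then a mask of the low bits (0b1111 for 16).
        return (tableSize - 1) & hash;
    }

    public static void main(String[] args) {
        int h = -123456789;                       // a hash that wrapped negative
        int idx = bucketIndex(h, 16);
        System.out.println(idx >= 0 && idx < 16); // true: no out-of-bounds index
    }
}
```

Recent JDKs also fold the high bits into the low ones first (`h ^ (h >>> 16)`) so the mask doesn't discard them entirely.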

[+] rurban|6 years ago|reply
He really didn't understand Java's take on this. Being zero-insensitive is obviously insecure, and Java knew that. But Java decided to fight these kinds of attacks better than most others: it still has a trivially insecure hash function, which it kept because changing String.hashCode would break the documented API, but it converts a bucket's collisions from a linked list to a tree once there are too many of them, which indicates an active attack. Such attacks are rare, so the common case stays fast.

Zero-insensitivity would have been trivially fixable (perl fixed it in 5.18), but Java couldn't change the hash function, so it came up with a proper and much better fix, unlike perl and everyone else.
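For context, the collisions this defends against are cheap to manufacture: "Aa" and "BB" share a hash code under the 31-based hash, and concatenations of such pairs collide too, so an attacker can generate arbitrarily many keys for one bucket. A rough demonstration:

```java
// Colliding keys under String.hashCode: 'A'*31 + 'a' == 'B'*31 + 'B' == 2112,
// and since the hash is a polynomial, swapping "Aa"/"BB" blocks anywhere in a
// string preserves the hash, giving 2^k colliding keys of length 2k.
public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());                        // 2112
        System.out.println("BB".hashCode());                        // 2112
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // true
        // Since JDK 8, once a bucket chain grows past a small threshold
        // (TREEIFY_THRESHOLD in the JDK sources), HashMap converts it to a
        // balanced tree, so the attacked lookup degrades to O(log n), not O(n).
    }
}
```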

[+] fanf2|6 years ago|reply
This isn't about hash collisions; it's about unnecessarily recalculating the hash value.
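The article's point, roughly: String caches its hash code in a field whose default value 0 doubles as the "not computed yet" sentinel, so a string whose hash is genuinely 0 gets recomputed on every call. A simplified sketch of that idiom (an assumption about the mechanism, not the actual JDK source):

```java
// Sketch of the lazy hash-caching idiom where 0 means "not computed yet".
// A string whose real hash is 0 can never populate the cache, so the loop
// below re-runs on every hashCode() call for such strings.
public class CachedHash {
    private final char[] value;
    private int hash; // defaults to 0 == "not computed"

    CachedHash(String s) { this.value = s.toCharArray(); }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {  // zero-hash strings re-enter every time
            for (char c : value) h = 31 * h + c;
            hash = h;                      // caching 0 is a no-op
        }
        return h;
    }
}
```

For `new CachedHash("\0\0\0")` the computed hash is 0, indistinguishable from "uncached", so every call pays for the full loop.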
[+] hinkley|6 years ago|reply
I'm looking at all of the lookup tables in that code and wondering how much slower it would be to do a depth-first search and calculate as you went.

And then with a trie thrown in.

[+] jepcommenter|6 years ago|reply
A string of null characters of any length also produces a zero hash, e.g.: System.out.println("\0\0\0\0\0".hashCode());
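Unrolling the documented formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] makes it obvious why: each step is h = 31*h + c, and a null character contributes nothing. A small check (the hash method here is an illustrative re-implementation, not the JDK's):

```java
// The 31-based polynomial hash written out step by step: with every
// character equal to '\0' (numeric 0), h stays 0 for any string length.
public class NullHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // '\0' adds nothing: h = 31*h + 0 = 31*h
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("\0\0\0\0\0"));      // 0
        System.out.println("\0\0\0\0\0".hashCode()); // 0, matches the real thing
    }
}
```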