top | item 43706461

(no title)

jlebar | 10 months ago

I think the author has misunderstood things here.

This technique is orthogonal to integer mod. Indeed the author multiplies by their magic constant and then does an integer mod to map into their hashtable's buckets.

This technique is actually just applying a fast integer hash on the input keys to the hashtable before mapping the keys to buckets. You can then map to buckets however you want.

The additional hash is useful if and only if the input hash function for your table's keys doesn't appear to be a random function, i.e. it doesn't mix its bits for whatever reason. If your input hash functions are indeed random then this is a (small but perhaps measurable) waste of time.

Using prime-numbered table sizes is another way to accomplish basically the same thing. Dividing the input hash key by a prime forces you to look at all the bits of the input. In practice these are written as division by a constant, so they use multiplies and shifts. It's basically a hash function. (Though I'd use multiply by a magic number over divide by a prime, mul alone should be faster.)

Relatedly see this post by Daniel Lemire about an alternative to integer mod, https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-... which is interesting if your number of buckets is not a power of 2 for some reason.

discuss

sdenton4|10 months ago

I think the post talks about exactly this? The method is combining hashing the keys and finding a position in the target range. There's a bit where he talks about how Knuth uses the term 'hash function' as the combination of these two operations, while modern texts look at the two operations in isolation.

So maybe one way of looking at this is as an efficient fusion operation, which doesn't look special when you look at the ops in isolation, but combine to something that is both fast and advised problems with input patterns.

ithkuil|10 months ago

The way I understood this article, the problem Fibonacci hashing seems to solve is that it turns a hashing strategy that would require a prime modulo into something that can use a power of two modulo.

I think there are some hashing functions around that are already designed to solve that problem at "step 1".

So the question just boils down to which is faster