top | item 33147030

(no title)

omegalulw | 3 years ago

Maybe I'm missing something obvious, but if I insert all 32 bit keys, does that not take 32*2^32 space?

discuss

order

desertrider12|3 years ago

Nope, there will just be 2^8 tables (constant overhead per table) each with 2^24 24-bit keys.

If you start with an array of any N keys, they have 32N bits of information. But once you insert them all into an unordered set like this, you throw away information by losing the order. There are only (2^32 choose N) unique sets of N keys, which is less than N*2^32 for N>1. This checks out in your example - there is only one unique set of all the 32-bit integers, so that should take log2(2^32 choose 2^32) = 0 bits to store sets of this size.

fargle|3 years ago

it's not mean for storing all possible 32 bit integers. if you want to do that, it would take zero storage because the answer is just "yes".

what they are doing is using the fact that a hash table can be loaded to 80% or 90% and still work fine. but they can encode part of the data (the first byte) implicitly in the hashed address. it's kind of cheating because the user provides those bits back to the lookup function to get their answer, so they don't need to be stored. and saving one byte out of four means you store 75% which is a bigger gain than the loss due to the 80%-90% load-factor.

say you have a normal hash table with 90% load factor, and you need to store 100,000 32 bit words, it would take 32 * 100,000 / 0.9 = 3555555.

but if you use this (or some related) trick and only save part of the key, you'd have 24 * 100,000 / 0.9 = 2666666. which is even less than 32 * 100,000 if you used direct storage in a simple array.