(no title)
nulptr | 5 years ago
But, instead of using a hash table to store (seed, hash) pairs, we can just use a list of only hashes. To generate the next hash we use current hash. i.e. next_hash = hash("example.com/" + string(int64(cur_hash))). So now, we just store 60-bit hash values which takes up 2^30 * 60 bits = 8gigs.
The only other problem is detecting if a hash already exists, which I think can be done using a space efficient probabilistic hashset like a bloom filter (and when the bloom filter says the hash already exists, we do a linear scan to find where it exists and we have a possible collision).
penteract|5 years ago
nulptr|5 years ago
AI_WAIFU|5 years ago
[deleted]
Thorrez|5 years ago