3wolf | 3 years ago
* IP addresses actually aren't a great use case for basic Bloom filters, since they're fairly storage-efficient to begin with, unlike URLs, for example. Taking your example, say we need to store 1 million IP addresses in our Bloom filter and we're okay with a ~1% false positive rate. If we use a Bloom filter with 2^23 bits (1 MiB), the optimal number of hash functions is (2^23 / 10^6) * ln(2) ≈ 5.8, which rounds to 6, yielding a false positive rate of (1 - exp(-6 * 10^6 / 2^23))^6 ≈ 1.8%. So we're using 6% of the storage space, but with a nearly 2% false positive rate.
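For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two standard Bloom filter formulas used above. The function names (`optimal_hash_count`, `false_positive_rate`) are made up for illustration, and the false positive formula is the usual approximation that assumes independent, uniformly distributed hash functions.

```python
import math


def optimal_hash_count(m_bits: int, n_items: int) -> float:
    """Optimal (real-valued) number of hash functions: k = (m / n) * ln(2)."""
    return (m_bits / n_items) * math.log(2)


def false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Approximate false positive rate: (1 - exp(-k * n / m)) ** k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


m = 2 ** 23   # filter size in bits (2^23 bits = 1 MiB)
n = 10 ** 6   # number of IP addresses stored

k_exact = optimal_hash_count(m, n)   # about 5.81
k = round(k_exact)                   # 6 hash functions in practice
fpr = false_positive_rate(m, n, k)   # about 0.0178, i.e. ~1.8%

print(f"filter size: {m // 8} bytes")
print(f"optimal k:   {k_exact:.2f} (rounded to {k})")
print(f"false positive rate: {fpr:.2%}")
```

Plugging in a larger `m` shows how much extra space it takes to get down to the ~1% target.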