top | item 34189102

(no title)

Dylan1312 | 3 years ago

Bloom filters are a fantastic datastructure but more applicable when the search set is large (usually greater than can fit in-memory).

If using a bloom filter with a small set, it's possible to obtain a low probability of false positives by using just one hash function and a small number of buckets. At that point you've effectively got a hashset like one of the solutions described in the blog post.

discuss

moloch-hai|3 years ago

The idea was that you maintain a running hash as bytes arrive, and drop out on the first that doesn't match.

A separate hash table for what gets past would check for false positives, using the final hash value.