top | item 46724009

(no title)

thomasmg | 1 month ago

There are many variants. It really depends on what features you need. Cuckoo filters were mentioned. If you want to add and remove entries and the regular counting Bloom filter are not good enough, I would look at the "succinct counting blocked Bloom filter" [1]: they only need about twice the space than regular Bloom filters. Sure, cuckoo filters need less memory, but they can fail basically any time, while these can not.

Tiny filters: Some time ago I worked on tiny statistics [2]. This includes some 64-bit HyperLogLog implementations; some use linear counting, which is basically a 64-bit Bloom filter, until some limit, and only then switch to HyperLogLog. This is great for distinct counts of columns in databases (cardinality estimation). This project also includes 64-bit approximate counts and histograms.

[1] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [2] https://github.com/thomasmueller/tinyStats

discuss

order

Sesse__|1 month ago

FWIW, I found https://github.com/FastFilter/fastfilter_java/issues/28 a pretty good explanation of what's going on in the succinct counting blocked Bloom filters. (I'm not sure if the blocking is needed for the normal Bloom filter part, though, i.e., all bits don't necessarily need to fall within the same block, no? But the counts are stored separately for each block.)

thomasmg|1 month ago

Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored spearately, yes.