top | item 39327365

Count–Min Sketch

2 points| deborahjacob | 2 years ago |en.wikipedia.org

2 comments

order

two_handfuls|2 years ago

How does this compare to HLL?

deborahjacob|2 years ago

Count–Min Sketch requires each input to pass through multiple independent hash functions, which is computationally expensive. The workaround is to use a single hash function but use part of its output to split values into one of many buckets hence, simulating a situation in which we had m different hash functions. This costs nothing in terms of accuracy but saves computing many independent hash functions. This procedure is called stochastic averaging and has a predictable bias towards larger estimates. Durand-Flajolet corrected this bias using an algorithm called LogLog. HLL uses a different type of averaging. Instead of the geometric mean used in LogLog, Flajolet et al. proposed using the harmonic mean. https://engineering.fb.com/2018/12/13/data-infrastructure/hy...