Count–Min Sketch requires each input to pass through multiple independent hash functions, which is computationally expensive. The workaround is to use a single hash function but use part of its output to split values into one of many buckets hence, simulating a situation in which we had m different hash functions. This costs nothing in terms of accuracy but saves computing many independent hash functions. This procedure is called stochastic averaging and has a predictable bias towards larger estimates. Durand-Flajolet corrected this bias using an algorithm called LogLog. HLL uses a different type of averaging. Instead of the geometric mean used in LogLog, Flajolet et al. proposed using the harmonic mean.
https://engineering.fb.com/2018/12/13/data-infrastructure/hy...
No comments yet.