We insert from our alerts worker because we want the aggregation to happen per alert (with the aggregated data filtered by the particular alert definition). As each alert is evaluated, we run the following [1] INSERT INTO ... SELECT ... statement based on the alert definition. We can't aggregate with an MV since we'd need to create an MV per unique alert that a customer may set up.
At a certain scale, exact computations (p50 for instance) become impractical. I’ve had great luck switching to approximate calculations with guaranteed error bounds.
The quantile() method in clickhouse is also approximate although it uses a more simplistic reservoir sampling model than GK, but quantileGK() is also available. quantileExact() exists but indeed becomes impractical as you point out.
hodgesrm|1 year ago
vadman97|1 year ago
[1]: https://github.com/highlight/highlight/blob/c526daea31fdf764...
nitinreddy88|1 year ago
iampims|1 year ago
An approachable paper on the topic is "Effective Computation of Biased Quantiles over Data Streams" http://dimacs.rutgers.edu/%7Egraham/pubs/papers/bquant-icde....
hipadev23|1 year ago