top | item 39639490


shadowsun7 | 2 years ago

So I've got a dumb question here: what happens when you use vanilla XmR charts with J-curve shaped or sub-exponential distributions?

My current simplistic (and very dumb!) solution that I've used for power-law type distributions — like HN virality, for instance — is to count the number of days between viral events, and then subject that to process control.[1] I basically take Wheeler's approach to chunky data and use that for J-curve type data, which tells me if the behaviour of my 'HN virality process' has changed.

I'd be very interested to learn of other approaches.

[1] HN traffic for commoncog.com displays routine variation most weeks, with an Upper Process Limit of 192 and a Lower Process Limit of 0, unless one of my articles hits the front page, at which point I get 11-16k additional uniques.
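For what it's worth, the days-between-events idea above can be sketched as an ordinary XmR (individuals / moving-range) chart over the gaps. This is only an illustration under stated assumptions: the gap data are made up, and `xmr_limits` is a hypothetical helper, not anything from Wheeler's books verbatim (though the 2.66 scaling constant is his standard one).

```python
# Sketch: Wheeler-style XmR limits on times between rare events.
# The gap data below are hypothetical days between "viral" events.

def xmr_limits(values):
    """Return (centre, lower, upper) natural process limits for an XmR chart."""
    n = len(values)
    centre = sum(values) / n
    # Average moving range of successive observations.
    mr_bar = sum(abs(values[i] - values[i - 1]) for i in range(1, n)) / (n - 1)
    # 2.66 = 3 / d2 for subgroups of size 2 (d2 = 1.128), per Wheeler.
    upper = centre + 2.66 * mr_bar
    lower = max(0.0, centre - 2.66 * mr_bar)  # a gap in days can't be negative
    return centre, lower, upper

days_between = [34, 21, 55, 40, 28, 61, 33, 47]  # hypothetical
centre, lo, hi = xmr_limits(days_between)
print(f"centre={centre:.1f}, limits=({lo:.1f}, {hi:.1f})")
```

A new gap falling outside (lo, hi) would signal that the 'virality process' has changed: above the upper limit means viral events have become rarer, below the lower limit (when it's above zero) means they've become more frequent.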


kqr|2 years ago

I have an upcoming article on my own lack of understanding of how to do this. It's not finished, but you may enjoy a near-finished draft: https://two-wrongs.com/extreme-value-spc

I did forget to bring up the Poisson approximation you mention though. I'll include that too.

jacques_chester|2 years ago

The example of performance is interesting because as you say, there are often multiple jostling distributions under the surface (GC is one, but another doozy is CPU frequency scaling).

One possible way out is to look for measurements that contribute to running time but which are not affected by other factors. I remember the YJIT folks talking about using CPU instruction counters, but I can't find it on the benchmark website.

jacques_chester|2 years ago

Time between events is an approach Montgomery (8th EMEA) discusses in 7.3.5. The application there is for dealing with very low error/defect rates. I am not familiar with Wheeler's approach.