zeehio | 1 year ago

Using intervals for measurements has limitations, but for many use cases intervals are all we need, so it's nice to have convenient tools for them.

That's because measurements are complicated.

You use a ruler (or some other instrument) to measure something and get a value x.

You are happy.

Then for some reason you decide to repeat the measurement and you get a slightly different value. And problems start.

You decide to write down all the values you get. You are happy again.

Shortly after, you realise you have to use those values in calculations and you just want "one representative value", so you take the average or "the most common value" or some other aggregation, use your intuition!

Things start to go wrong when you have to make a decision based on a threshold, like "I do this if my value is above a threshold", because the actual value may differ from your averaged number.

So you take the standard deviation and call it the uncertainty x±s.
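As a minimal sketch of that step (the readings below are made up for illustration):

```python
import statistics

# Hypothetical repeated readings of the same length, in millimetres.
readings = [12.1, 11.9, 12.3, 12.0, 11.8, 12.2]

x = statistics.mean(readings)   # the "one representative value"
s = statistics.stdev(readings)  # sample standard deviation = the uncertainty

print(f"measurement: {x:.2f} ± {s:.2f} mm")
```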

But one day you realise that your measurements are not symmetric. You start by saying "instead of x±s, I will use different upper and lower bounds to define an interval".

For instance some things are measured on a log scale and you have a measure like 100±"one order of magnitude" which is "100, but may be between 10 and 1000".

Then you add a confidence, because you are not 100% certain you actually are in that range. Your measurement becomes "with 95% confidence I can say the measure is in [10,1000], with an expected value of 100".
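One common way to get such an asymmetric interval is to work on the log scale and back-transform. A sketch, assuming the logs are roughly normal (the values and the 1.96 multiplier are illustrative):

```python
import math
import statistics

# Hypothetical concentrations spanning a wide range.
values = [40.0, 95.0, 210.0, 130.0, 60.0]

logs = [math.log10(v) for v in values]
m = statistics.mean(logs)
se = statistics.stdev(logs) / math.sqrt(len(logs))

# ~95% interval on the log scale, back-transformed to the original scale.
lo, hi = 10 ** (m - 1.96 * se), 10 ** (m + 1.96 * se)
center = 10 ** m  # geometric mean: the "expected value" on the original scale

print(f"with ~95% confidence: [{lo:.0f}, {hi:.0f}], centered on {center:.0f}")
```

Note how the interval is asymmetric around the center on the original scale, exactly like the "100, but may be between 10 and 1000" example.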

Then you want to combine and aggregate those intervals, and you realise that values within the intervals are not uniformly distributed: you actually have a probability distribution.

In the simplest case it is a Gaussian distribution, described by its mean and variance. It can also be a binomial (a "p out of n cases" scenario), or a lognormal, like in our 10-1000 example.

And now for each measure you take you need to understand what probability distribution it follows and estimate its parameters.

And that parameter estimation is itself a measurement, so it has confidence intervals as well.
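A quick way to see the uncertainty of an estimated parameter is the bootstrap: resample the data with replacement and re-estimate the parameter each time (a sketch with made-up readings; the parameter here is the mean):

```python
import random
import statistics

random.seed(0)
data = [12.1, 11.9, 12.3, 12.0, 11.8, 12.2]  # hypothetical readings

# Re-estimate the mean on many resampled datasets to see
# how much the estimate itself wobbles.
estimates = sorted(
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(10_000)
)
lo, hi = estimates[250], estimates[9750]  # middle 95% of the estimates

print(f"the estimated mean is likely in [{lo:.2f}, {hi:.2f}]")
```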

At this point adding two measurements becomes not so easy anymore... But don't panic!

The nice part about all of this is that usually you don't care about precise error estimates, because you can live with bounding errors covering a worst case scenario.
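Worst-case bounding is just interval arithmetic: carry (lo, hi) bounds through the calculation and keep whichever combination is most extreme. A minimal sketch (measured bounds are hypothetical):

```python
# Worst-case interval arithmetic on (lo, hi) pairs.
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

length = (9.8, 10.2)  # hypothetical measurement bounds
width = (4.9, 5.1)

area = mul(length, width)
print(area)  # bounds guaranteed to cover the worst case, possibly pessimistic
```

The price is pessimism: the resulting intervals are often wider than the "true" uncertainty, but they are guaranteed to contain it.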

And you can use the Central Limit Theorem (sometimes it is abused rather than used) to simplify calculations.
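For example, when the errors of two measurements are independent and roughly Gaussian (which the CLT often justifies), adding the measurements is easy again: the means add and the variances add. A sketch with made-up numbers:

```python
import math

# Two independent measurements x1 ± s1 and x2 ± s2.
x1, s1 = 12.05, 0.19
x2, s2 = 7.30, 0.25

# Means add; for independent errors, variances add.
total = x1 + x2
s_total = math.sqrt(s1**2 + s2**2)

print(f"{total:.2f} ± {s_total:.2f}")
```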

It is a rabbit hole and you need to know how deep you want to dig. Intervals are usually convenient enough.
