
Why isn't everything normally distributed?

250 points | tambourine_man | 8 years ago | johndcook.com

135 comments

[+] antognini|8 years ago|reply
As a (former) astronomer, I've never understood why people assume normal distributions for everything. I understand that there are good theoretical motivations for this --- namely, the normal distribution is the distribution that maximizes entropy for a given mean and variance. But in astronomy, nothing is normally distributed. (At least, nothing comes to mind.) Instead, everything is a power law. The reason for this is that most astrophysical processes are scale free over many orders of magnitude, and if you want a scale free process, it must be distributed as a power law.

There's actually a joke in the field that when you get a new dataset, the first thing you do is fit it to a power law. If that doesn't work, you fit it to a broken power law.
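The scale-free claim above can be checked with a few lines of Python (an illustrative sketch, not from the original comment): rescaling x changes a power-law density only by a constant factor, while an exponential's shape depends on the scale.

```python
import math

# Scale-freeness sketch: for a power law p(x) ~ x**-a, the ratio
# p(k*x) / p(x) equals k**-a for every x, so rescaling preserves the shape.
# An exponential density has no such property; it picks out a scale 1/lam.
def power_law(x, a=2.5):
    return x ** -a

def exponential(x, lam=1.0):
    return lam * math.exp(-lam * x)

k = 10.0
pl_ratios = [power_law(k * x) / power_law(x) for x in (1.0, 5.0, 20.0)]
exp_ratios = [exponential(k * x) / exponential(x) for x in (1.0, 5.0, 20.0)]

# Power law: the ratio is the same constant (k**-a) at every x.
assert max(pl_ratios) - min(pl_ratios) < 1e-12
# Exponential: the ratio collapses rapidly as x grows.
assert exp_ratios[0] > exp_ratios[1] > exp_ratios[2]
```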

[+] blevin|8 years ago|reply
I work in CG animation, and one thing artists do when procedurally generating geometry is tweak the size distribution of things meant to mimic natural phenomena, such as leaves and rocks. They are typically assembling these as expressions, so it's common to first reach for a simple uniform distribution in the form of scale*rand(), but more experienced folks know to go straight to a power law instead. I think it's fascinating that such an extremely high-level way to characterize natural processes could make it into an artist's toolkit like that.
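For illustration (my sketch, not the artist's actual expression, and the names and bounds are made up): a truncated power-law sampler via inverse-transform sampling that could stand in for the uniform scale*rand().

```python
import random

# Hypothetical artist-side helper: draw a size from a truncated power law
# p(s) ~ s**-a on [s_min, s_max] by inverse-transform sampling.
# Assumes a != 1 (the exponent e below would be zero otherwise).
def power_law_size(s_min, s_max, a, rng=random.random):
    u = rng()
    e = 1.0 - a
    return (s_min ** e + u * (s_max ** e - s_min ** e)) ** (1.0 / e)

random.seed(0)
sizes = [power_law_size(0.1, 10.0, a=2.0) for _ in range(10000)]

assert all(0.1 <= s <= 10.0 for s in sizes)
# Power-law look: lots of small elements, a few large ones.
small = sum(s < 1.0 for s in sizes) / len(sizes)
assert small > 0.8
```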
[+] enriquto|8 years ago|reply
There's this funny "heuristic" converse of the central limit theorem (by Mandelbrot, I think):

The ONLY variables that are normally distributed are those that are averages of many independent, identically distributed variables of finite variance.

Thus, if you cannot find the finite-variance variables that average up to form a variable X, then X is not normally distributed.

(The parts about independent and identically distributed are technical red herrings. The only essential condition is finite variance.)
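A numerical sketch of the finite-variance condition (my illustration, not from the comment): averages of uniform draws concentrate, while averages of Cauchy draws, which have infinite variance, never do.

```python
import random
import statistics

# Averages of uniform draws concentrate around the mean; averages of
# standard Cauchy draws (infinite variance) are themselves Cauchy and
# never settle down, no matter how many draws are averaged.
random.seed(42)

def cauchy():
    # A standard Cauchy variable is the ratio of two independent normals.
    return random.gauss(0, 1) / random.gauss(0, 1)

def average(sampler, n=1000):
    return sum(sampler() for _ in range(n)) / n

uniform_means = [average(random.random) for _ in range(200)]
cauchy_means = [average(cauchy) for _ in range(200)]

# Finite variance: the 200 averages huddle tightly around 0.5 ...
assert statistics.stdev(uniform_means) < 0.05
# ... infinite variance: the averages stay as spread out as single draws.
assert statistics.stdev(cauchy_means) > 10 * statistics.stdev(uniform_means)
```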

[+] cyphar|8 years ago|reply
Not to mention that in the wider world, the most common types of processes follow a Zipfian distribution[1]. It think this misunderstanding comes from undergraduate statistics courses teaching you all about normal and uniform distributions, while most students are unlikely to learn the limits and general inapplicability of that model unless they go into research. Most people's first encounter with real statistics is bell curves. Not to mention that the central limit theorem lulls you into a false sense of security that "at the end of the day" everything will look normal.

[1]: https://en.m.wikipedia.org/wiki/Zipf%27s_law

[+] lr4444lr|8 years ago|reply
I think it has a lot to do with people in the social sciences wishing that the CL Theorem could simplify everything sampled to a single mathematical model they can have tools programmed to do magically when fed the data.
[+] wenc|8 years ago|reply
There's also a joke that goes:

A power law walks into a bar. The bartender says, "I've seen a hundred power laws. Nobody orders anything." The power law says, "1000 beers, please".

[+] monochromatic|8 years ago|reply
> if you want a scale free process, it must be distributed as a power law.

Could you elaborate a little, or maybe give an example?

[+] random3|8 years ago|reply
Richard McElreath has some nice points about variable independence assumptions. Concretely, for a model this assumption represents the state of "ignorance" and is the most conservative choice (i.e. doesn't assume any correlation). That said, he also mentions the "mind projection fallacy" of confusing epistemological claims with ontological claims, which may in turn be the reason why many assume normal distributions for everything.
[+] justifier|8 years ago|reply
reminds me of this joke from a lecture Lawrence Krauss gave in 2009 of his work in 'A Universe From Nothing':

> it was made after the discovery that on a log log plot everything is a straight line(o)

i'd recommend the entire lecture, but definitely at least check out the anecdote this joke bookends

it is about hubble's original 1929 embarrassing failed attempt to calculate the rate of the expansion of the universe(i)

(o) https://www.youtube.com/watch?v=7ImvlS8PLIo&t=18m25s

(i) https://www.youtube.com/watch?v=7ImvlS8PLIo&t=12m55s

[+] srean|8 years ago|reply
> I understand that there are good theoretical motivations for this --- namely, the normal distribution is the distribution that maximizes entropy for a given mean and variance.

That is indeed true, but why should such a property imply that the use of the normal distribution is appropriate? Just a rhetorical question, of course, because your comment does indicate that the normal is not a good choice unless one has compelling reasons to use it.

Another argument used to justify it is the central limit theorem. That says a sum of (nearly) independent variables with (nearly) identical distributions and finite variance converges to the normal distribution. If the process under observation is indeed a superposition of such random processes, then yes, the choice of the Gaussian can be justified. But it is surprisingly common that one of the 3 requirements is violated. A common violation is that the variance is infinite, or is so high that the process is better modeled as one with infinite variance. In such situations the family of stable distributions is the more appropriate choice.
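The stable-family point can be sketched numerically (illustrative, not from the comment): the average of 100 standard Cauchy draws has exactly the same distribution as a single draw, so averaging buys nothing.

```python
import random

# Stability sketch: a standard Cauchy is alpha-stable with alpha = 1, so
# the average of n i.i.d. Cauchy draws is again standard Cauchy. Its
# interquartile range stays 2 no matter how much we average.
random.seed(8)

def cauchy():
    return random.gauss(0, 1) / random.gauss(0, 1)

def iqr(xs):
    xs = sorted(xs)
    n = len(xs)
    return xs[3 * n // 4] - xs[n // 4]

singles = [cauchy() for _ in range(20000)]
averages = [sum(cauchy() for _ in range(100)) / 100 for _ in range(5000)]

assert abs(iqr(singles) - 2.0) < 0.2   # quartiles of a standard Cauchy: -1, 1
assert abs(iqr(averages) - 2.0) < 0.2  # unchanged after averaging 100 draws
```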

Gauss' own use of the Gaussian was motivated by convenience rather than any deep theory. Even at that time it was well known that other distributions, for example the Laplace distribution, often worked better.

[+] BrandoElFollito|8 years ago|reply
I think that "good theoretical motivations" is the key point.

I am an (also former) physicist, and it drives me crazy when people fit whatever happens to be in the chart to a straight line, "to get the trend". Whenever I ask them for the theory that predicts that x and y will be linked by y = ax + b, they do not have any.

This also goes on with extrapolations or interpolations, usually without the slightest theoretical reason to do so.

The problem is that people in marketing, HR, or even finance are so used to "getting the trend" with a linear fit that arguing is hopeless.

I once drew a parabola and asked for the trend (around the minimum). They were surprised by this stupid question, as "it is obvious that there is no trend". Yet some random points linking share value with the number of women in a company "obviously fit a trend line".

[+] rjbwork|8 years ago|reply
My boss and I talk about this phenomenon a lot. Normal distributions have lots of nice properties, and lots of tools that can do lots of fancy things, and infer lots of nice "facts" from them.

So there is a huge bias among researchers to assume them to make their treatment of the data easier.

[+] sevensor|8 years ago|reply
> There's actually a joke in the field that when you get a new dataset, the first thing you do is fit it to a power law. If that doesn't work, you fit it to a broken power law.

That's funny -- my field has the same thing with a different distribution. Analyzing stochastic processes is much easier if you assume an exponential distribution. It's one of my criteria for whether somebody's giving a bad job talk. If an unfounded assumption that waiting times are exponentially distributed shows up in the first three slides, the rest of the presentation is probably B.S. Even if the presenter used Beamer and filled the slides with beautiful equations. (The exponential distribution of waiting times is basically an assumption that events are independent.)

Edit: Especially if the presenter's slides are full of beautiful equations.
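The "events are independent" reading of exponential waiting times is memorylessness, which is easy to check numerically (my sketch, with made-up parameters):

```python
import random

# Memorylessness sketch: for exponential waiting times,
# P(T > s + t | T > s) = P(T > t); having already waited s tells you
# nothing about the remaining wait.
random.seed(1)
waits = [random.expovariate(1.0) for _ in range(200000)]

def tail_fraction(xs, t):
    return sum(x > t for x in xs) / len(xs)

s, t = 0.7, 1.2
remaining = [x - s for x in waits if x > s]   # leftover wait, given T > s
p_uncond = tail_fraction(waits, t)            # P(T > t), about exp(-1.2)
p_cond = tail_fraction(remaining, t)          # P(T > s + t | T > s)
assert abs(p_cond - p_uncond) < 0.02          # the two estimates agree
```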

[+] smueller1234|8 years ago|reply
That's almost not a joke - I did particle astrophysics, not astronomy, but for the exact same reason you state, our physics results (we used particle physics as a window to astronomy, or at least we tried to) would end up with broken power law fits.

There's something deeply satisfying about using a model so simple to explain a piece of nature. :)

Example: cosmic ray energy spectrum. http://iopscience.iop.org/1367-2630/12/7/075009/downloadFigu...

[+] theoh|8 years ago|reply
When you want to fit a distribution to a star's image on the plate, normal does indeed seem a weird choice. But the point about multidimensional gaussians is that they are the only distribution that is maximally "ignorant" (maximum entropy) -- so if there's a small (unresolvable) companion star we don't know about, we can do no better than a gaussian if we want to ignore it.
[+] logicallee|8 years ago|reply
that's really weird. I'm not an astronomer so I thought "hmmm, what's the first thing I can think of in astronomy."

Well, how about the magnitude of stars. So, is the magnitude of stars normally distributed? I googled "distribution magnitude of stars" and got images (by clicking on images tab) that seem pretty normally distributed to me. Aren't they?

[+] DINKDINK|8 years ago|reply
I think it's about size and independence. Gaussian distributions connote the small scale (the background noise of the electrical output of a sensor), while power laws connote the large scale (the number of electrical sensors per company) that you normally deal with in your field.

The size of a ruler that a machine cuts (quasi normal/Gaussian)

Wealth (Power/Pareto)

[+] cttet|8 years ago|reply
A power law indicates an exponential distribution, which is often taught right after the normal distribution in most probability classes. The normal distribution is usually considered as white noise, hence a fixed value + white noise is often normal, which is quite common as well.
[+] Kenji|8 years ago|reply
> As a (former) astronomer, I've never understood why people assume normal distributions for everything.

It shouldn't be surprising, because there is a big reason to lean towards the normal distribution, and that is the central limit theorem. If you sum samples from a variety of different distributions, the sum will tend towards a normal distribution. That is why the normal distribution pops up in so many places.
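A small simulation of that claim (illustrative sketch, with arbitrarily chosen distributions): each observation below is a sum of draws from several non-normal distributions, and the sums come out close to normal.

```python
import random
import statistics

# CLT sketch: each observation is a sum of 50 rounds of
# uniform + exponential + Bernoulli draws; none is normal, but the
# sums are close to normal.
random.seed(7)

def one_sum():
    total = 0.0
    for _ in range(50):
        total += random.random()           # uniform on [0, 1)
        total += random.expovariate(1.0)   # exponential, skewness 2
        total += random.choice((0, 1))     # Bernoulli(0.5)
    return total

sums = [one_sum() for _ in range(10000)]
m = statistics.mean(sums)
sd = statistics.stdev(sums)
skew = statistics.mean([((x - m) / sd) ** 3 for x in sums])

# A single exponential has skewness 2; the sums are nearly symmetric ...
assert abs(skew) < 0.3
# ... and about 68% of them land within one standard deviation of the mean.
within_1sd = sum(abs(x - m) < sd for x in sums) / len(sums)
assert 0.66 < within_1sd < 0.71
```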

[+] technofire|8 years ago|reply
Perhaps a better question is "Why is anything normally distributed?" It appears originally to have been a simplification to make the math more convenient:

As Rand Wilcox reports, "Why did Gauss assume that a plot of many observations would be symmetric around some point? Again, the answer does not stem from any empirical argument, but rather a convenient assumption that was in vogue at the time. This assumption can be traced back to the first half of the 18th century and is due to Thomas Simpson. Circa 1755, Thomas Bayes argued that there is no particular reason for assuming symmetry; Simpson recognized and acknowledged the merit of Bayes's argument, but it was unclear how to make any mathematical progress if asymmetry is allowed." (Wilcox, p. 4)

Wilcox, R. (2010). Fundamentals of modern statistical methods: Substantially improving power and accuracy (2nd ed.). New York, New York: Springer.[1]

[1] http://amzn.to/2tkMRoI

[+] abetusk|8 years ago|reply
As others have pointed out, power laws are more "normal" than the normal distribution.

The reason for this is that if you take the sum of independent, identically distributed (i.i.d.) random variables (RVs) and it converges to a distribution, that distribution is Levy stable [1], which is power law in its tails. The Gaussian is a special case in the family of Levy stable distributions.

The article states that "the sum of many independent, additive effects is approximately normally distributed", which is patently false as stated. The sum of many independent random variables with finite variance is approximately normally distributed. Once you relax the finite-variance condition (and, in more extreme cases, the finite-mean condition), power laws result.

There are other ways to generate power laws, including having killed exponential processes [2]. There are many other references that talk about the rediscovery of power laws [3] and give many ways to "naturally" create power laws [3] [4] [5].

The article claims that multiplicative processes lead to log normal distributions. I've heard that this is actually false but unfortunately I don't have enough familiarity to see how this is not true. If anyone has more insight into this I would appreciate a link to an article or other explanation.

[1] https://en.wikipedia.org/wiki/Stable_distribution

[2] http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf

[3] https://arxiv.org/pdf/physics/0601192v3.pdf

[4] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122...

[5] http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf
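On the multiplicative-process question above: the textbook argument is that the log of a product of positive i.i.d. factors is a sum of i.i.d. terms, so the CLT applies to the log and the product is approximately lognormal. A quick simulation (my sketch, with a made-up factor distribution) shows that much; it does not address the cases where the lognormal limit fails, such as heavy-tailed factors or killed processes like those in [2].

```python
import math
import random
import statistics

# Multiplicative-process sketch: log(product) = sum of logs, so by the
# CLT the log of the product is approximately normal, i.e. the product
# is approximately lognormal.
random.seed(3)

def product(n_factors=200):
    p = 1.0
    for _ in range(n_factors):
        p *= random.uniform(0.5, 1.5)  # positive multiplicative shocks
    return p

logs = [math.log(product()) for _ in range(10000)]
m = statistics.mean(logs)
sd = statistics.stdev(logs)
skew = statistics.mean([((x - m) / sd) ** 3 for x in logs])
assert abs(skew) < 0.15  # the log of the product is close to symmetric
```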

[+] leephillips|8 years ago|reply
An example from a book I used to own, called, I think, _Treatment of Experimental Data_: imagine a factory manufacturing ball bearings. They want them all to have the same radius, but because of random errors, the radii will be normally distributed about some mean. If this is true, other random variables, such as the mass, cannot be normally distributed.
[+] teraflop|8 years ago|reply
Technically true, but the smaller the standard deviation of the radius (relative to the mean), the more closely the distribution of masses will approximate a normal distribution.
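A Monte Carlo sketch of this (illustrative numbers; mass taken as proportional to radius cubed): with a tight tolerance the mass is nearly normal, with a sloppy one its skew is unmistakable.

```python
import random
import statistics

# Mass ~ r**3 with r normal: never exactly normal, but nearly so when
# the relative spread of r is small.
random.seed(5)

def mass_skew(relative_sd, n=50000):
    masses = [random.gauss(1.0, relative_sd) ** 3 for _ in range(n)]
    m = statistics.mean(masses)
    sd = statistics.stdev(masses)
    return statistics.mean([((x - m) / sd) ** 3 for x in masses])

skew_tight = mass_skew(0.01)  # 1% radius tolerance
skew_loose = mass_skew(0.30)  # 30% radius tolerance

assert abs(skew_tight) < 0.15  # nearly symmetric, close to normal
assert skew_loose > 0.5        # visibly right-skewed, clearly not normal
```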
[+] prashnts|8 years ago|reply
Sorry, could you explain why mass can't be normally distributed? If, say, the mass of the bearing is related to its radius, then shouldn't it follow a similar distribution?
[+] pishpash|8 years ago|reply
Depends on whether the radius errors are generated from a separate process than the mass errors. (1) Get a glob of molten metal, mass determined by how much metal is there, some error there; (2) shape it into a ball, and suppose the radius is mainly determined by air that gets trapped, then some nearly independent error there.
[+] cjhanks|8 years ago|reply
That's kind of an odd metric to begin with. Radius would probably be derived from the imperfections in the consistency of matter. Whether that distribution is gaussian probably depends on the fabrication process and the materials...
[+] wodenokoto|8 years ago|reply
This kinda blows my mind.

What happens if they do quality control on both mass and radius?

I imagine the reason why, in your example, it is the radius rather than the mass that is normal is due to what QA is focused on.

[+] escherize|8 years ago|reply
Hmm, why is that?
[+] nerdponx|8 years ago|reply
From the comments:

Personally, I don’t find it surprising that not everything is normally distributed. Why should any real phenomenon follow a theoretical limiting distribution anyway, never mind a symmetric, infinite-tailed distribution that is exact only in an unachievable limit? The surprise is that so many things _are_ sufficiently near normality for it to be useful!

[+] graycat|8 years ago|reply
Because for a random variable X, one sometimes needs to care about X^2, and when X is normally distributed, X^2 is chi-square distributed and not normally distributed.

Because, under mild assumptions, arrivals (say, of visitors to a web site, a gas station, or a hospital) form a Poisson process; the times between arrivals are independent and identically exponentially distributed, and that's not normally distributed.

Now at nearly any server farm, it is easy to get wide, deep, rapidly flowing oceans of data on the performance of the server farm, and, as U. Grenander once explained to me in his office at Brown, the data is wildly different from what statistics was used to, e.g., in medical data. In that ocean of data, finding anything normally distributed will be very rare.

The claims in the OP about many effects are nothing like good evidence for the central limit theorem or for normal distributions. E.g., by the renewal theorem, many examples of Poisson processes arise from many independent effects.

E.g., the usual computer based random number generators return what look like independent, identically distributed random variables uniform on [0,1], and that is not normally distributed.

The question in the OP about why not normally distributed is, in one word, just absurd.

[+] eanzenberg|8 years ago|reply
What's more surprising is how little the central limit theorem holds and is useful, yet it is still used all the time to justify poor analysis, usually in A/B testing. When the underlying distribution has high variance, as with many metrics I've come across that have extreme long-tailed behavior, the aggregates need a large N before they adhere to a Gaussian.
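A sketch of how slow that convergence can be (my illustration, with a lognormal stand-in for a long-tailed metric): for a normal sampling distribution the sample mean would fall below the true mean about half the time, but with heavy tails and modest N it falls below far more often, so normal-based intervals mislead.

```python
import math
import random

# Heavy-tailed metric sketch: lognormal(mu=0, sigma=2) draws.
random.seed(11)
TRUE_MEAN = math.exp(2.0)  # mean of lognormal(0, 2) is exp(sigma**2 / 2)

def frac_means_below_truth(n, reps=2000):
    below = 0
    for _ in range(reps):
        sample_mean = sum(random.lognormvariate(0, 2) for _ in range(n)) / n
        below += sample_mean < TRUE_MEAN
    return below / reps

frac30 = frac_means_below_truth(30)
assert frac30 > 0.6  # far from the ~0.5 a Gaussian approximation predicts
```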
[+] vanderZwan|8 years ago|reply
> Height is influenced by environmental effects as well as genetic effects, such as nutrition, and these environmental effects may be more additive or independent than genetic effects.

This makes me wonder if in countries where these environmental effects are mostly optimised (everyone having access to good nutrition), or at least nearly identical for everyone, the normal distribution of height breaks down.

Is height normally distributed in the tallest countries in the world? What about the shortest?

edit: I'll just copy this question to the comments under his blog, maybe the author has some idea about that.

edit2: just noticed the blog post is from 2015... oh well, it was worth a shot.

[+] SophosQ|8 years ago|reply
I came across this CMU presentation on why the purported ubiquity of power laws must be taken with a grain of salt:

https://goo.gl/23PP7v

Check from slide 32.

As someone who doesn't have significant experience in statistics, I'd be grateful for an expert's opinion on the arguments presented in this presentation.

[+] jtolmar|8 years ago|reply
I'd take the CLT as everything complicated being normally distributed by default, but with fairly common exceptions:

1 - If the problem isn't actually that complicated, the CLT doesn't do much.

2 - If the problem is dominated by one component, it will still mostly look like that component.

3 - Most ways of slicing a normal distribution lead to other distributions. For example the Rician distribution.
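A sketch of the third point (my illustration): the magnitude of a zero-mean 2-D Gaussian vector is Rayleigh distributed, which is the zero-signal special case of the Rician.

```python
import math
import random
import statistics

# Magnitudes of zero-mean 2-D Gaussian vectors follow a Rayleigh
# distribution (the nu = 0 special case of the Rician), not a normal.
random.seed(9)
mags = [math.hypot(random.gauss(0, 1), random.gauss(0, 1))
        for _ in range(50000)]

assert min(mags) >= 0.0  # magnitudes are nonnegative, unlike a normal sample
# Rayleigh with sigma = 1 has mean sqrt(pi / 2), about 1.2533.
assert abs(statistics.mean(mags) - math.sqrt(math.pi / 2)) < 0.02
```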

[+] evanwarfel|8 years ago|reply
Because not everything has finite variance.
[+] BjoernKW|8 years ago|reply
Because many phenomena aren't in fact representations of arbitrary random variables.

Take word distribution in any human language for instance. Word frequencies follow a Zipf distribution because it decreases entropy and hence is more efficient.

[+] acscott|8 years ago|reply
Now, having read some comments, do not get lost in your assumptions (which implies you should know your assumptions). It's really that simple.
[+] tictacttoe|8 years ago|reply
There are a lot of quantities that are positive definite. If a quantity is bounded from below, it's not Gaussian.
[+] 725686|8 years ago|reply
Nassim Nicholas Taleb doesn't have many nice things to say about the bell curve in his book The Black Swan.
[+] acscott|8 years ago|reply
From the question alone, without even reading a thing: why would any set of events follow a Gaussian?
[+] ruste|8 years ago|reply
Can anyone quickly explain to me why some things _are_ normally distributed?