Monday 11 February 2019

Why do so many things follow the Normal Distribution?


I got this question on Quora and thought I would give it a go. It is an interesting question, and one that is rarely addressed in any great depth in introductory stats courses. After that, the normal distribution becomes so “normal” that it is rarely addressed in more advanced courses either. I will try to keep my reasoning on a fairly intuitive level, so that a general audience can get some benefit from it (I hope).

First off, the average person might well ask, “what is a normal distribution?” It is that mathematical creature you might also have heard of as a “bell curve” or perhaps a Gaussian distribution. The former term relates to its visual appearance (as seen below), while the latter honours one of its most famous discoverers, the brilliant mathematician Carl Friedrich Gauss.

[Image: the standard normal distribution curve, showing the percentage of values falling within one, two, and three standard deviations of the mean]

This picture represents something called the standard normal distribution. The x-axis represents the different values that some real data distribution can take on, and the y-axis represents the probability density at each value (roughly, how likely the variable is to fall near that value). Any normally distributed data can be converted to this standard form by subtracting the mean and dividing by the standard deviation, and therefore can be analysed by one set of principles that apply to this picture.
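
If you'd like to see what this conversion looks like in practice, here's a minimal sketch in Python using the numpy library (the heights below are made-up numbers, purely for illustration):

```python
import numpy as np

# Hypothetical sample: adult heights in cm (made-up values, for illustration).
heights = np.array([158.0, 172.5, 165.3, 181.2, 169.9, 175.4, 162.8])

# Standardizing means subtracting the mean and dividing by the standard
# deviation.  If the raw data is normally distributed, these "z-scores"
# follow the standard normal distribution pictured above.
z_scores = (heights - heights.mean()) / heights.std()

print(z_scores)         # values now centred on 0
print(z_scores.mean())  # ~0 (up to floating-point error)
print(z_scores.std())   # ~1, by construction
```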

In the middle is the mean value, what people generally refer to as the average. To the left and right are values that deviate from this average, expressed in a form called standard deviations. As the picture shows, in a normally distributed set of data, about two thirds (68%) of all values will fall within one standard deviation of the mean, with about 95% falling within two standard deviations, and about 99.7% falling within three standard deviations. So, normally distributed datasets have some very predictable properties, which is nice. They also have the nice property of being symmetrical around the mean, as the picture shows. These two properties are invaluable to statisticians and data scientists, in terms of using mathematics and algorithms to predict many features of the real world.
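
Those percentages are easy to check for yourself. Here's a quick simulation sketch (my own check, not part of the original answer) that draws a large standard normal sample and counts how many values land within one, two and three standard deviations:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
sample = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

# Fraction of the sample within k standard deviations of the mean.
for k in (1, 2, 3):
    fraction = np.mean(np.abs(sample) <= k)
    print(f"within {k} standard deviation(s): {fraction:.1%}")

# Prints roughly 68.3%, 95.4% and 99.7%: the classic "68-95-99.7 rule".
```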

The other important thing about the normal distribution is that many, many situations in the real world can be modelled by a normal distribution, or at least come very close to one. In fact, it tends to be the “go-to” distribution for most purposes. Some examples are the heights of a random population of people, an IQ distribution, or the pattern of misses that a shooter makes around a bullseye.

Getting back to the original question, why is it that so many real-world data distributions take this form? The usual explanation is suggested by another name for the normal distribution, the “error distribution”. The idea is that errors are generally random, so they are as likely to go in one direction as in the other. For example, the marksman is as likely to shoot a bit to the left as a bit to the right, or a bit high as a bit low. Thus, a graph of how far the shots are from the bullseye will reflect this random tendency, and be symmetrical around the mean. Similarly with height and intelligence: many genes (perhaps thousands) contribute to these outcomes, as do a great number of environmental factors, such as nutrition, illnesses, low income and so forth. A simulation of this “many small causes” idea is sketched just below.
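
To make that idea concrete, here's a toy simulation (my own sketch; the numbers of “people” and “factors” are arbitrary). Each simulated person is just the sum of many small, independent random nudges, and the totals pile up symmetrically around the mean:

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily

n_people = 50_000
n_factors = 200  # imaginary genetic/environmental influences per person

# Each factor nudges the outcome up or down by a small random amount;
# no single factor dominates the total.
nudges = rng.uniform(-1.0, 1.0, size=(n_people, n_factors))
outcomes = nudges.sum(axis=1)

# A crude text histogram: the bars rise and fall symmetrically around
# the middle, tracing out the familiar bell shape.
counts, edges = np.histogram(outcomes, bins=15)
for count, left_edge in zip(counts, edges):
    print(f"{left_edge:8.1f} | {'#' * round(50 * count / counts.max())}")
```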

As for the “bell shape” of the curve, that relates to two other facts about probability: the Bernoulli process and the Central Limit Theorem. A Bernoulli process is a process that has a set probability of success or failure, like tossing a coin. The Central Limit Theorem says that if you take many samples from almost any distribution and compute a statistic such as the sum or the average for each sample, the distribution of that statistic will approach a normal distribution as the samples grow. I put those two facts together in the experiment below.

In this experiment, I tossed a coin sixteen times and counted the number of heads. As I increased the number of trials, the distribution of heads counts became closer and closer to a normal distribution. I simulated this in an Excel spreadsheet, with the results shown below:

[Charts: histograms of the number of heads in 16 coin tosses, for 40 and for 4,000 simulated trials]

You can see how the graph becomes more and more like the classic “bell-shaped curve” as the number of simulated trials goes from 40 to 4,000. Just how many trials are needed to get “close enough” to a normal distribution is somewhat debatable, but for many statistical purposes it's probably “normal enough” at about 100 trials, as many statistical and data science methods are fairly robust in this regard.
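
For anyone who would rather not build the spreadsheet, here's a rough Python equivalent of the experiment (my own reconstruction, not the original Excel file): count the heads in 16 coin tosses, repeat for 40 and then 4,000 trials, and print a text histogram of the results:

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # seed chosen arbitrarily

def heads_counts(n_trials, n_tosses=16):
    # The number of heads in n_tosses fair-coin flips is a binomial draw,
    # i.e. the sum of n_tosses Bernoulli trials with p = 0.5.
    heads = rng.binomial(n=n_tosses, p=0.5, size=n_trials)
    return np.array([np.sum(heads == k) for k in range(n_tosses + 1)])

for n_trials in (40, 4_000):
    counts = heads_counts(n_trials)
    print(f"\n{n_trials} trials:")
    for k, count in enumerate(counts):
        print(f"{k:2d} heads | {'#' * round(40 * count / counts.max())}")
```

With only 40 trials the bars look ragged; by 4,000 they settle into the smooth bell shape, just as in the spreadsheet charts.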

Here’s a quote from a book I own called “The Pleasures of Probability”, by Richard Isaac:

“The Central Limit Theorem is sometimes used to give a theoretical explanation for the frequency with which normal or approximately normal distributions describe natural phenomena. It is said that the height of an adult, for example, is due to a multitude of causes: genetic makeup, diet, environmental factors, etc. These factors often combine in an approximately additive way, so that the result is, by the Central Limit Theorem, close to normally distributed. It is true that all these factors contributing to an individual’s height do not in general have the same distribution, nor are they always independent, so the version of the Central Limit Theorem discussed here may not apply. There are, however, generalizations of the Central Limit Theorem valid when there are departures from the identically distributed assumption, and even from the independence assumption. Such results could offer a reasonable explanation of why many phenomena are approximately normally distributed.” (page 138)

It is worth noting that there are many other statistical distributions that show up in real data. One of the most important of these is the power law, which describes many natural data distributions (e.g. the size distribution of craters on the moon) and social ones (e.g. book or movie sales, number of followers on social media, goal scorers in the NHL, etc.).
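
To see how differently power-law data behaves from normal data, here's a small comparison sketch (my own example, using numpy's Pareto sampler as a stand-in for a generic power law; the scale and shape numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # seed chosen arbitrarily

# A normal sample, and a heavy-tailed (power-law-like) Pareto sample.
normal_sample = rng.normal(loc=100.0, scale=15.0, size=100_000)
pareto_sample = 100.0 * (rng.pareto(a=1.5, size=100_000) + 1.0)

for name, sample in (("normal", normal_sample), ("power law", pareto_sample)):
    print(f"{name:10s} median = {np.median(sample):10.1f}  "
          f"mean = {sample.mean():10.1f}  max = {sample.max():12.1f}")

# In the normal sample the maximum stays close to the mean; in the
# power-law sample, a few extreme values dwarf everything else.
```

That long tail is exactly where the “black swans” of the next paragraph live.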

It’s important to recognize when normal distribution assumptions are valid. Nassim Nicholas Taleb, the author of the popular economics book “The Black Swan”, goes into this in some detail, but that’s another story (basically, unexpected things happen a lot more often than our assumptions of normality predict, and when they do, they can have very drastic consequences, like stock market crashes).



It is also important to know the difference between a normal distribution and a ghost:

[Image: a normal distribution pictured next to a ghost]

Now that you have read about “assumptions of normality”, why not try some short stories where the usual laws of probability don’t seem to apply:

A Dark Horse

Just what might a gambler give up to go on the winning streak of his life? And when is beating the odds too good to be true? Even he can't know for sure. Christopher Marlowe's Doctor Faustus legend is given a Damon Runyon spin, in this short story.

The Magnetic Anomaly: A Science Fiction Story

“A geophysical crew went into the Canadian north. There were some regrettable accidents among a few ex-military who had become geophysical contractors after their service in the forces. A young man and young woman went temporarily mad from the stress of seeing that. They imagined things, terrible things. But both are known to have vivid imaginations; we have childhood records to verify that. It was all very sad. That’s the official story.”

