Why do so many things follow the Normal Distribution?
I got this question on Quora, and thought I would give it a go. It is an interesting question, and one that is rarely addressed in any great depth in introductory stats courses. After that, the normal distribution becomes so “normal” that it is rarely addressed in more advanced courses either. I will try to keep my reasoning on a fairly intuitive level, so that a general audience can get some benefit from it (I hope).
First off, the average person might well ask “what is a normal distribution?”. It is that mathematical creature that you might also have heard of as a “bell curve”, or perhaps a Gaussian distribution. The former term relates to its visual appearance (as seen below), while the latter honours one of its most famous discoverers, the brilliant mathematician Carl Friedrich Gauss.
This picture represents something called the standard normal distribution. The x-axis represents the different values that some real data distribution can take, and the y-axis represents the relative likelihood (strictly speaking, the probability density) of the variable taking on values near that point. All data that is normally distributed can be converted to this standard normal distribution, by subtracting the mean and dividing by the standard deviation, and can therefore be analysed by one set of principles that apply to this picture.
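To make that conversion concrete, here is a minimal Python sketch of the standardization step (my own illustration, not part of the original answer; the heights are made-up numbers):

import numpy as np

# Made-up sample data: heights in cm (purely illustrative numbers)
heights = np.array([162.0, 175.5, 180.2, 168.9, 171.3, 158.7, 184.1])

# Convert to the standard normal scale ("z-scores"):
# subtract the mean, then divide by the standard deviation.
z_scores = (heights - heights.mean()) / heights.std()

print(z_scores)          # centred on 0, in units of standard deviations
print(z_scores.mean())   # ~0 by construction
print(z_scores.std())    # ~1 by construction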
In the middle is the mean value, what people generally refer to as the average. To the left and right are values that deviate from this average, expressed in a form called standard deviations. As the picture shows, in a normally distributed set of data, about 68% of all values (roughly two thirds) will fall within one standard deviation of the mean, with about 95% falling within two standard deviations, and about 99.7% falling within three standard deviations. So, normally distributed datasets have some very predictable properties, which is nice.
They also have the nice property of being symmetrical around the mean,
as the picture shows. These two
properties are invaluable to statisticians and data scientists, in terms of
using mathematics and algorithms to predict many features of the real world.
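If you want to check those percentages yourself, a few lines of Python will do it. This is just a quick verification sketch, assuming the scipy library is available:

from scipy.stats import norm

# Probability that a standard normal variable falls within k standard
# deviations of the mean: P(-k < Z < k) = cdf(k) - cdf(-k)
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {p:.4f}")

# Prints approximately 0.6827, 0.9545 and 0.9973 (the 68-95-99.7 rule).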
The other important thing about the normal distribution is that many, many situations in the real world can be modelled by a normal distribution, or at least come very close to one. In fact, it tends to be the “go-to” distribution for most purposes. Some examples are the heights of a random population of people, an IQ distribution, or the pattern of misses that a shooter makes around a bullseye.
Getting back to the original question, why is it that so many real-world data distributions take this form? The usual explanation is suggested by another name for the normal distribution, the “error distribution”. The idea is that errors are generally random, so that they are as likely to go in one direction as in the other. For example, the marksman is as likely to shoot a bit to the left as a bit to the right, or a bit high as a bit low. Thus, a graph of how far the shots are from the bullseye will reflect this random tendency, and be symmetrical around the mean. Similarly with height and intelligence – many genes (perhaps thousands) contribute to these outcomes, as do a great number of environmental factors, such as nutrition, illnesses, low income and so forth.
As for the “bell shape” of the curve, that relates to some other facts about probability: the Bernoulli process and the Central Limit Theorem. A Bernoulli process is a sequence of independent trials, each with a fixed probability of success or failure, like tossing a coin. The Central Limit Theorem says that if you take many independent samples from almost any distribution and compute their sum (or average), the distribution of that sum or average will approach a normal distribution as the number of samples grows. I put those two facts together in the experiment below.
In this experiment, each trial consisted of tossing a coin sixteen times and counting the number of heads. As I increased the number of trials, the distribution of head counts became closer and closer to a normal distribution. I simulated this in an Excel spreadsheet, with the results shown below:
You can see how the graph becomes more and more like the classic “bell shaped curve” as the number of simulated trials goes from 40 to 4000. Just how many trials are needed to get “close enough” to a normal distribution is somewhat debatable, but for many statistical purposes it’s probably “normal enough” at about 100 trials, as many statistical and/or data science methods are fairly robust in this regard.
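For readers who would rather reproduce the experiment in code than in a spreadsheet, here is a minimal Python sketch of the same simulation (my own reconstruction, not the original Excel workbook; the function name and seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed so the run is repeatable

def head_counts(trials, tosses=16, p=0.5):
    """Toss a fair coin `tosses` times per trial; return heads per trial."""
    return rng.binomial(n=tosses, p=p, size=trials)

for trials in (40, 400, 4000):
    # Histogram of head counts over the possible outcomes 0..16
    counts = np.bincount(head_counts(trials), minlength=17)
    print(f"{trials:>5} trials:", counts)

As the number of trials grows, the histogram settles toward the binomial distribution for 16 fair tosses, which is already very close to a bell shape centred on 8 heads.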
Here’s a quote from a book I own called “The Pleasures of
Probability”, by Richard Isaac:
“The Central Limit Theorem is sometimes used to give a theoretical explanation for the frequency with which normal or approximately normal distributions describe natural phenomena. It is said that the height of an adult, for example, is due to a multitude of causes: genetic makeup, diet, environmental factors, etc. These factors often combine in an approximately additive way, so that the result is, by the Central Limit Theorem, close to normally distributed. It is true that all these factors contributing to an individual’s height do not in general have the same distribution, nor are they always independent, so the version of the Central Limit Theorem discussed here may not apply. There are, however, generalizations of the Central Limit Theorem valid when there are departures from the identically distributed assumption, and even from the independence assumption. Such results could offer a reasonable explanation of why many phenomena are approximately normally distributed.”
(page 138)
It is worth noting that there are many other statistical
distributions that show up in real data.
One of the most important of these is the power law, which describes
many natural (e.g. the size distribution of craters on the moon) and social (e.g.
book or movie sales, number of followers on social media, goal scorers in the NHL, etc.) data distributions.
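To see how differently a power law behaves from a normal distribution, here is a small Python sketch (my own illustration; the Pareto shape parameter is an arbitrary choice) comparing the extremes of the two:

import numpy as np

rng = np.random.default_rng(seed=1)

normal_sample = rng.normal(loc=0.0, scale=1.0, size=100_000)
pareto_sample = rng.pareto(a=1.5, size=100_000)   # a heavy-tailed, power-law sample

# In normal data the largest value sits only a few standard deviations from
# the mean; in power-law data a single extreme value can dwarf everything else.
print("normal max:", normal_sample.max())   # typically around 4 to 5
print("pareto max:", pareto_sample.max())   # can be in the hundreds or more
print("share of pareto total held by the top 1%:",
      np.sort(pareto_sample)[-1000:].sum() / pareto_sample.sum())

The heavy tail is the point: in power-law data, a tiny fraction of cases can dominate the total, which essentially never happens with normally distributed data.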
It’s important to recognize when normal distribution assumptions are valid. Nassim Nicholas Taleb, the author of the popular economics book “The Black Swan”, goes into this in some detail, but that’s another story. Basically, extreme events happen a lot more often than our assumptions of normality would predict, and when they do, they can have very drastic consequences, like stock market crashes.
Now that you have read about “assumptions of normality”, why
not try some short stories where the usual laws of probability don’t seem to
apply:
A Dark Horse
Just what might a gambler give up to go on the winning streak of his life? And when is beating the odds too good to be true? Even he can’t know for sure. Christopher Marlowe’s Doctor Faustus legend is given a Damon Runyon spin in this short story.
The Magnetic Anomaly: A Science Fiction Story
“A geophysical crew went into the Canadian north. There were
some regrettable accidents among a few ex-military who had become geophysical
contractors after their service in the forces. A young man and young woman went
temporarily mad from the stress of seeing that. They imagined things, terrible
things. But both are known to have vivid imaginations; we have childhood records
to verify that. It was all very sad. That’s the official story.”