Pi Day 2020 – Pi and the Normal Distribution
The Mathematical Theory of Pi and the Normal Distribution
Wigner wrote a famous paper (“The Unreasonable Effectiveness of Mathematics in the Natural Sciences”) about how odd it is that mathematics and the real world are so closely coupled. An illustration of this is outlined below, one that is especially relevant on Pi Day.
One of the interesting things about Pi is how it shows up in unexpected places, in both the mathematical and the real world. And one of the interesting things about the normal distribution (or Gaussian) is how often it accurately describes real-world phenomena, at least to a good approximation.
Pi is actually embedded in the formula that generates the normal distribution, as are the transcendental number e and the irrational square root of 2. Quoting from the book “Understanding Probability” (H. Tijms):
“One could say that the normal curve is a natural
law of sorts, and it is worth noting that each of the three famous mathematical
constants root 2, pi=3.141… and e=2.718… play roles in its makeup. Many natural phenomena, such as the height of
men, harvest yields, errors in physical measurements, luminosity of stars,
returns on stocks, etc., can be described by a normal curve.”
The formula is: f(x) = (1/(sigma * square root(2 * pi))) * e^(-(x - mu)^2 / (2 * sigma^2)). Sigma is the standard deviation of the distribution, mu is the mean.
If you
want some more explanation of why pi shows up in the formula for the normal
distribution, try this blog. He has a
nice explanation, if you know some calculus.
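In a nutshell (this sketch is my own summary, not taken from the linked blog): pi enters through the famous Gaussian integral,

```latex
\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}
```

so the curve must be divided by sigma * square root(2 * pi) to make the total area under it equal 1. That normalizing constant is where pi hides.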
As a bit of differential calculus shows, as does visual inspection, the peak of this symmetric function occurs at x = mu, the mean of the distribution, where the exponent is 0 and the exponential factor is therefore e^0 = 1. This also makes the height of the peak related to pi:
Peak = 1/(sigma * square root (2 times pi))
If you
know the peak of a normal distribution, you can use this to estimate Pi from
data.
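As a rough illustration of that idea, here is a short Python sketch of my own (not part of the original post): simulate normal data, estimate the peak density from the fraction of points in a narrow bin around the mean, and back out pi from the peak formula.

```python
import random
import statistics

def estimate_pi_from_peak(n=200_000, mu=0.0, sigma=1.0, seed=42):
    """Estimate pi from the peak height of simulated normal data.

    The peak density of a normal curve is 1 / (sigma * sqrt(2*pi)),
    so pi = 1 / (2 * (peak * sigma)**2).
    """
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    # Estimate the density at the mean: fraction of points falling in a
    # narrow bin centred on the mean, divided by the bin width.
    h = 0.1 * s
    in_bin = sum(1 for x in data if abs(x - m) < h / 2)
    peak = (in_bin / n) / h
    return 1.0 / (2.0 * (peak * s) ** 2)

print(estimate_pi_from_peak())
```

The bin width (0.1 standard deviations here) is a judgment call: too wide and the curvature of the peak biases the density estimate, too narrow and the count in the bin gets noisy.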
Another relationship that can be deduced from all this is:
MAD = sigma * square root(2/pi)
MAD is
the average absolute deviation, which is the mean of the absolute values of the
differences between points in the distribution and the mean of the
distribution.
That’s
rather like the formula for variance, but uses absolute value rather than
squaring the difference. Either way, you avoid negative numbers. You don’t see MAD used all that much in
regular statistical practice, so it’s interesting to use it here.
Using Normally Distributed Data to Estimate Pi
Anyway, the point of all this is that you can use the numbers from a normally distributed dataset to estimate pi. I played around with that, giving the results below. I used the second relationship, involving MAD, for the purpose.
1. Using Simulations in Excel
First off, here are results from using
simulated data, based on the normal distribution. In Excel, you can use their Data Analysis
package or the formula
=NORMINV(RAND(), mean, stdev)
to
generate the required numbers. Then you
can use the variance formula in Excel to calculate the variance of the
generated data, and the average formula to generate the mean, from which you can
calculate the deviation for each point, take the absolute value of that and
generate an average. Then Pi can be
estimated from:
PI = (2 * variance)/(MAD*MAD)
(Since variance = sigma squared, this is just the MAD relationship above, rearranged.)
This will only be precisely true for a
perfect, infinite normal distribution dataset, which is of course impossible in
the real world or even in a computer.
So, as noted, this is treated as an estimate.
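For those who prefer code to spreadsheets, here is a minimal Python equivalent of that procedure (a sketch of my own; the original work was done in Excel): generate normal random numbers, compute the variance and the mean absolute deviation, and apply the rearranged MAD relationship.

```python
import random
import statistics

def estimate_pi_from_mad(n=500_000, mu=10.0, sigma=2.0, seed=314):
    """Estimate pi from simulated normal data via the MAD relation.

    For a normal distribution, MAD = sigma * sqrt(2/pi), so
    pi = 2 * sigma**2 / MAD**2 = 2 * variance / MAD**2.
    """
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, ~sigma^2
    # Mean absolute deviation: average distance of each point from the mean.
    mad = statistics.fmean(abs(x - mean) for x in data)
    return 2.0 * variance / mad ** 2

print(estimate_pi_from_mad())
```

The choice of mu and sigma is arbitrary (10 and 2 here, just for illustration), since the relationship is scale- and location-independent.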
Here are some estimates for Pi, using various
numbers of randomly generated, normally distributed data points:
Value of Pi, estimated from the Normal Distribution, for various sizes of simulated dataset:

| Run | 500 K | 50 K | 5 K | 0.5 K | 0.1 K |
|-----|-------|------|-----|-------|-------|
| 1 | 3.1369 | 3.1498 | 3.0909 | 3.0478 | 3.2250 |
| 2 | 3.1365 | 3.1243 | 3.1248 | 3.5711 | 3.1367 |
| 3 | 3.1414 | 3.1160 | 3.1686 | 2.7884 | 3.6031 |
| 4 | 3.1374 | 3.1308 | 3.1689 | 3.4772 | 3.1697 |
| 5 | 3.1332 | 3.1353 | 3.1560 | 2.7704 | 2.6016 |
| 6 | 3.1370 | 3.1443 | 3.2166 | 3.2497 | 2.4286 |
| 7 | 3.1493 | 3.1419 | 3.3306 | 3.5507 | 3.1429 |
| 8 | 3.1473 | 3.1672 | 3.2095 | 3.4633 | 2.5636 |
| 9 | 3.1526 | 3.1558 | 3.1258 | 3.3079 | 3.4690 |
| 10 | 3.1390 | 3.1567 | 3.1593 | 3.5132 | 3.3688 |
| 11 | 3.1357 | 3.1565 | 1.7958 | 3.4977 | 3.4358 |
| 12 | 3.1513 | 3.1229 | 3.2390 | 3.3730 | 3.2837 |
| 13 | 3.1531 | 3.1332 | 3.1755 | 3.2773 | 3.4404 |
| 14 | 3.1393 | 3.1264 | 3.1467 | 3.2081 | 3.2094 |
| 15 | 3.1507 | 3.1422 | 3.1690 | 3.2656 | 1.9962 |
| 16 | 3.1380 | 3.1167 | 3.2285 | 3.1871 | 3.1321 |
| 17 | 3.1481 | 3.1533 | 3.0957 | 3.0320 | 3.3556 |
| 18 | 3.1387 | 3.1288 | 3.0366 | 3.0675 | 3.1471 |
| 19 | 3.1493 | 3.1156 | 3.1311 | 2.8541 | 2.9467 |
| 20 | 3.1286 | 3.1500 | 3.2534 | 3.0860 | 3.0628 |
| 21 | 3.1397 | 3.1321 | 3.2255 | 3.0501 | 3.1321 |
| 22 | 3.1353 | 3.1855 | 3.1319 | 3.0628 | 3.0040 |
| 23 | 3.1433 | 3.1674 | 3.1378 | 3.3351 | 2.9529 |
| 24 | 3.1448 | 3.1394 | 3.1329 | 2.9784 | 3.5530 |
| 25 | 3.1495 | 3.1160 | 3.1065 | 3.0981 | 2.1003 |
| Est Pi | 3.1422 | 3.1403 | 3.1103 | 3.2045 | 3.0584 |
| Actual Pi | 3.1416 | 3.1416 | 3.1416 | 3.1416 | 3.1416 |
| Abs Diff | 0.0007 | 0.0013 | 0.0313 | 0.0629 | 0.0831 |
| Pct Diff | 0.021% | 0.040% | 0.997% | 2.002% | 2.647% |
| N | 500000 | 50000 | 5000 | 500 | 100 |
| Sqrt N | 707 | 224 | 71 | 22 | 7 |
The table shows that this isn’t a very
effective way to estimate Pi, but it does work in principle. As the number of points in the simulated data
set goes up, the accuracy of the estimate improves, from only being accurate to
within 1 decimal place (2.6 pct. error) with 100 points to being accurate to
within 3 decimal places (0.02 pct.) with half a million points in the simulated
dataset. That’s a pretty good
improvement, but not exactly what you might call a really rapid convergence to
the value of Pi.
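The trend in the table can be reproduced in a few lines of Python (again a sketch of my own, not the original Excel runs); the error shrinks roughly like 1 over the square root of N.

```python
import math
import random
import statistics

def pi_estimate(n, rng):
    """MAD-based pi estimate from n simulated standard-normal points."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = statistics.fmean(data)
    variance = statistics.variance(data)
    mad = statistics.fmean(abs(x - mean) for x in data)
    return 2.0 * variance / mad ** 2

rng = random.Random(2020)
for n in (100, 500, 5_000, 50_000, 500_000):
    est = pi_estimate(n, rng)
    pct = 100.0 * abs(est - math.pi) / math.pi
    print(f"N = {n:>7}: pi ~ {est:.4f} ({pct:.2f}% off)")
```

Any single run at small N can be badly off (as the table's 0.1 K column shows), so averaging over many runs, as done above in the table, gives a fairer picture.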
The histograms above show the actual distributions for each value of N, as well as the estimate for Pi derived from each distribution. As you can see, even with this many data points, the distributions are still somewhat rough, so it is no surprise that the estimate for Pi is only good to about 3 decimal places, at best.
2. Using Real World Datasets
Here are results of estimating Pi, from
using some real world datasets, obtained from the internet.
Value of Pi, Estimated from Real World Datasets, Approximated by Normal Distributions:

| Exp | Data | Pi Est | Pi Actual | Abs Diff | Pct Diff | N |
|-----|------|--------|-----------|----------|----------|---|
| 1 | Fly Wings | 3.07716 | 3.14159 | 0.06443 | 2.1% | 100 |
| 2 | Wheat | 3.05782 | 3.14159 | 0.08377 | 2.7% | 500 |
| 3 | Hurricanes | 3.41711 | 3.14159 | 0.27551 | 8.8% | 755 |
| 4 | Ozone | 3.42920 | 3.14159 | 0.28760 | 9.2% | 70 |
| 5 | Precipitation | 3.16298 | 3.14159 | 0.02139 | 0.7% | 144 |
| 6 | Earthquakes | 3.18518 | 3.14159 | 0.04359 | 1.4% | 100 |
| 7 | Columbia R 5 | 2.92371 | 3.14159 | 0.21788 | 6.9% | 66 |
| 8 | Columbia R 55 | 3.53768 | 3.14159 | 0.39608 | 12.6% | 66 |
| 9 | Columbia R 95 | 3.17431 | 3.14159 | 0.03272 | 1.0% | 66 |
| Avg. | | 3.21835 | 3.14159 | 0.15811 | 5.0% | 207 |
As you can see, the real world data, which was modelled with a normal distribution, yielded rather poor estimates of Pi, though recognizably in the ballpark. But one has to keep in mind that the datasets are merely measurements of real world statistics – intuitively, we wouldn’t expect them to have anything at all to do with the value of Pi.
It is also good to keep in mind that these datasets averaged only about 200 points, with the majority having <= 100 points. When we compare them with the results for the 100-point simulated datasets above, we see that they aren’t so bad. The small simulated datasets only estimated Pi to within about 2.6%, while these real-world datasets came in at about 5.0%. One wonders how a very large real-world dataset might fare in such an exercise.
Looking at histograms of the real-world
datasets it is clear that they do depart from normality, some fairly substantially,
though they would probably be considered “close enough” for a lot of
statistical tests that assume a normally distributed dataset.
Anyway, I am kind of charmed by the idea
that you can get a fairly close estimate of Pi by measuring current speeds in
the Columbia River (within 1.0 pct of Pi) or the precipitation in Reading PA
over about 150 years (within 0.7 pct of Pi).
Related Blogs
And here
are a couple of other Pi Day posts that I have done, with similar Pi-related
experiments:
Calculating
Pi from Shooting Arrows
Calculating
Pi from Probability Theory (Buffon’s Needle): https://dodecahedronbooks.blogspot.ca/2017/03/pi-day-floor-pie-and-floor-pi.html
Calculating
Pi via Nested Polygons:
Sources:
Beckmann, Petr. A History of Pi (p. 101). St. Martin's Press. Kindle Edition.
Tijms, Henk. Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press.
Real
World Data
So, now that you have done some math, you should read a science fiction book, or even better, a whole series. Book 1 of the Witches’ Stones series even includes a reference to pi:
Kati of Terra
How about trying Kati of Terra,
the 3-novel story of a feisty young Earth woman, making her way in that big,
bad, beautiful universe out there.
The Witches’ Stones
Or, you might
prefer, the trilogy of the Witches’ Stones (they’re psychic aliens, not actual
witches), which follows the interactions of a future Earth confederation, an
opposing galactic power, and the Witches of Kordea. It features Sarah Mackenzie, another feisty
young Earth woman (they’re the most interesting type – the novelist who wrote
the books is pretty feisty, too).
The Magnetic Anomaly: A Science Fiction Story
“A geophysical crew went into the Canadian
north. There were some regrettable accidents among a few ex-military who had
become geophysical contractors after their service in the forces. A young man
and young woman went temporarily mad from the stress of seeing that. They
imagined things, terrible things. But both are known to have vivid
imaginations; we have childhood records to verify that. It was all very sad.
That’s the official story.”
In the field known as Astrobiology, there
is a research program called SETI, The Search for Extraterrestrial
Intelligence. At the heart of SETI, there is a mystery known as The Great
Silence, or The Fermi Paradox, named after the famous physicist Enrico Fermi.
Essentially, he asked “If they exist, where are they?”.
Some quite cogent arguments maintain that if there was extraterrestrial intelligence, they should have visited the Earth by now. This story, a bit tongue in cheek, gives a fictional account of one explanation for The Great Silence, known as The Zoo Hypothesis. Are we a protected species, in a Cosmic Zoo? If so, how did this come about? Read on, for one possible solution to The Fermi Paradox.
The short story is about 6300 words, or about half an hour at typical reading speeds.