Saturday, 14 March 2020

Pi Day 2020 – Pi and the Normal Distribution



The Mathematical Theory of Pi and the Normal Distribution

Wigner wrote a famous paper (“The Unreasonable Effectiveness of Mathematics in the Natural Sciences”) about how odd it is that mathematics and the real world have such a close coupling, so to speak.  An illustration of this is outlined below, which is especially relevant on Pi Day.

One of the interesting things about Pi is how it shows up in unexpected places, in the mathematical and real world.  And one of the interesting things about the normal distribution (or Gaussian) is how often it accurately describes real world phenomena, at least to a good approximation.

Pi is actually embedded in the formula that generates the Normal Distribution, as is the transcendental number e and the irrational number square root of 2.  Quoting from the book “Understanding Probability” (H. Tijms):

“One could say that the normal curve is a natural law of sorts, and it is worth noting that each of the three famous mathematical constants root 2, pi=3.141… and e=2.718… play roles in its makeup.  Many natural phenomena, such as the height of men, harvest yields, errors in physical measurements, luminosity of stars, returns on stocks, etc., can be described by a normal curve.”

The formula is shown below.  Sigma is the standard deviation of the distribution, mu is the mean:

f(x) = (1 / (sigma * sqrt(2 * pi))) * e^( -(x - mu)^2 / (2 * sigma^2) )

If you want some more explanation of why pi shows up in the formula for the normal distribution, try this blog.  He has a nice explanation, if you know some calculus.

As a bit of differential calculus shows (or simple visual inspection), this symmetric function peaks where the exponent is zero, so the exponential factor is e to the 0th power (thus equal to 1).  That happens when x = mu, the mean of the distribution.  This also results in the height of the peak being related to pi:

Peak = 1/(sigma * square root (2 times pi))
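As a quick numerical check, inverting that peak relationship recovers pi exactly.  Here is a minimal Python sketch (standard library only); the value of sigma is just an arbitrary example:

```python
import math

# Peak height of a normal density with standard deviation sigma:
# peak = 1 / (sigma * sqrt(2 * pi))
sigma = 2.5  # arbitrary example value
peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# Invert the relationship to recover pi:
# pi = 1 / (2 * sigma^2 * peak^2)
pi_from_peak = 1.0 / (2.0 * sigma**2 * peak**2)
print(pi_from_peak)  # recovers pi, up to floating-point rounding
```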

If you know the peak of a normal distribution, you can use this to estimate Pi from data.
Another relationship that can be deduced from all this is:

MAD = sigma * square root(2/pi)

MAD is the average absolute deviation, which is the mean of the absolute values of the differences between points in the distribution and the mean of the distribution.



That’s rather like the formula for variance, but uses absolute value rather than squaring the difference. Either way, you avoid negative numbers.  You don’t see MAD used all that much in regular statistical practice, so it’s interesting to use it here.
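The MAD relationship is easy to verify numerically.  The Python sketch below (standard library only) simulates normal data with an arbitrary example sigma and compares the sample MAD with the theoretical sigma * sqrt(2/pi):

```python
import math
import random

random.seed(314)
sigma = 3.0  # arbitrary example value
data = [random.gauss(0.0, sigma) for _ in range(200_000)]

# MAD: the mean of the absolute deviations from the sample mean
mean = sum(data) / len(data)
mad = sum(abs(x - mean) for x in data) / len(data)

# Compare the simulated MAD with the theoretical value sigma * sqrt(2/pi)
print(mad, sigma * math.sqrt(2.0 / math.pi))
```

With 200,000 points the two values should agree to two or three decimal places.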

 

Using Normally Distributed Data to Estimate Pi

Anyway, the point of all this is that you can use the numbers from a normally distributed dataset to estimate pi.  I played around with that, with the results below.  I used the second relationship, involving MAD, for the purpose.

 

1. Using Simulations in Excel

First off, here are results from using simulated data, based on the normal distribution.  In Excel, you can use the Data Analysis ToolPak, or the formula

=NORMINV(RAND(), mean, stdev)

to generate the required numbers.  Then you can use Excel's VAR function to calculate the variance of the generated data, and the AVERAGE function to get the mean, from which you can calculate the deviation for each point, take its absolute value, and average those to get the MAD.  Then Pi can be estimated from:

PI = (2 * variance)/(MAD*MAD)

This will only be precisely true for a perfect, infinite normal distribution dataset, which is of course impossible in the real world or even in a computer.  So, as noted, this is treated as an estimate.
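The same estimation can be scripted outside Excel.  Here is a Python sketch (plain standard library, no external packages) that generates normal data and applies the pi = 2 * variance / MAD^2 relationship; the function name and defaults are just illustrative choices:

```python
import math
import random

def estimate_pi(n, mu=0.0, sigma=1.0, seed=None):
    """Estimate pi from n simulated normal points, via pi = 2 * variance / MAD^2."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    mad = sum(abs(x - mean) for x in data) / n
    return 2.0 * variance / mad ** 2

print(estimate_pi(500_000, seed=1))  # close to 3.14, but only an estimate
```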

Here are some estimates for Pi, using various numbers of randomly generated, normally distributed data points:

Value of Pi, estimated from Normal Distribution, various sizes of simulated dataset

Run          500 K     50 K      5 K       .5 K      .1 K
1            3.1369    3.1498    3.0909    3.0478    3.2250
2            3.1365    3.1243    3.1248    3.5711    3.1367
3            3.1414    3.1160    3.1686    2.7884    3.6031
4            3.1374    3.1308    3.1689    3.4772    3.1697
5            3.1332    3.1353    3.1560    2.7704    2.6016
6            3.1370    3.1443    3.2166    3.2497    2.4286
7            3.1493    3.1419    3.3306    3.5507    3.1429
8            3.1473    3.1672    3.2095    3.4633    2.5636
9            3.1526    3.1558    3.1258    3.3079    3.4690
10           3.1390    3.1567    3.1593    3.5132    3.3688
11           3.1357    3.1565    1.7958    3.4977    3.4358
12           3.1513    3.1229    3.2390    3.3730    3.2837
13           3.1531    3.1332    3.1755    3.2773    3.4404
14           3.1393    3.1264    3.1467    3.2081    3.2094
15           3.1507    3.1422    3.1690    3.2656    1.9962
16           3.1380    3.1167    3.2285    3.1871    3.1321
17           3.1481    3.1533    3.0957    3.0320    3.3556
18           3.1387    3.1288    3.0366    3.0675    3.1471
19           3.1493    3.1156    3.1311    2.8541    2.9467
20           3.1286    3.1500    3.2534    3.0860    3.0628
21           3.1397    3.1321    3.2255    3.0501    3.1321
22           3.1353    3.1855    3.1319    3.0628    3.0040
23           3.1433    3.1674    3.1378    3.3351    2.9529
24           3.1448    3.1394    3.1329    2.9784    3.5530
25           3.1495    3.1160    3.1065    3.0981    2.1003
Est Pi       3.1422    3.1403    3.1103    3.2045    3.0584
Actual Pi    3.1416    3.1416    3.1416    3.1416    3.1416
Abs Diff     0.0007    0.0013    0.0313    0.0629    0.0831
Pct Diff     0.021%    0.040%    0.997%    2.002%    2.647%
N            500000    50000     5000      500       100
Sqrt N       707       224       71        22        7


The table shows that this isn’t a very efficient way to estimate Pi, but it does work in principle.  As the number of points in the simulated dataset goes up, the accuracy of the estimate improves, from being accurate to only about 1 decimal place (2.6 pct. error) with 100 points to being accurate to about 3 decimal places (0.02 pct. error) with half a million points.  That’s a substantial improvement, but not exactly what you might call a rapid convergence to the value of Pi.
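That pattern, with the error shrinking roughly like 1/sqrt(N), can be reproduced with a short simulation loop.  This Python sketch (standard library only) averages 25 runs per dataset size, just as the table does; the function name and seed are arbitrary choices:

```python
import math
import random

def pi_estimate(n, rng):
    """One pi estimate from n simulated standard normal points."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    mad = sum(abs(x - mean) for x in data) / n
    return 2.0 * variance / mad ** 2

rng = random.Random(2020)
for n in (100, 500, 5000, 50000):
    # Average 25 independent estimates for each dataset size
    avg = sum(pi_estimate(n, rng) for _ in range(25)) / 25
    print(f"N={n:>6}  avg estimate={avg:.4f}  error={abs(avg - math.pi) / math.pi:.2%}")
```

Exact numbers will vary with the seed, but the error should drop by a factor of a few each time N goes up tenfold.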



The histograms above show the actual distributions for each value of N, as well as the estimate for Pi derived from each distribution.  As you can see, even with this many data points, the distributions still have some lack of smoothness, so it is no surprise that the estimate for Pi is only good to about 3 decimal places, at best.



2. Using Real World Dataset

Here are results of estimating Pi, from using some real world datasets, obtained from the internet.

Value of Pi, Estimated from Real World Datasets, Approximated by Normal Distributions

Exp   Data            Pi Est    Pi Actual   Abs Diff   Pct Diff   N
1     Fly Wings       3.07716   3.14159     0.06443    2.1%       100
2     Wheat           3.05782   3.14159     0.08377    2.7%       500
3     Hurricanes      3.41711   3.14159     0.27551    8.8%       755
4     Ozone           3.42920   3.14159     0.28760    9.2%       70
5     Precipitation   3.16298   3.14159     0.02139    0.7%       144
6     Earthquakes     3.18518   3.14159     0.04359    1.4%       100
7     Columbia R 5    2.92371   3.14159     0.21788    6.9%       66
8     Columbia R 55   3.53768   3.14159     0.39608    12.6%      66
9     Columbia R 95   3.17431   3.14159     0.03272    1.0%       66
Avg.                  3.21835   3.14159     0.15811    5.0%       207

As you can see, the real world data, which was modelled with a normal distribution, yielded rather poor estimates of Pi, though recognizably in the ballpark.  But one has to keep in mind that the datasets are merely measurements of real world phenomena – intuitively, we wouldn’t expect them to have anything at all to do with the value of Pi.

It is also good to keep in mind that these datasets averaged only about 200 points, with the majority having 100 or fewer members.  Compared with the results for the 100-point simulated datasets above, they aren’t so bad: the small simulated datasets only estimated Pi to within about 2.6%, while these real-world datasets came in at about 5.0%.  One wonders how a very large real-world dataset might fare in such an exercise.

Looking at histograms of the real-world datasets it is clear that they do depart from normality, some fairly substantially, though they would probably be considered “close enough” for a lot of statistical tests that assume a normally distributed dataset.

Anyway, I am kind of charmed by the idea that you can get a fairly close estimate of Pi by measuring current speeds in the Columbia River (within 1.0 pct of Pi) or the precipitation in Reading PA over about 150 years (within 0.7 pct of Pi).







 

Related Blogs

And here are some other Pi Day posts that I have done, with similar Pi-related experiments:
Calculating Pi from Shooting Arrows



Calculating Pi from Probability Theory (Buffon’s Needle): https://dodecahedronbooks.blogspot.ca/2017/03/pi-day-floor-pie-and-floor-pi.html



Calculating Pi via Nested Polygons:



And as always: Happy Pi Day!






Sources:
Beckmann, Petr. A History of Pi (p. 101). St. Martin's Press. Kindle Edition.
Tijms, Henk. Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press.
Real World Data



So, now that you have done some math, you should read a science fiction book, or even better, a whole series.  Book 1 of the Witches’ Stones series even includes a reference to pi:

Kati of Terra

How about trying Kati of Terra, the 3-novel story of a feisty young Earth woman, making her way in that big, bad, beautiful universe out there. 




The Witches’ Stones

Or, you might prefer, the trilogy of the Witches’ Stones (they’re psychic aliens, not actual witches), which follows the interactions of a future Earth confederation, an opposing galactic power, and the Witches of Kordea.  It features Sarah Mackenzie, another feisty young Earth woman (they’re the most interesting type – the novelist who wrote the books is pretty feisty, too).






The Magnetic Anomaly: A Science Fiction Story

“A geophysical crew went into the Canadian north. There were some regrettable accidents among a few ex-military who had become geophysical contractors after their service in the forces. A young man and young woman went temporarily mad from the stress of seeing that. They imagined things, terrible things. But both are known to have vivid imaginations; we have childhood records to verify that. It was all very sad. That’s the official story.” 






The Zoo Hypothesis or The News of the World: A Science Fiction Story
 
In the field known as Astrobiology, there is a research program called SETI, The Search for Extraterrestrial Intelligence. At the heart of SETI, there is a mystery known as The Great Silence, or The Fermi Paradox, named after the famous physicist Enrico Fermi. Essentially, he asked “If they exist, where are they?”.

Some quite cogent arguments maintain that if there was extraterrestrial intelligence, they should have visited the Earth by now. This story, a bit tongue in cheek, gives a fictional account of one explanation for The Great Silence, known as The Zoo Hypothesis. Are we a protected species, in a Cosmic Zoo? If so, how did this come about? Read on, for one possible solution to The Fermi Paradox.

The short story is about 6300 words, or about half an hour at typical reading speeds. 




