Pi Day 2020 – Pi and the Normal Distribution

The Mathematical Theory of Pi and the Normal Distribution

Wigner wrote a famous paper (“The Unreasonable Effectiveness of Mathematics in the Natural Sciences”) about how odd it is that mathematics and the real world are so closely coupled. An illustration of this is outlined below, which is especially relevant on Pi Day.

One of the interesting things about Pi is how it shows up in unexpected places, in the mathematical and real world. And one of the interesting things about the normal distribution (or Gaussian) is how often it accurately describes real world phenomena, at least to a good approximation.

Pi is actually embedded in the formula that generates the Normal Distribution, as are the transcendental number e and the irrational number square root of 2. Quoting from the book “Understanding Probability” (H. Tijms):

“One could say that the normal curve is a natural law of sorts, and it is worth noting that each of the three famous mathematical constants root 2, pi=3.141… and e=2.718… play roles in its makeup. Many natural phenomena, such as the height of men, harvest yields, errors in physical measurements, luminosity of stars, returns on stocks, etc., can be described by a normal curve.”

The formula is shown below. Sigma is the standard deviation of the distribution, mu is the mean.

f(x) = (1/(sigma * square root(2 times pi))) * e^(-(x - mu)^2 / (2 * sigma^2))

If you want some more explanation of why pi shows up in the formula for the normal distribution, try this blog post. The author has a nice explanation, if you know some calculus.

https://davegiles.blogspot.com/2016/01/why-does-pi-appear-in-normal-density.html

As a bit of differential calculus (or some visual inspection) shows, this symmetric function peaks at x=mu, the mean of the distribution, where the exponential term becomes e to the 0th power (thus equal to 1). As a result, the height of the peak is related to pi:

Peak = 1/(sigma * square root (2 times pi))

If you know the peak height and the standard deviation of a normal distribution, you can use this to estimate Pi from data.
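As a rough sketch of that idea, the peak formula can be inverted to give pi = 1/(2 * sigma^2 * Peak^2). The Python below is only an illustration: the function name pi_from_peak and the mean and standard deviation values are made up for the example, and the peak is taken from an exact normal curve rather than from real data.

```python
import math
from statistics import NormalDist

def pi_from_peak(peak, sigma):
    """Invert Peak = 1/(sigma * sqrt(2*pi)) to get pi = 1/(2 * sigma^2 * peak^2)."""
    return 1.0 / (2.0 * sigma**2 * peak**2)

# Illustrative values only: a normal curve with mean 10 and standard deviation 2.5
dist = NormalDist(mu=10.0, sigma=2.5)
peak = dist.pdf(dist.mean)          # height of the curve at x = mu

estimate = pi_from_peak(peak, dist.stdev)
print(estimate, math.pi)            # the two values agree up to floating point error
```

With measured data, the peak would have to be read off a histogram, so the estimate would only be as good as that reading.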

Another relationship that can be deduced from all this is:

MAD = sigma * square root(2/pi)

MAD is the average absolute deviation, which is the mean of the absolute values of the differences between points in the distribution and the mean of the distribution.

That’s rather like the formula for variance, but uses absolute value rather than squaring the difference. Either way, you avoid negative numbers. You don’t see MAD used all that much in regular statistical practice, so it’s interesting to use it here.
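For anyone who prefers code to words, here is a minimal Python sketch of MAD as defined above. The function name mean_abs_deviation and the eight numbers are just illustrations, not data from this post.

```python
from statistics import fmean

def mean_abs_deviation(values):
    """Mean of the absolute differences between each point and the mean."""
    m = fmean(values)
    return fmean([abs(v - m) for v in values])

# Illustrative numbers: the mean is 5, the absolute deviations are 3, 1, 1, 1, 0, 0, 2, 4
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_abs_deviation(data))   # 12 / 8 = 1.5
```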

Using Normally Distributed Data to Estimate Pi

Anyway, the point of all this is that you can use the numbers from a normally distributed dataset to estimate pi. I played around with that, giving the results below. I used the second relationship, the one involving MAD, for the purpose.

1. Using Simulations in Excel

First off, here are results from using simulated data, based on the normal distribution. In Excel, you can use its Data Analysis package or the formula

norminv(rand(), mean, stdev)

to generate the required numbers. Then you can use the variance formula in Excel to calculate the variance of the generated data, and the average formula to calculate the mean. From the mean you can calculate the deviation for each point, take the absolute value of that, and average those values to get MAD. Then Pi can be estimated from:

PI = (2 * variance)/(MAD*MAD)

This will only be precisely true for a perfect, infinite normal distribution dataset, which is of course impossible in the real world or even in a computer. So, as noted, this is treated as an estimate.
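The same experiment can be sketched outside Excel. The Python below is not the spreadsheet I used, just a rough equivalent under some assumptions: random.gauss stands in for norminv(rand(), mean, stdev), the seed and sample size are arbitrary, and the estimate is computed as 2 times the variance divided by MAD squared, which follows from MAD = sigma * square root(2/pi).

```python
import random
from statistics import fmean, pvariance

def estimate_pi(values):
    """Estimate pi as 2 * variance / MAD^2, assuming the data are roughly normal."""
    m = fmean(values)
    mad = fmean([abs(v - m) for v in values])      # mean absolute deviation
    return 2.0 * pvariance(values, mu=m) / mad**2

rng = random.Random(2020)                          # arbitrary seed, for repeatability
sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
estimate = estimate_pi(sample)
print(estimate)
```

The mean and standard deviation passed to the generator should not matter much, since the ratio of variance to MAD squared does not depend on the scale or location of the data.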

Here are some estimates for Pi, using various numbers of randomly generated, normally distributed data points:

Value of Pi, estimated from Normal Distribution, for various sizes of simulated dataset
Run	500 K	50 K	5 K	0.5 K	0.1 K
1	3.1369	3.1498	3.0909	3.0478	3.2250
2	3.1365	3.1243	3.1248	3.5711	3.1367
3	3.1414	3.1160	3.1686	2.7884	3.6031
4	3.1374	3.1308	3.1689	3.4772	3.1697
5	3.1332	3.1353	3.1560	2.7704	2.6016
6	3.1370	3.1443	3.2166	3.2497	2.4286
7	3.1493	3.1419	3.3306	3.5507	3.1429
8	3.1473	3.1672	3.2095	3.4633	2.5636
9	3.1526	3.1558	3.1258	3.3079	3.4690
10	3.1390	3.1567	3.1593	3.5132	3.3688
11	3.1357	3.1565	1.7958	3.4977	3.4358
12	3.1513	3.1229	3.2390	3.3730	3.2837
13	3.1531	3.1332	3.1755	3.2773	3.4404
14	3.1393	3.1264	3.1467	3.2081	3.2094
15	3.1507	3.1422	3.1690	3.2656	1.9962
16	3.1380	3.1167	3.2285	3.1871	3.1321
17	3.1481	3.1533	3.0957	3.0320	3.3556
18	3.1387	3.1288	3.0366	3.0675	3.1471
19	3.1493	3.1156	3.1311	2.8541	2.9467
20	3.1286	3.1500	3.2534	3.0860	3.0628
21	3.1397	3.1321	3.2255	3.0501	3.1321
22	3.1353	3.1855	3.1319	3.0628	3.0040
23	3.1433	3.1674	3.1378	3.3351	2.9529
24	3.1448	3.1394	3.1329	2.9784	3.5530
25	3.1495	3.1160	3.1065	3.0981	2.1003
Est Pi	3.1422	3.1403	3.1103	3.2045	3.0584
Actual Pi	3.1416	3.1416	3.1416	3.1416	3.1416
Abs Diff	0.0007	0.0013	0.0313	0.0629	0.0831
Pct Diff	0.021%	0.040%	0.997%	2.002%	2.647%
N	500000	50000	5000	500	100
Sqrt N	707	224	71	22	7

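The table shows that this isn’t a very effective way to estimate Pi, but it does work in principle. As the number of points in the simulated dataset goes up, the accuracy of the averaged estimate improves, from being accurate to only 1 decimal place (2.6 pct. error) with 100 points to being accurate to 3 decimal places (0.02 pct.) with half a million points. That’s a pretty good improvement, but not a rapid convergence to the value of Pi: as with most estimates built from random samples, the error shrinks roughly in proportion to 1 over the square root of N, so each extra decimal place costs about 100 times as much data. Note also that run 11 in the 5,000-point column (1.7958) lies far outside the range of the other runs and pulls that column’s average down.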
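To get a feel for how slowly the estimate settles down, the sketch below repeats the simulation 25 times at a few sample sizes and reports how widely the estimates scatter. It is an illustration under assumptions, not a reproduction of the table: the seed, the sample sizes and the helper name estimate_pi are arbitrary choices, and the numbers will differ from the Excel runs.

```python
import math
import random
from statistics import fmean, pstdev, pvariance

def estimate_pi(values):
    """Estimate pi as 2 * variance / MAD^2, assuming the data are roughly normal."""
    m = fmean(values)
    mad = fmean([abs(v - m) for v in values])
    return 2.0 * pvariance(values, mu=m) / mad**2

rng = random.Random(314)     # arbitrary seed
runs_per_size = 25           # same number of runs per column as the table
spreads = {}
for n in (100, 500, 5_000):
    estimates = [estimate_pi([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(runs_per_size)]
    spreads[n] = pstdev(estimates)   # scatter of the 25 estimates
    # Columns: N, average estimate, scatter, scatter * sqrt(N)
    print(n, round(fmean(estimates), 4), round(spreads[n], 4),
          round(spreads[n] * math.sqrt(n), 2))
```

If the error really falls off like 1 over the square root of N, the last column should stay roughly the same size from row to row, give or take the noise from using only 25 runs.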

The histograms above show the actual distributions for each value of N, as well as the estimate for Pi derived from that distribution. As you can see, even with this many data points, the distributions still lack some smoothness, so it is no surprise that the estimate for Pi is only good to about 3 decimal places, at best.

2. Using Real World Datasets

Here are the results of estimating Pi from some real world datasets obtained from the internet.

Value of Pi, Estimated from Real World Datasets, Approximated by Normal Distributions
Exp	Data	Pi Est	Pi Actual	Abs Diff	Pct Diff	N
1	Fly Wings	3.07716	3.14159	0.06443	2.1%	100
2	Wheat	3.05782	3.14159	0.08377	2.7%	500
3	Hurricanes	3.41711	3.14159	0.27551	8.8%	755
4	Ozone	3.42920	3.14159	0.28760	9.2%	70
5	Precipitation	3.16298	3.14159	0.02139	0.7%	144
6	Earthquakes	3.18518	3.14159	0.04359	1.4%	100
7	Columbia R 5	2.92371	3.14159	0.21788	6.9%	66
8	Columbia R 55	3.53768	3.14159	0.39608	12.6%	66
9	Columbia R 95	3.17431	3.14159	0.03272	1.0%	66
	Avg.	3.21835	3.14159	0.15811	5.0%	207

As you can see, the real world data, which was modelled with a normal distribution, yielded rather poor estimates of Pi, though recognizably in the ballpark. But one has to keep in mind that the datasets are merely measurements of real world statistics – intuitively, we wouldn’t expect them to have anything at all to do with the value of Pi.

It is also good to keep in mind that these datasets averaged only about 200 points, with the majority of them having 100 points or fewer. Compared with the results for the simulated 100-point datasets above, the real-world results aren’t so bad. The simulated small datasets only estimated Pi to within about 2.6%, while these real-world datasets came in at about 5.0%. One wonders how a very large real-world dataset might fare in such an exercise.

Looking at histograms of the real-world datasets it is clear that they do depart from normality, some fairly substantially, though they would probably be considered “close enough” for a lot of statistical tests that assume a normally distributed dataset.
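One way to see why departures from normality matter is to feed the same formula data that are deliberately not normal. The sketch below is only an illustration with simulated samples (none of the real-world datasets are used); the name shape_ratio, the seed and the choice of comparison distributions are assumptions made for the example. In theory the ratio 2 * variance / MAD^2 comes out to pi only for a normal distribution; it is 8/3 for a uniform distribution and 4 for a Laplace (double exponential) distribution.

```python
import random
from statistics import fmean, pvariance

def shape_ratio(values):
    """2 * variance / MAD^2, which equals pi only when the data are normal."""
    m = fmean(values)
    mad = fmean([abs(v - m) for v in values])
    return 2.0 * pvariance(values, mu=m) / mad**2

rng = random.Random(14)      # arbitrary seed
n = 100_000
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "uniform": [rng.uniform(0.0, 1.0) for _ in range(n)],
    # The difference of two independent exponentials follows a Laplace distribution
    "laplace": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
}
ratios = {name: shape_ratio(values) for name, values in samples.items()}
for name, ratio in ratios.items():
    print(name, round(ratio, 3))
```

So a flatter-than-normal dataset should tend to pull the estimate below Pi and a heavier-tailed one should push it above, which fits the mix of low and high estimates in the table.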

Anyway, I am kind of charmed by the idea that you can get a fairly close estimate of Pi by measuring current speeds in the Columbia River (within 1.0 pct of Pi) or the precipitation in Reading PA over about 150 years (within 0.7 pct of Pi).

Related Blogs

And here are a couple of other Pi Day posts that I have done, with similar Pi-related experiments:

Calculating Pi from Shooting Arrows:

https://dodecahedronbooks.blogspot.com/2019/03/pi-day-2019-shooting-arrows-at-target.html

Calculating Pi from Probability Theory (Buffon’s Needle):

https://dodecahedronbooks.blogspot.ca/2017/03/pi-day-floor-pie-and-floor-pi.html

Calculating Pi via Nested Polygons:

https://dodecahedronbooks.blogspot.com/2014/03/astrophysics-corner-part-7-happy-pi-day.html

And as always:

Sources:

Beckmann, Petr. A History of Pi (p. 101). St. Martin's Press. Kindle Edition.

Tijms, Henk. Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press.

Real World Data

https://seattlecentral.edu/qelp/Data_EnvironmentTopics.html

So, now that you have done some math, you should read a science fiction book, or even better, a whole series. Book 1 of the Witches’ Stones series even includes a reference to pi:

Kati of Terra

How about trying Kati of Terra, the 3-novel story of a feisty young Earth woman, making her way in that big, bad, beautiful universe out there.

http://www.amazon.com/gp/product/B00811WVXO

http://www.amazon.co.uk/gp/product/B00811WVXO

The Witches’ Stones

Or you might prefer the trilogy of the Witches’ Stones (they’re psychic aliens, not actual witches), which follows the interactions of a future Earth confederation, an opposing galactic power, and the Witches of Kordea. It features Sarah Mackenzie, another feisty young Earth woman (they’re the most interesting type – the novelist who wrote the books is pretty feisty, too).

https://www.amazon.com/dp/B008PNIRP4

https://www.amazon.co.uk/dp/B008PNIRP4

The Magnetic Anomaly: A Science Fiction Story

“A geophysical crew went into the Canadian north. There were some regrettable accidents among a few ex-military who had become geophysical contractors after their service in the forces. A young man and young woman went temporarily mad from the stress of seeing that. They imagined things, terrible things. But both are known to have vivid imaginations; we have childhood records to verify that. It was all very sad. That’s the official story.”

https://www.amazon.com/dp/B0176H22B4

https://www.amazon.co.uk/dp/B0176H22B4

The Zoo Hypothesis or The News of the World: A Science Fiction Story

In the field known as Astrobiology, there is a research program called SETI, The Search for Extraterrestrial Intelligence. At the heart of SETI, there is a mystery known as The Great Silence, or The Fermi Paradox, named after the famous physicist Enrico Fermi. Essentially, he asked “If they exist, where are they?”.

Some quite cogent arguments maintain that if there was extraterrestrial intelligence, they should have visited the Earth by now. This story, a bit tongue in cheek, gives a fictional account of one explanation for The Great Silence, known as The Zoo Hypothesis. Are we a protected species, in a Cosmic Zoo? If so, how did this come about? Read on, for one possible solution to The Fermi Paradox.

The short story is about 6300 words, or about half an hour at typical reading speeds.

https://www.amazon.com/dp/B076RR1PGD

https://www.amazon.co.uk/dp/B076RR1PGD

Dodecahedron Books

Saturday, 14 March 2020
