I wrote the post below as a response to a Quora question. I was planning to do some blogging about the Stanley Cup data that I am analyzing, so I will kick that off by reprinting the answer here on the blog.
Can statistics and data analysis be used to predict the results of cricket matches?
Yes, obviously you can use statistics and data analysis to predict
the results of cricket matches. You can do this for any game that has
a consistent set of rules and a reasonably large collection of data
on games, teams, players, results of matches, etc…
I haven’t followed cricket personally (though it seems like an
interesting game), but I have used statistics to predict horse races
(made some money for a few years) and am currently looking at ice
hockey (Stanley Cup playoffs history) both for prediction purposes
and for understanding of some aspects of the game. Specifically, I
want to explain why Canadian teams haven’t won a Stanley Cup in
over 30 years. I have found a very reasonable, evidence-based answer
to that question, but am holding it back for now. I will eventually
use all this data to publish a book on Amazon.
Here is a quick lesson in how to use data to predict sports:
-
First, make sure you actually know how to analyze data. That means a
reasonable grounding in math, statistics, data science and computer
programming (or spreadsheet manipulation). That is usually obtained
via schooling, but an intelligent motivated person can probably pick
up a lot of the essentials with time and self-study.
-
Next, find a source of data for your analysis. When I did the horse
races, decades ago, that meant obtaining a paper copy of The Daily
Racing Form, then hand-entering the data into a computer. For my
hockey study, there are now numerous websites with data. So, if you
have the time and knowledge, you can often scrape a lot of data from
those sites.
-
Next, analyze the data. It helps to have at least a general idea of
what you are looking for (a “research question”). However, this
isn’t an academic study (not usually anyway), so you can relax a
lot of the methodological niceties. So, explore widely, for
interesting insights that you may not have expected when you got
into the study.
-
Do a lot of descriptives. They are always helpful. Here is an example
from my hockey analysis, where you can see that the home team has an
advantage in the playoffs, though it isn’t huge (about 54%). But it is
consistent, which is important.
Here's another one, where you can see that the team that has a
higher ranking in the regular season does have an advantage in the
playoffs, though it isn’t huge (averaging about 55%, though the
best-fit trend line falls from about 60% at the beginning of the
period to about 50% by the end).
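As a rough illustration of these two descriptives, here is a minimal Python sketch. The rows in `games` are made-up toy records, not my actual playoff data, and `win_pct` and `fit_line` are hypothetical helper names chosen for this example. It computes the overall win share for the home team and for the higher-ranked team, then fits an ordinary least-squares trend line to the per-season share.

```python
from collections import defaultdict

# Hypothetical toy records: (season, home_team_won, higher_ranked_team_won).
# These rows are invented for illustration; they are not real playoff results.
games = [
    (2001, True, True),
    (2001, False, True),
    (2002, True, False),
    (2002, True, True),
    (2003, False, False),
    (2003, True, True),
]


def win_pct(flags):
    """Share of True values in a sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Overall descriptives across all games.
home_pct = win_pct(g[1] for g in games)
ranked_pct = win_pct(g[2] for g in games)

# Per-season share of games won by the higher-ranked team.
by_season = defaultdict(list)
for season, _, ranked_won in games:
    by_season[season].append(ranked_won)
seasons = sorted(by_season)
season_pct = [win_pct(by_season[s]) for s in seasons]

# Trend in the higher-ranked team's advantage over time.
slope, intercept = fit_line(seasons, season_pct)

print(f"home win pct: {home_pct:.3f}")
print(f"higher-ranked win pct: {ranked_pct:.3f}")
print(f"trend per season: {slope:+.3f}")
```

A negative slope here would correspond to the kind of shrinking advantage described above, though with real data you would want many more seasons than this toy example has.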
-
But then do a deeper dive. Being higher ranked is important, but how
much higher ranked is even more important. You can see that the win
percentage of the better-ranked team goes up as the difference between
that team and the lower-ranked team increases. This is also quite
consistent, though there are some interesting outliers.
-
The second graph shows the size of the data grouping for each rank
difference (i.e. the number of games where there was that difference
in ranks). Now those outliers don’t look so scary - they only
accounted for a small percentage of games overall. The main sequence
of the data is consistent and has large numbers of games, so it is
probably very reliable.
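The idea of pairing each win percentage with its group size can be sketched as follows. This is only a toy example: the `games` rows are invented, and `summarize_by_rank_diff` is a hypothetical helper name, not something from my actual analysis.

```python
from collections import defaultdict

# Hypothetical toy records: (rank_difference, better_ranked_team_won).
# Invented for illustration only.
games = [
    (1, True), (1, False), (1, True), (1, False),
    (5, True), (5, True), (5, False),
    (12, False),
]


def summarize_by_rank_diff(rows):
    """Map each rank difference to (win share, number of games)."""
    tally = defaultdict(lambda: [0, 0])  # diff -> [wins, games]
    for diff, won in rows:
        tally[diff][0] += int(won)
        tally[diff][1] += 1
    return {diff: (wins / n, n) for diff, (wins, n) in sorted(tally.items())}


summary = summarize_by_rank_diff(games)
for diff, (pct, n) in summary.items():
    print(f"rank diff {diff:>2}: win pct {pct:.2f} over {n} games")
```

In this toy data the rank difference of 12 looks like a dramatic outlier, but it rests on a single game, which is exactly why the group sizes are worth reporting next to the percentages.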
-
Then, you might do some more inferential and/or multivariate
statistics. You might also do subgroup analysis, to understand your
data at a more granular level. As you can see below, the
relationship between team success in the playoffs vs the regular
season seems to break down as you get further into playoff rounds
(the red line is for all playoff games, and the blue line is for
semi-final games only).
-
Anyway, you can obviously carry on with this type of analysis,
looking at various other factors (e.g. home game vs away game,
offensive team style vs defensive team style, etc.). I am doing that
with my hockey analysis. All this could be done for cricket, too,
though obviously the factors that you will study will be unique to
the game (e.g. for cricket, you might be interested in batting
averages or runs scored).
Naturally, a big question for this sort of activity is “what is the
purpose?” It might be for:
-
a better understanding of the game
-
improving team management or coaching
-
winning money gambling.
If your purpose is to win money gambling, that is a high hurdle to
jump, unless you are just betting with friends. If you are up against
government-sponsored gambling, you face a very high “take-out”.
That’s the part of the pool that the government keeps for itself.
If that number is high, you need a very big edge to win, so you have
to discover something important about predicting results that nobody
else knows, especially the government odds-maker.
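To put a number on that hurdle, here is a minimal sketch that assumes a simple pari-mutuel win pool with no breakage or rebates. The 20% take-out and the probabilities below are hypothetical figures for illustration, not rates quoted from any real operator, and the function names are my own.

```python
def expected_return(true_prob, pool_share, takeout):
    """Expected payout per unit staked in a simple pari-mutuel win pool.

    true_prob  : your estimate of the real chance of winning
    pool_share : fraction of the pool the crowd has bet on that outcome
    takeout    : fraction of the pool kept by the operator
    """
    return true_prob * (1.0 - takeout) / pool_share


def break_even_ratio(takeout):
    """How many times larger than the crowd's share the true probability
    must be for a bet to break even on average."""
    return 1.0 / (1.0 - takeout)


takeout = 0.20  # hypothetical take-out, for illustration only

# Betting with no edge: the true chance equals the crowd's share.
no_edge = expected_return(true_prob=0.25, pool_share=0.25, takeout=takeout)

# The edge needed just to get your money back on average.
needed = break_even_ratio(takeout)

print(f"return per unit with no edge: {no_edge:.2f}")
print(f"true probability must be {needed:.2f}x the crowd's share to break even")
```

Under these assumptions, a bettor who is only as good as the crowd loses the take-out on average, and the required edge grows quickly as the take-out rises.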
If your purposes are general understanding or managing a team (like
the movie Moneyball or my quest to find out why Canada can’t seem
to win the Stanley Cup), then the situation is different. The
advantages that you can discover from your analysis are less
tangible, but perhaps more enjoyable.
One last thing about cricket and math/statistics. The famous English
mathematician Hardy was quite a fanatic about cricket and kept
meticulous statistics on the game. I believe that some people thought
that a waste of time, when he could have been doing real math. His
famous colleague Ramanujan was from India, so it is likely that he
liked cricket too, though I don’t know that for sure. In North America,
many economists and the like seem to be drawn to baseball
statistics, according to my reading.
Here is a bit about G.H. Hardy from a math history site:
“There was only one passion in Hardy's life other than mathematics and
that was cricket. In fact for most of his life his day, at least during
the cricket season, would consist of breakfast during which he read
The Times studying the cricket scores with great interest. After
breakfast he would work on his own mathematical researches from
9 o'clock till 1 o'clock. Then, after a light lunch, he would walk down
to the university cricket ground to watch a game.”
Here’s a link to my Stanley Cup analysis, of why Canada’s long
Stanley Cup dry spell is not “just one of those things”, using
some fundamental statistical reasoning:
Canada’s Long Stanley Cup Drought, 2023 Edition
And here’s a link to my blog, with more details about my
horseracing/statistics project:
HORSE RACING DAYS - Part 1
------------------------------------------------------------------------------------
And here is a short story about the danger of mixing sports and gambling (horse racing).
A Dark Horse
In “A Dark Horse”, a gambler’s desire to hit a big win seems to
lead him to make a Faustian bargain with a supernatural evil.
Or is it all just a string of unnaturally good luck?
The story is just $0.99 U.S. (equivalent in other currencies) and
about 8000 words. It is also available on Kindle Unlimited and is
occasionally on free promotion.
U.S.:
https://www.amazon.com/dp/B01M9BS3Y5
U.K.:
https://www.amazon.co.uk/dp/B01M9BS3Y5
Germany:
https://www.amazon.de/dp/B01M9BS3Y5
France:
https://www.amazon.fr/dp/B01M9BS3Y5
Italy:
https://www.amazon.it/dp/B01M9BS3Y5
Netherlands:
https://www.amazon.nl/dp/B01M9BS3Y5
Spain:
https://www.amazon.es/dp/B01M9BS3Y5
Japan:
https://www.amazon.co.jp/dp/B01M9BS3Y5
India:
https://www.amazon.in/dp/B01M9BS3Y5
Mexico:
https://www.amazon.com.mx/dp/B01M9BS3Y5
Brazil:
https://www.amazon.com.br/dp/B01M9BS3Y5
Canada:
https://www.amazon.ca/dp/B01MDMY2BR
Australia:
https://www.amazon.com.au/dp/B01M9BS3Y5