Thursday, 2 May 2024

Some Analysis of Stanley Cup Winners since 1993-94

I wrote the post below as a response to a Quora question. I have been planning to blog about the Stanley Cup data I am analyzing, so I will kick that off by reprinting it here.

 

Can statistics and data analysis be used to predict the results of cricket matches?

Yes, obviously you can use statistics and data analysis to predict the results of cricket matches. You can do this for any game that has a consistent set of rules and a reasonably large collection of data on games, teams, players, match results, etc.

I haven’t followed cricket personally (though it seems like an interesting game), but I have used statistics to predict horse races (and made some money for a few years), and I am currently looking at ice hockey (Stanley Cup playoff history), both for prediction and to understand some aspects of the game. Specifically, I want to explain why Canadian teams haven’t won a Stanley Cup in over 30 years. I have found a very reasonable, evidence-based answer to that question, but am holding it back for now. I plan to eventually use all this data for a book on Amazon.

Here is a quick lesson in how to use data to predict sports:

  • First, make sure you actually know how to analyze data. That means a reasonable grounding in math, statistics, data science and computer programming (or spreadsheet manipulation). That is usually obtained through schooling, but an intelligent, motivated person can probably pick up a lot of the essentials with time and self-study.

  • Next, find a source of data for your analysis. When I did the horse races, decades ago, that meant obtaining a paper copy of The Daily Racing Form, then hand-entering the data into a computer. For my hockey study, there are now numerous websites with the data, so if you have the time and know-how, you can often scrape a lot of it from those sites.

  • Next, analyze the data. It helps to have at least a general idea of what you are looking for (a “research question”). However, this isn’t an academic study (not usually, anyway), so you can relax a lot of the methodological niceties. Explore widely for interesting insights that you may not have expected when you started.

  • Do a lot of descriptives (descriptive statistics). They are always helpful. Here is an example from my hockey analysis, where you can see that the home team has an advantage in the playoffs, though it isn’t huge (a win rate of about 54%). But it is consistent, which is important.


    Here's another one, where you can see that the team with the higher regular-season ranking does have an advantage in the playoffs, though again it isn’t huge (about 55% on average, but the best-fit trend line falls from about 60% at the beginning of the period to about 50% by the end).


  • But then do a deeper dive. Being higher ranked is important, but how much higher ranked is even more important. You can see that the win percentage of the better-ranked team goes up as the rank difference between it and the lesser-ranked team increases. This is also quite consistent, though there are some interesting outliers.


  • The second graph shows the size of each data grouping, i.e. the number of games with each rank difference. Now those outliers don’t look so scary: they account for only a small percentage of games overall. The main sequence of the data is consistent and rests on large numbers of games, so it is probably very reliable.


  • Then, you might do some more inferential and/or multivariate statistics. You might also do subgroup analysis, to understand your data at a more granular level. As you can see below, the relationship between regular-season success and playoff success seems to break down as you get further into the playoff rounds (the red line is for all playoff games, and the blue line is for semi-final games only).


  • Anyway, you can obviously carry on with this type of analysis, looking at various other factors (e.g. home game vs away game, offensive team style vs defensive team style, etc.). I am doing that with my hockey analysis. All this could be done for cricket, too, though obviously the factors that you will study will be unique to the game (e.g. for cricket, you might be interested in batting averages or runs scored).
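The descriptive and subgroup analyses above mostly come down to a few counts and ratios. Here is a minimal Python sketch of how they could be computed, assuming each playoff game is stored as a record with the season, the playoff round (numbered 1 to 4, with round 3 as the semi-finals), each team's regular-season rank (1 = best) and whether the home team won. The `Game` record, the helper functions and the six games below are all hypothetical and made up for illustration; they are not real Stanley Cup results, and this is only one way the analysis could be set up. An ordinary least-squares slope across seasons stands in for the "best-fit trend line".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    season: int         # year the playoffs ended, e.g. 1994 for 1993-94 (hypothetical schema)
    playoff_round: int  # 1 = first round ... 3 = semi-finals, 4 = final (assumed numbering)
    home_rank: int      # home team's regular-season rank (1 = best)
    away_rank: int      # away team's regular-season rank
    home_won: bool


def home_win_pct(games):
    """Share of games won by the home team."""
    return sum(g.home_won for g in games) / len(games)


def better_ranked_won(g):
    """True if the team with the better (numerically lower) rank won."""
    if g.home_rank < g.away_rank:
        return g.home_won
    return not g.home_won


def better_ranked_win_pct(games):
    """Share of games won by the better-ranked team (equal ranks are skipped)."""
    ranked = [g for g in games if g.home_rank != g.away_rank]
    return sum(better_ranked_won(g) for g in ranked) / len(ranked)


def win_pct_by_rank_diff(games):
    """Map each rank difference to (better-ranked win pct, number of games)."""
    wins, counts = defaultdict(int), defaultdict(int)
    for g in games:
        diff = abs(g.home_rank - g.away_rank)
        if diff == 0:
            continue
        counts[diff] += 1
        wins[diff] += better_ranked_won(g)
    return {d: (wins[d] / counts[d], counts[d]) for d in sorted(counts)}


def by_season(games, stat):
    """Apply a statistic to each season's games separately."""
    seasons = defaultdict(list)
    for g in games:
        seasons[g.season].append(g)
    return {s: stat(gs) for s, gs in sorted(seasons.items())}


def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (the best-fit trend line)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up toy games, NOT real playoff results.
games = [
    Game(1994, 1, 1, 8, True),
    Game(1994, 1, 8, 1, False),
    Game(1994, 3, 2, 3, False),
    Game(1995, 1, 2, 7, True),
    Game(1995, 3, 4, 5, False),
    Game(1995, 3, 5, 4, True),
]

print("home win pct:", home_win_pct(games))
print("better-ranked win pct:", better_ranked_win_pct(games))
print("by rank difference:", win_pct_by_rank_diff(games))
semis = [g for g in games if g.playoff_round == 3]
print("semi-finals only:", better_ranked_win_pct(semis))
per_season = by_season(games, better_ranked_win_pct)
print("trend per season:", trend_slope(list(per_season), list(per_season.values())))
```

Keeping the game count next to each rank-difference percentage is what lets you judge whether an outlier rests on a handful of games or on a large group. A fuller analysis would also add confidence intervals, which this sketch leaves out.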

Naturally, a big question for this sort of activity is “what is the purpose?” It might be for:

  1. a better understanding of the game

  2. improving team management or coaching

  3. winning money by gambling.

If your purpose is to win money gambling, that is a high hurdle to clear, unless you are just betting with friends. If you are up against government-sponsored gambling, you face a very high “take-out”: the part of the pool that the government keeps for itself. If that number is high, you need a very big edge to win, so you have to discover something important about predicting results that nobody else knows, especially the government odds-maker.
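To put rough numbers on the take-out, here is a small Python sketch. It assumes a simple pari-mutuel pool (the system used in horse racing), where the operator keeps a fixed fraction of all money bet and the rest is shared among the winning tickets. Fixed-odds sports betting builds its margin into the odds differently, so treat this only as an approximation of that case. The function names and the 20% take-out are hypothetical, not any particular operator's rules or rate.

```python
def expected_return(p_win, crowd_share, takeout):
    """Expected payback per $1 bet in a simple pari-mutuel pool (assumed model).

    p_win:       your estimated probability that the selection wins
    crowd_share: fraction of the pool bet on that selection, i.e. the
                 crowd's implied probability
    takeout:     fraction of the pool the operator keeps
    """
    payout_per_dollar = (1 - takeout) / crowd_share  # returned on a winning $1
    return p_win * payout_per_dollar


def break_even_edge(takeout):
    """How many times more often your picks must win than the crowd implies."""
    return 1 / (1 - takeout)


takeout = 0.20  # hypothetical figure, for illustration only
print(break_even_edge(takeout))              # ~1.25
print(expected_return(0.25, 0.25, takeout))  # no edge: ~0.80 back per dollar
print(expected_return(0.30, 0.20, takeout))  # 1.5x the crowd's estimate: ~1.2
```

In this model a bettor with no edge gets back 1 − t per dollar on average, so to break even your selections must win at least 1/(1 − t) times as often as the betting public expects. With a 20% take-out, that means winning 25% more often than the crowd implies, which is the kind of big edge described above.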

If your purposes are general understanding or managing a team (like the movie Moneyball or my quest to find out why Canada can’t seem to win the Stanley Cup), then the situation is different. The advantages that you can discover from your analysis are less tangible, but perhaps more enjoyable.

One last thing about cricket and math/statistics. The famous English mathematician G.H. Hardy was quite a cricket fanatic and kept meticulous statistics on the game. I believe some people thought that was a waste of time when he could have been doing real math. His famous colleague Ramanujan was from India, so it is likely that he liked cricket too, though I don’t know that for sure. In North America, according to my reading, many economists and the like seem to be drawn to baseball statistics.

Here is a bit about G.H. Hardy from a math history site:

“There was only one passion in Hardy's life other than mathematics and that was cricket. In fact for most of his life his day, at least during the cricket season, would consist of breakfast during which he read The Times studying the cricket scores with great interest. After breakfast he would work on his own mathematical researches from 9 o'clock till 1 o'clock. Then, after a light lunch, he would walk down to the university cricket ground to watch a game.”

Here’s a link to my Stanley Cup analysis, of why Canada’s long Stanley Cup dry spell is not “just one of those things”, using some fundamental statistical reasoning:

Canada’s Long Stanley Cup Drought, 2023 Edition

And here’s a link to my blog, with more details about my horseracing/statistics project:

HORSE RACING DAYS - Part 1

------------------------------------------------------------------------------------

And here is a short story about the danger of mixing sports and gambling (horse racing).

A Dark Horse

In “A Dark Horse”, a gambler’s desire to hit a big win seems to lead him to make a Faustian bargain with a supernatural evil.  Or is it all just a string of unnaturally good luck?

The story is just $0.99 U.S. (or the equivalent in other currencies) and about 8000 words. It is also available on Kindle Unlimited and is occasionally on free promotion.

U.S.: https://www.amazon.com/dp/B01M9BS3Y5

U.K.: https://www.amazon.co.uk/dp/B01M9BS3Y5

Germany: https://www.amazon.de/dp/B01M9BS3Y5

France: https://www.amazon.fr/dp/B01M9BS3Y5

Italy: https://www.amazon.it/dp/B01M9BS3Y5

Netherlands: https://www.amazon.nl/dp/B01M9BS3Y5

Spain: https://www.amazon.es/dp/B01M9BS3Y5

Japan: https://www.amazon.co.jp/dp/B01M9BS3Y5

India: https://www.amazon.in/dp/B01M9BS3Y5

Mexico: https://www.amazon.com.mx/dp/B01M9BS3Y5

Brazil: https://www.amazon.com.br/dp/B01M9BS3Y5

Canada: https://www.amazon.ca/dp/B01MDMY2BR

Australia: https://www.amazon.com.au/dp/B01M9BS3Y5


 



