How to Build an AFL Model

This article was compiled by a former quantitative analyst turned professional punter with over 10 years of experience in the banking sector developing quantitative models for risk, pricing and financial market related purposes.

If you haven’t already read his client story, you can do so here.


Step 1 – Choose the type of model

Although there are many methods to build a sports prediction model (and more and more cropping up as sports analytics becomes more popular), a few of the more mainstream methods can be categorised into the following:

  1. Elo rating team model – Variants of the Elo rating system, which was originally invented to rank competitive chess players, are probably the most popular type of sports prediction model. The Elo rating system has become so widely used that it has even been used in dating apps to rank the ‘attractiveness’ of users.
  2. Offense-Defense team model – These models attempt to rate the offensive and defensive capabilities of a team separately. These are often then converted into a predicted score for each team which can be further converted into a fair win probability.
  3. Player models – These models attempt to rate the capabilities of each player in a team and then aggregate these ratings in some way to provide a team level prediction. These have become increasingly popular recently as they have the added advantage of being able to account for changes in team-lineups.

In this article, the focus will be on the process of building a model and the criteria used to assess it. I will use the Elo rating team model to demonstrate this process because of its simplicity, and may cover the other types of models in more detail in subsequent articles.


Step 2 – Gather and cleanse the data

The first practical step in building any model is to gather, cleanse and analyse the data. Although this step will almost certainly be the least interesting, it is without a doubt the most important one, as the old modelling adage of ‘garbage in, garbage out’ still holds true today.

Luckily there are a lot of good AFL resources out there, which eliminates the need to scrape your own data. If you know R, one of the better ones is fitzRoy, an R package that provides functions for scraping match stats, match results, player stats and much more. However, as our Elo model only requires simple match results and scores, we have chosen to download the results and odds data from aussportsbetting. This dataset contains the following variables that we will use:

Note: It’s good to do some completeness checks on the data to check all matches are accounted for and all fields are populated with sensible values, but let’s assume they have already been done and the data is reasonably clean.
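As a rough illustration of what such completeness checks might look like, here is a minimal Python sketch. The field names (date, home_team, away_team, home_score, away_score) and the rows themselves are made up for the example and are not the actual column headers or contents of the downloaded dataset.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
REQUIRED = ("date", "home_team", "away_team", "home_score", "away_score")

# Made-up rows: two clean matches, one with a missing score, one duplicate.
matches = [
    {"date": "2019-03-21", "home_team": "Team A", "away_team": "Team B",
     "home_score": 90, "away_score": 80},
    {"date": "2019-03-22", "home_team": "Team C", "away_team": "Team D",
     "home_score": 65, "away_score": 72},
    {"date": "2019-03-23", "home_team": "Team E", "away_team": "Team F",
     "home_score": None, "away_score": 87},
    {"date": "2019-03-21", "home_team": "Team A", "away_team": "Team B",
     "home_score": 90, "away_score": 80},
]


def find_problems(rows):
    """Return (row_index, message) pairs for rows that fail basic checks."""
    problems = []
    seen = Counter()
    for i, row in enumerate(rows):
        # Every required field must be present and non-empty.
        missing = [f for f in REQUIRED if row.get(f) in (None, "")]
        if missing:
            problems.append((i, "missing fields: " + ", ".join(missing)))
            continue
        if row["home_team"] == row["away_team"]:
            problems.append((i, "team plays itself"))
        # Scores should be sensible: whole numbers that are not negative.
        for f in ("home_score", "away_score"):
            if not isinstance(row[f], int) or row[f] < 0:
                problems.append((i, f + " is not a non-negative integer"))
        # The same fixture on the same date should appear only once.
        key = (row["date"], row["home_team"], row["away_team"])
        seen[key] += 1
        if seen[key] > 1:
            problems.append((i, "duplicate match"))
    return problems


for index, message in find_problems(matches):
    print(index, message)
```

A natural extra check, not shown here, is to count matches per season and compare the total against the known fixture length.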


Step 3 – Setup the model

Now we get to start building the model! I have chosen to build the model in Python, but any number of programming languages would also work. In fact, for a model this simple, even Excel is more than powerful enough to do what we need.

The Elo rating system works by assigning a rating to each team and determining a fair win probability from the difference in rating between the two teams, as shown in the formula below:

$$P(A) = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $R_A$ and $R_B$ are the Elo ratings for team A and team B respectively.

Playing around with values in this formula, we see that if team A is superior to team B by 100 rating points, its probability of winning is 64%. If we increase this ratings superiority to 200, its win probability increases to 76%.
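These figures can be checked with a few lines of Python. The sketch below assumes the standard Elo form (base 10, scale of 400), which reproduces the 64% and 76% quoted above. The function name win_probability is my own choice for the example.

```python
def win_probability(rating_a, rating_b):
    """Fair win probability for team A under the standard Elo formula.

    Assumes base 10 and a scale of 400 rating points.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Team A's rating advantage over a 1500-rated opponent.
for advantage in (0, 100, 200):
    probability = win_probability(1500 + advantage, 1500)
    print(advantage, round(probability, 2))
```

Note that only the difference in ratings matters: 1600 against 1500 gives the same probability as 1400 against 1300.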

The starting values for the Elo ratings are set at 1,500[1] and the formula to update the ratings progressively after each game is given below:

$$R_A^{new} = R_A^{old} + K \, (O - P(A))$$

where:

  • $P(A)$ is team A’s pre-match win probability implied by the two teams’ ratings.
  • O is the outcome of the match and is given a value of 1 for a win, 0.5 for a draw and 0 for a loss.
  • The K-factor determines the speed of the ratings update and needs to be optimised from the data.

[1] The exception is Gold Coast and Greater Western Sydney, which we arbitrarily start at 1300 as these are expansion teams that started in 2011 and 2012 respectively.

For this iteration of the model, I’ve chosen to use an arbitrary value of K at 20.
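To make the mechanics concrete, here is a minimal sketch of the update loop in Python. It assumes the standard Elo update (the rating moves by K times the difference between the actual outcome and the expected win probability) and applies the equal and opposite change to the opponent. The function names, the tuple layout of a match and the toy results are illustrative assumptions, not the author's code or real data.

```python
K = 20          # arbitrary K-factor used for this iteration
START = 1500    # default starting rating
# Expansion teams start lower, as described in the footnote.
EXPANSION_START = {"Gold Coast": 1300, "Greater Western Sydney": 1300}


def win_probability(rating_a, rating_b):
    """Standard Elo win probability (base 10, scale 400) for team A."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, outcome_a, k=K):
    """Return both teams' new ratings; outcome_a is 1, 0.5 or 0 for team A."""
    delta = k * (outcome_a - win_probability(rating_a, rating_b))
    # Team B's change is the mirror image, so rating points are conserved.
    return rating_a + delta, rating_b - delta


def run_elo(matches, k=K):
    """Walk through matches in date order, predicting first, then updating."""
    ratings = {}
    predictions = []
    for home, away, home_score, away_score in matches:
        r_home = ratings.setdefault(home, EXPANSION_START.get(home, START))
        r_away = ratings.setdefault(away, EXPANSION_START.get(away, START))
        # Store the pre-match probability so the model is scored out of sample.
        predictions.append(win_probability(r_home, r_away))
        if home_score > away_score:
            outcome = 1.0
        elif home_score < away_score:
            outcome = 0.0
        else:
            outcome = 0.5
        ratings[home], ratings[away] = update(r_home, r_away, outcome, k)
    return ratings, predictions


# Made-up results in (home, away, home_score, away_score) form.
toy_matches = [("Team A", "Team B", 90, 80), ("Gold Coast", "Team A", 70, 70)]
final_ratings, home_win_probs = run_elo(toy_matches)
print(final_ratings)
print(home_win_probs)
```

The important design point is that each prediction is recorded before the ratings are updated with that match's result, so the probabilities assessed later never include information from the game being predicted.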

It is important to remember that there is absolutely nothing special about the formula above. Don’t be afraid to make changes to it to incorporate other factors you think may be predictive, which I will touch on later in Step 5.


Step 4 – Assess the model

Now it’s time to assess our model. Given how bare the model we have built is, it is unlikely to perform well, but it provides a benchmark for us to improve on. There are many ways to assess a model and, in reality, the best approach is to employ a number of different metrics. Some of the more common ones are:

  • % of games correctly predicted – Probably the easiest to understand, this just measures the % of game outcomes correctly predicted by the model
  • Log loss – A measure of the accuracy of a model whose output is a probability between 0 and 1. It penalises predictions more heavily the further they diverge from actual outcomes.
  • Mean absolute error (MAE) – A good way to assess a model whose output is a margin of victory. The MAE measures the average absolute difference between the predicted margin and the actual margin.
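The three metrics above can be sketched in plain Python as follows. The function names and the convention that outcomes are coded from the home team's point of view (1 for a win, 0.5 for a draw, 0 for a loss) are assumptions made for this example.

```python
import math


def accuracy(probs, outcomes):
    """Share of decided games in which the side given p > 0.5 actually won.

    Draws (outcome 0.5) are skipped, as neither side was 'correctly' tipped.
    """
    decided = [(p, o) for p, o in zip(probs, outcomes) if o != 0.5]
    if not decided:
        raise ValueError("no decided games to score")
    correct = sum(1 for p, o in decided if (p > 0.5) == (o == 1))
    return correct / len(decided)


def log_loss(probs, outcomes, eps=1e-15):
    """Average log loss; lower is better."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        # Clip so that a prediction of exactly 0 or 1 cannot produce log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(probs)


def mean_absolute_error(predicted_margins, actual_margins):
    """Average absolute gap between predicted and actual margins."""
    gaps = [abs(p - a) for p, a in zip(predicted_margins, actual_margins)]
    return sum(gaps) / len(gaps)


# A model that always says 50/50 scores ln(2), roughly 0.693.
print(round(log_loss([0.5, 0.5], [1, 0]), 3))
```

The coin-flip value of about 0.693 is a useful reference point: any model worth keeping should score below it.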

Given our model produces a probability to predict a binary win/loss outcome, we’ll go with the log loss method of assessment. The higher the log loss, the poorer the model’s predictive power. In the graph below, I’ve shown the log loss per year of our basic Elo model with the log loss of the Betfair closing odds also shown as a comparison.

Overall, it is no surprise that our model’s log loss is significantly worse than that of the Betfair closing odds across all years, averaging 0.622 compared with 0.565 for the Betfair closing odds. It is unlikely we’d make any money from this model... but that’s okay because there’s a lot of room for improvement!


Step 5 – Iterate, Iterate, Iterate

Turning an unprofitable model into a profitable one is all about iterating steps 3 and 4. Now that we have the foundations of an Elo model in place, we can start making improvements to take into account additional factors that we think are important. We can make these adjustments in step 3, and then reassess the model in step 4 to see if they add any explanatory power to our model. Examples of some improvements might be:

  • Home Ground Advantage – How much of an advantage does a team have when playing at home? Are there underlying reasons why a home team in one specific game may hold a larger or smaller advantage than the home team in a different game?
  • K-Factor optimisation – Instead of using an arbitrary K-Factor, we should optimise the K-Factor value. Also, is it too simplistic to assume the K-factor is constant throughout the season?
  • Different method of updating ratings – Rather than a binary outcome of win or loss, can we utilise more useful information to determine how the ratings update, for example by incorporating margin of victory?
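As one hedged illustration of how two of these ideas could be wired into the Elo framework, the sketch below adds a flat home ground advantage (in rating points) to the home team before computing the win probability, and scales the update by a margin-of-victory multiplier. These are common Elo extensions but they are my own assumptions: the parameter home_advantage, the function margin_multiplier and its logarithmic form are hypothetical and are not the modifications the author actually made.

```python
import math


def win_probability(rating_home, rating_away, home_advantage=0.0):
    """Elo win probability for the home team, with a flat rating bonus."""
    gap = rating_away - (rating_home + home_advantage)
    return 1.0 / (1.0 + 10 ** (gap / 400.0))


def margin_multiplier(margin):
    """Hypothetical damping: bigger wins move ratings more, with diminishing
    returns. Equals 1 for a draw, so a draw behaves like the basic update."""
    return 1.0 + math.log1p(abs(margin))


def update(rating_home, rating_away, home_score, away_score,
           k=20.0, home_advantage=0.0):
    """Return (new_home, new_away) after one match."""
    margin = home_score - away_score
    outcome = 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    expected = win_probability(rating_home, rating_away, home_advantage)
    delta = k * margin_multiplier(margin) * (outcome - expected)
    return rating_home + delta, rating_away - delta


# Same two evenly rated teams: a 1-point win versus a 60-point win.
print(update(1500, 1500, 81, 80, home_advantage=50))
print(update(1500, 1500, 140, 80, home_advantage=50))
```

Because the multiplier inflates every update, the K-factor would need to be re-optimised alongside it. Each such change should then be re-scored with the same metric as in Step 4 to see whether it helps.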

There are many more things that we can consider and as a general rule of thumb, the more you think outside the box with these improvements, the less likely the market has already captured them. As a simple demonstration, I’ve made some very rudimentary improvements to the base Elo, bastardising the equation in a way that I feel makes sense, based on the three suggested improvements above.

After some very straightforward changes, the model’s log loss has improved significantly to 0.576! I won’t go into the specific details of these improvements, but the main take-away is that once we have the framework down, it becomes easy to iterate and experiment with incorporating additional factors into the model.
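To show what one turn of this iterate-and-reassess loop might look like, here is a minimal sketch that tries several K-factors and keeps the one with the lowest log loss. It is a simple grid search over the basic Elo model, not the author's optimisation procedure. The match list is made up purely so that the code runs, and the function names are my own.

```python
import math


def win_probability(rating_a, rating_b):
    """Standard Elo win probability (base 10, scale 400) for team A."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def season_log_loss(matches, k, start=1500.0):
    """Run basic Elo over (home, away, outcome) rows and return the average
    log loss of the pre-match predictions. Outcome is 1, 0.5 or 0 for home."""
    ratings = {}
    total = 0.0
    for home, away, outcome in matches:
        r_home = ratings.get(home, start)
        r_away = ratings.get(away, start)
        p = win_probability(r_home, r_away)
        total += -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))
        delta = k * (outcome - p)
        ratings[home] = r_home + delta
        ratings[away] = r_away - delta
    return total / len(matches)


def best_k(matches, candidates):
    """Return the candidate K-factor with the lowest log loss."""
    return min(candidates, key=lambda k: season_log_loss(matches, k))


# Made-up results in which A always beats B and C, and B always beats C.
toy = [("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
       ("C", "A", 0), ("B", "A", 0), ("C", "B", 0)] * 5

for k in (0, 10, 20, 40):
    print(k, round(season_log_loss(toy, k), 4))
print("best K:", best_k(toy, [0, 10, 20, 40]))
```

With K set to 0 the ratings never move, so every prediction is 50/50 and the log loss is ln(2). In practice it would be safer to choose K on earlier seasons and check it on later ones, since a value tuned and scored on the same games will look better than it really is.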


Summary

Hopefully this article serves as a basic blueprint of how to go about building your first AFL prediction model. The most valuable modelling breakthroughs are often found through experimentation which makes having a good model building and assessment process even more important.

Remember that the best models are part science, part art – don’t get too hung up on following the basic functional form of models out there in literature and don’t be afraid to experiment with your own creative modifications as these are usually what turn out to be the biggest modelling breakthroughs.


BONUS

If you’re inspired to create your own AFL model, or already have one, we have a world-first opportunity for you: the Betfair Scholarship Program. Open to Betfair customers only (you must be logged in with an account to see this content), it lets you apply for a $1,000 Scholarship. This unique Program allows you to test your model against the market without risking your own money.

At Betfair, we want winners. We hope your model beats the market.

