The Betfair Data Scientist – Predictive AFL Model
In the lead-up to the AFL Finals, the Betfair Data Scientist team has been digging into the statistics to build a prediction model covering the four-week finals period.
They have followed a rigorous process to produce the outputs: data collection, data processing and then building the model itself, with comprehensive backtesting and refinement.
Over 150 variables have been taken into account, using the past 5 seasons of data. Some of the more important and unique variables used were:
- Player stats
- Overall Team stats
- Recent H2H Matchups
- Fantasy league scores
- Ladder position
- Recent finals performances
- Average points margin against top 8
- Middle of season win rate
- Performance per quarter
- Betfair weighted average price
- Betfair last matched price
- Betfair tournament odds at the end of the home-and-away (H&A) season
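As a rough illustration of the two price-based variables in the list above, the sketch below assumes that "weighted average price" means the average of matched prices weighted by matched volume, and that "last matched price" is simply the most recent trade. The function names and the trade figures are made up for the example, and it is written in Python rather than the R used for the model.

```python
def weighted_average_price(trades):
    """Volume-weighted average of matched prices.

    trades: list of (decimal_price, matched_volume) tuples in time order.
    Assumption: this is what "weighted average price" refers to.
    """
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        raise ValueError("no matched volume")
    return sum(price * volume for price, volume in trades) / total_volume


def implied_probability(decimal_price):
    """Convert decimal odds into the win probability they imply."""
    return 1.0 / decimal_price


# Made-up trades on one team in one match: (price, volume matched)
trades = [(1.80, 500.0), (1.90, 300.0), (2.00, 200.0)]

wap = weighted_average_price(trades)   # weighted towards the 1.80 trades
last_matched = trades[-1][0]           # price of the most recent trade
print(round(wap, 2), last_matched, implied_probability(last_matched))
```

Converting a price to an implied probability puts it on the same 0 to 1 scale as other rate-style variables such as win rates.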
The prediction accuracy of the best model was 72.3% based on the results of the past 4 years of AFL Finals.
How has this model been created?
Step 1 – Collating as much data as possible
For the purpose of the model, we collated over 15 years’ worth of data. This included:
- Team level data: Goals, points, total score, margin etc.
- Aggregated player level data: Contested possessions, kicks, handballs.
- Fantasy results: Fantasy scores for all teams.
- Betfair data: Prices across all games.
After reviewing the collated data, we chose to analyse only the most recent seasons in the model. This meant only 5 years’ worth of data would be used.
Step 2 – Processing the data
With the data prepared for analysis, we used feature engineering to create more meaningful variables from the raw statistics already collated. One example is splitting a team’s performance across a season into three segments.
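The season-splitting idea can be sketched as follows. This is a minimal illustration in Python (not the R code used for the model), and it assumes the three segments are equal-sized consecutive blocks of rounds and that the statistic of interest is win rate. The function name and the sample season are hypothetical.

```python
def segment_win_rates(results, n_segments=3):
    """Split one team's season into consecutive segments and
    return the win rate within each segment.

    results: list of booleans in round order (True = win).
    Assumption: segments are (near) equal-sized blocks of rounds.
    """
    n = len(results)
    rates = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        chunk = results[start:end]
        # None marks a segment with no games (season shorter than n_segments)
        rates.append(sum(chunk) / len(chunk) if chunk else None)
    return rates


# Made-up 22-round season for one team
season = [True, True, False, True, False, False, True, True, True, False, True,
          True, False, True, True, True, False, True, True, False, True, True]

early, middle, late = segment_win_rates(season)
print(early, middle, late)
```

Each segment then becomes its own variable, so a team that finished the season strongly looks different to the model from one with the same overall record that faded late.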
Step 3 – Building the Model
Part 1 – Implement the Model
This is a classification problem, and there is a family of algorithms designed for problems of this kind. We chose to use a random forest.
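To show the idea behind a random forest, here is a deliberately stripped-down sketch in Python (the model itself was built in R, and a real project would use an established library rather than this). Each "tree" is only a single split (a decision stump), trained on a bootstrap sample of the rows with a randomly chosen feature, and the forest predicts by majority vote. The two features and all data values are invented for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Find the single (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # sample rows with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Invented features: (ladder-position advantage of home team, home implied probability)
X = [(5, 0.70), (3, 0.65), (6, 0.80), (2, 0.60),
     (-4, 0.30), (-2, 0.40), (-5, 0.25), (-1, 0.45)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = home team won, 0 = home team lost

forest = fit_forest(X, y)
print(predict(forest, (7, 0.85)), predict(forest, (-6, 0.20)))
```

A full random forest grows much deeper trees and usually reports class probabilities as well, but the two sources of randomness shown here (resampled rows and random feature subsets) are what distinguish it from a single decision tree.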
Part 2 – Analysing the Model
This involves running the model, which then produces an output. You then validate the results to determine the model’s accuracy. If the results are not hitting your desired accuracy, it’s a matter of adjusting
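One way to picture the validation step is a walk-forward backtest: train on earlier seasons, predict the next one, and record the share of correct picks. The sketch below is in Python rather than R, and it is an assumption about how such a check could be organised, not a description of the exact procedure used here. The match records are invented, and the stand-in "model" simply picks the market favourite.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the actual outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def walk_forward_accuracy(matches, fit, predict):
    """For each season after the first, train on all earlier seasons
    and score predictions on that season only."""
    seasons = sorted({m["season"] for m in matches})
    scores = {}
    for season in seasons[1:]:
        train = [m for m in matches if m["season"] < season]
        test = [m for m in matches if m["season"] == season]
        model = fit(train)
        preds = [predict(model, m) for m in test]
        scores[season] = accuracy(preds, [m["home_win"] for m in test])
    return scores


# Invented match records; seasons are numbered 1-3 to avoid implying real results
matches = [
    {"season": 1, "home_prob": 0.60, "home_win": True},
    {"season": 2, "home_prob": 0.70, "home_win": True},
    {"season": 2, "home_prob": 0.40, "home_win": True},
    {"season": 3, "home_prob": 0.30, "home_win": False},
    {"season": 3, "home_prob": 0.55, "home_win": True},
]

# Stand-in model: no training, just back the team the market favours
fit_baseline = lambda train: None
predict_baseline = lambda model, m: m["home_prob"] > 0.5

scores = walk_forward_accuracy(matches, fit_baseline, predict_baseline)
print(scores)
```

A simple baseline like this gives a reference point: a fitted model is only adding value if its backtested accuracy beats the figure obtained by always backing the favourite.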
Software used: R (https://www.r-project.org/)