Martin Ingram – Data Scientist
Martin is passionate about good code and gaining insights from data, with a particular interest in sports analytics. He completed a Bachelor of Arts in Natural Sciences (Physical) at Cambridge before going on to a Master of Science in Computing Science.
In the previous article, I gave an overview of the different approaches used to model tennis matches. In this article, I go into more detail about one model that performs particularly well: the Elo model. In fact, when Stephanie Kovalchik compared 11 models from different classes (regression-based, point-based and ranking-based), the Elo model came out on top, with 70% accuracy on ATP matches in 2014.
How does the Elo model work
Elo works by assigning a rating to each player. When two players meet, each player's win probability depends only on the difference between their Elo ratings. For player A facing player B, the calculation is as follows:
$$P_{win} = \frac{1}{1 + 10^{(elo_B - elo_A)/400}}$$
For example, if a player has an Elo rating of 1,800 and his opponent has a rating of 2,000, his win probability is 24.0%.
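A minimal Python sketch of this formula, assuming the standard 400-point Elo scale; the function name `win_probability` is our own choice. For the 1,800 vs 2,000 example it gives roughly 24%, and swapping the two players gives the complementary probability.

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Pre-match probability that player A beats player B (standard 400-point Elo scale)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


p = win_probability(1800, 2000)
print(f"{p:.4f}")  # 0.2403
```

Only the rating difference matters: 1,800 vs 2,000 gives the same probability as 1,300 vs 1,500.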
How do we arrive at Elo scores for players? Elo is calculated recursively, so we need to take into account the full sequence of wins and losses up to the match in question. Each player's rating before their first match is set to 1,500. After each match, the result is used to update both players' ratings using the following equation:
$$elo_i(t+1) = elo_i(t) + K\,(outcome - P_{win})$$
where elo_i(t+1) is player i's updated Elo, elo_i(t) was their Elo before the match, K is a factor we will discuss later (for now, assume it to be constant, e.g. 32), outcome is an indicator of the match outcome (1 if player i won, 0 if they lost), and P_win was player i's pre-match probability of winning, as given by the previous formula.
If two players play their first match, both with Elo of 1,500, Pwin is equal to 0.5 so, for a K-factor of 32, the winner would gain 16 points and the loser would lose 16 points. In their next match, the players start with Elo ratings of 1,516 and 1,484 respectively, and by updating their rating for all matches played, their current Elo rating can be calculated.
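The update rule and the recursive replay over a match history can be sketched in a few lines of Python. This is a minimal illustration with a constant K; the function names `update` and `replay` and the (winner, loser) tuple format are our own assumptions, not part of any published implementation.

```python
def win_probability(elo_a, elo_b):
    """Pre-match probability that A beats B (standard 400-point Elo scale)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def update(elo_a, elo_b, a_won, k=32.0):
    """Return (new_elo_a, new_elo_b) after one match, using a constant K."""
    p_a = win_probability(elo_a, elo_b)
    outcome = 1.0 if a_won else 0.0
    delta = k * (outcome - p_a)
    # With the same K for both players, B's change is exactly -delta,
    # because B's outcome and win probability are 1 - outcome and 1 - p_a.
    return elo_a + delta, elo_b - delta


def replay(matches, start=1500.0, k=32.0):
    """Apply the update to chronological (winner, loser) pairs; newcomers start at `start`."""
    ratings = {}
    for winner, loser in matches:
        elo_w = ratings.get(winner, start)
        elo_l = ratings.get(loser, start)
        ratings[winner], ratings[loser] = update(elo_w, elo_l, a_won=True, k=k)
    return ratings


print(update(1500, 1500, a_won=True))  # (1516.0, 1484.0)
```

Because matches must be processed in order, the current rating of a player is simply the final value in `replay` after feeding in their entire history.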
The K-factor determines how much the Elo rating should change following a match result. In the example, the K-factor of 32 means that at most, a player could lose 32 points (if their probability was 1 and they lost). When using Elo models in practice, the biggest consideration is which K-factor to use.
The Elo model which performed well in Kovalchik's comparison is one devised by fivethirtyeight.com. Rather than using a constant K, they instead have it depend on the number of matches a player has played. The equation for K becomes:
$$K = \frac{c}{(M + o)^{s}}$$
where c is a constant, M is the number of matches in the player's dataset, o is a small offset (to avoid very large values when M is low), and s is a shape parameter which allows for more flexibility in the curve's shape. By experimenting with different values of c, o and s, they settled on the following form:
$$K = \frac{250}{(M + 5)^{0.4}}$$
We can see how this K-factor varies with the number of matches by plotting it:
Remember that the K-factor is the maximum Elo update for any given match. Hence, using this functional form for the K-factor means that players with few matches receive large maximum updates. At slightly under 200 matches, the K-factor drops below a constant factor of 32 and changes little after about 400 matches, settling at a value around 20.
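As a rough numerical check on the curve, here is a small Python sketch that tabulates K instead of plotting it. It assumes the parameter values c = 250, o = 5 and s = 0.4 (the values we understand FiveThirtyEight's model to use); `k_factor` is a hypothetical helper name of our own.

```python
def k_factor(matches_played, c=250.0, o=5.0, s=0.4):
    """FiveThirtyEight-style K-factor: large for newcomers, shrinking with experience.

    Parameter defaults are assumed values, not taken from official source code.
    """
    return c / (matches_played + o) ** s


for m in (0, 50, 100, 200, 400, 800):
    print(f"{m:>4} matches: K = {k_factor(m):6.1f}")

# First match count at which K drops below the constant 32 used earlier.
first_below_32 = next(m for m in range(10_000) if k_factor(m) < 32)
print(first_below_32)  # 166
```

A newcomer's first match can move their rating by over 130 points, while a veteran's K sits in the low twenties and keeps shrinking slowly.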
Why does it make sense for K to depend on the number of matches played? When we have seen a player play only a few matches, we are not particularly confident in their Elo rating and should be open to large updates. Once a player has hundreds of matches behind them, we have good reason to trust their rating, so smaller updates make sense.
Example: US Open 2016
As an example, here is how Stan Wawrinka's matches at the 2016 US Open changed his Elo rating:
We can see that Stan Wawrinka faced opponents he was expected to beat (he had win probabilities greater than 80% against his first four opponents) as well as three tough opponents in the final rounds.
This results in small updates for the first matches, and large ones for the later ones. We can also see that Elo was not perfect here: though it predicted the first four matches correctly and had his match against Del Potro at around 50/50, it did not expect his wins over Nishikori and Djokovic.
However, we can also see how the Elo system self-corrects: if Nishikori and Wawrinka were to meet again, Nishikori's loss of Elo points combined with Wawrinka's gain would make Wawrinka's predicted win probability higher than before.
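The self-correction can be illustrated with a short Python sketch. The ratings below are hypothetical (not the players' actual 2016 values), and a constant K of 32 is assumed for both players for simplicity; `rematch_probability` is an illustrative helper of our own.

```python
def win_probability(elo_a, elo_b):
    """Pre-match probability that A beats B (standard 400-point Elo scale)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def rematch_probability(underdog, favourite, k=32.0):
    """Underdog's win probability before and after they pull off an upset.

    Uses a constant K for both players for simplicity (an assumption).
    """
    before = win_probability(underdog, favourite)
    delta = k * (1.0 - before)  # large, because the win was unexpected
    after = win_probability(underdog + delta, favourite - delta)
    return before, after


# Hypothetical ratings for illustration only.
before, after = rematch_probability(2000.0, 2100.0)
print(f"before: {before:.3f}, after: {after:.3f}")
```

Because the upset was unlikely, the transfer of points is larger than the 16 points exchanged between equally rated players, and the rematch probability moves towards the player who won.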
When implementing an Elo model, it is important to use as large a database of matches as possible. When predicting ATP matches in 2014, using match data going back to 1968 (available e.g. from Jeff Sackmann's GitHub account) increases accuracy from 67% to 70% compared with using only match data from the preceding year.
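To show how an accuracy figure like this can be measured, here is a hedged Python sketch of a backtest: replay matches in chronological order, record whether the higher-rated player won before each update, and score only matches after a warm-up period. Assumptions of ours: K uses the number of matches each player has played before the current one, exact 50/50 matches are skipped, and the match list is a made-up toy (loading a real dataset such as Jeff Sackmann's is omitted). The `k_factor` parameters are the assumed FiveThirtyEight-style values.

```python
from collections import defaultdict


def win_probability(elo_a, elo_b):
    """Pre-match probability that A beats B (standard 400-point Elo scale)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def k_factor(matches_played, c=250.0, o=5.0, s=0.4):
    """FiveThirtyEight-style K-factor (parameter values assumed)."""
    return c / (matches_played + o) ** s


def backtest(matches, evaluate_from=0, start=1500.0):
    """Replay chronological (winner, loser) pairs and score pre-match predictions.

    Matches before index `evaluate_from` only warm up the ratings; later ones
    are also scored. Returns (accuracy, final ratings).
    """
    ratings = defaultdict(lambda: start)
    played = defaultdict(int)
    correct = evaluated = 0
    for i, (winner, loser) in enumerate(matches):
        p_winner = win_probability(ratings[winner], ratings[loser])
        # Score only the test period, skipping exact coin-flips (e.g. two debutants).
        if i >= evaluate_from and p_winner != 0.5:
            evaluated += 1
            correct += p_winner > 0.5
        # Each player's K depends on how many matches they have played so far.
        ratings[winner] += k_factor(played[winner]) * (1.0 - p_winner)
        ratings[loser] -= k_factor(played[loser]) * (1.0 - p_winner)
        played[winner] += 1
        played[loser] += 1
    accuracy = correct / evaluated if evaluated else float("nan")
    return accuracy, dict(ratings)


# Hypothetical toy history: players and results are made up for illustration.
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B"), ("A", "C")]
accuracy, ratings = backtest(toy, evaluate_from=3)
print(f"accuracy on last {len(toy) - 3} matches: {accuracy:.2f}")
```

The `evaluate_from` split mirrors the point above: a longer warm-up history gives the ratings time to settle before they are used for prediction.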
The Elo model works well: at the 2016 US Open, it predicted 73.5% of ATP matches correctly, not far off the bookmakers, who got 76.1% right. As mentioned in the introduction, it also outperformed the other models in Kovalchik's comparison on ATP matches played in 2014.
This may be surprising: after all, the Elo model only takes wins and losses into account. It knows nothing about the players involved (some may match up better against certain opponents than others), about the surface (some players may perform better on one surface than on another), or about how decisive each victory was (a contested final-set victory is treated the same as an easy straight-sets win).
Some of these factors may explain why the bookmakers still hold an edge, and a model taking them into account could be an interesting improvement.
In the next article, I will be talking about models specifically designed around the rules of tennis: point-based models. These will allow us to predict much more than just the winner of a match, such as the expected number of sets and the most likely set scores.