Martin Ingram – Data Scientist
Martin is passionate about good code and gaining insights from data, with a particular interest in sports analytics. He completed a Bachelor of Arts in Natural Sciences (Physical) at Cambridge before completing a Master of Science in Computing Science.
In the last article, I wrote about Elo, which was found to be the best-performing published model for predicting match wins in a recent analysis. In this article, we will explore another class of models – point-based models – which, while not as predictive of matches as Elo, allow the prediction of much finer-grained quantities of a match, such as the number of sets, number of games, and even number of points.
The i.i.d. assumption
Point-based models, as their name suggests, model tennis matches from the point level upwards. They are typically based on the assumption that the outcomes of points on serve in a tennis match are independent and identically distributed (i.i.d.). The independent part means that the probability of winning a point on serve is not influenced by the outcome of previous points.
This amounts to saying that there is no momentum: the probability stays constant, no matter whether the player won or lost his or her last points. The statement about identical distribution means that each point is considered equal: the probability of winning it stays the same, no matter whether it is an important point in the final set of a match or an unimportant one somewhere in the first set.
These assumptions clash with our intuitions about momentum and psychological pressure – are they really true? In fact, Klaassen & Magnus showed that they are not: players are more likely to win a point on serve if they won the previous point (suggesting momentum) and less likely to win the point if it is important (suggesting pressure). They found, however, that these effects are quite weak and that the assumption of i.i.d. is thus a good one for match prediction.
The power of the assumption is that, as we will see in the course of this article, it allows us to make detailed predictions about the match: not just about who is likely to win, but also by what margin.
Using the i.i.d. assumption
To see how the i.i.d. assumption allows us to make progress, we can consider a service game. The assumption means that we will look at a tennis match as a sequence of biased coin tosses: each time a player A plays a point on serve, they win or lose it with a fixed probability $p$, no matter what stage the match is at. Winning the service game to love would mean four won “coin tosses” in a row, i.e. a probability of $p^4$. By considering all the possible ways a service game can be won (including via deuce) and adding them together, Newton & Keller arrive at an equation for the probability of winning a service game, which is plotted below.
We can see that the probability has a simple shape. It is also interesting to note that at the WTA’s average of 56% of points won on serve, the probability of holding serve is 65%, while at the ATP’s average of 64%, it is 81%. This large gap explains why service breaks are much more common on the WTA tour compared to the ATP.
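The hold percentages quoted above can be checked with a short sketch. Writing $p$ for the probability of winning a point on serve and $q = 1 - p$, the server can win to love ($p^4$), to 15 ($4p^4q$), to 30 ($10p^4q^2$), or reach deuce ($20p^3q^3$) and then win from there with probability $p^2 / (1 - 2pq)$. The function name `hold_prob` and this way of grouping the terms are choices made for this example, not necessarily the form used in Newton & Keller's paper.

```python
def hold_prob(p):
    """Probability that the server wins a game when each point on serve
    is won independently with probability p (the i.i.d. assumption)."""
    q = 1.0 - p
    # Games won before deuce: to love, to 15, to 30.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Probability of reaching 40-40: 20 orderings of three points each.
    reach_deuce = 20 * p**3 * q**3
    # From deuce: win two points in a row, or split them and start again.
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


# Tour averages quoted in the text above.
for tour, p in (("WTA", 0.56), ("ATP", 0.64)):
    print(f"{tour}: p = {p:.2f} -> hold probability = {hold_prob(p):.2f}")
```

Run with the two tour averages, this gives roughly 0.65 and 0.81, in line with the figures above.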
Predicting matches using the assumption
Newton and Keller derive not only the probability of holding serve but also the probability of winning a set, a tiebreak, the match, as well as the probability of reaching individual set scores using just the probabilities of winning a point on serve for each of the players. The derivation of these quantities is a little too complicated to reproduce in this introduction to point-based models.
Instead, I will show how the inputs to the i.i.d. model can be calculated using a particular point-based prediction model, and then some illustrations of what the i.i.d. model is able to predict. All the information required to implement the i.i.d. equations can be found in Newton and Keller’s paper.
Barnett & Clarke’s model
The best point-based model found in Stephanie Kovalchik’s analysis was a model published by Tristan Barnett and Stephen Clarke at Swinburne University in 2005. It calculates the probability $f_{ij}$ of player $i$ winning a point on serve against player $j$ in the following way:
$$f_{ij} = f_t + (f_i - f_{av}) - (g_j - g_{av})$$
We can break this equation down into three terms:
$f_t$ is the average probability of winning a point on serve for the tournament. This term is important as some tournaments are “fast” and help the server on average – such as Wimbledon – while others, such as the French Open, are “slow” and have the opposite effect. In 2014, for example, the average probability of winning a point on serve was 67.2% at Wimbledon, compared to 62.4% at the French Open.
$(f_i - f_{av})$: $f_i$ is the average probability of player $i$ winning a point on serve, and $f_{av}$ is the tour average on serve. This term is greater than zero if player $i$ is better on serve than the average tour player, and smaller if he or she is worse.
$(g_j - g_{av})$: $g_j$ is the average probability of player $j$ winning a point on return, and $g_{av}$ is the tour average on return. Just like the previous term, this one is positive if player $j$ is better on return than average, and negative otherwise.
The model thus works the following way: we take the tournament baseline and increase it by how much our player is better than average on serve, then decrease it by how much better than average their opponent is on return. In this way, both $i$’s serving ability and $j$’s returning ability are taken into account, as well as the speed of the tournament surface.
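The adjustment described above is simple enough to sketch in a few lines. The function and argument names below are invented for this example, and the player figures are hypothetical placeholders rather than real statistics. Only the 67.2% Wimbledon baseline and the 64% ATP serve average come from the text, and the 36% return average is assumed to be its complement.

```python
def serve_point_prob(f_t, f_i, f_av, g_j, g_av):
    """Opponent-adjusted probability that player i wins a point on serve
    against player j.

    f_t  : tournament average of points won on serve
    f_i  : player i's average of points won on serve
    f_av : tour average of points won on serve
    g_j  : player j's average of points won on return
    g_av : tour average of points won on return
    """
    return f_t + (f_i - f_av) - (g_j - g_av)


# Hypothetical server (70% on serve) against a hypothetical returner
# (40% on return), on a Wimbledon-like baseline.
example = serve_point_prob(f_t=0.672, f_i=0.70, f_av=0.64, g_j=0.40, g_av=0.36)
print(f"{example:.3f}")  # 0.672 + 0.06 - 0.04 = 0.692
```

Because the adjustments are simply added and subtracted, extreme inputs could in principle push the result outside the range 0 to 1, so a real implementation may want to check for that.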
Example: Roger Federer vs. Novak Djokovic, Wimbledon 2015
Calculating the serve-winning probabilities
To see how Barnett and Clarke’s model works in practice, we can look at its predictions for a particular match. At Wimbledon 2015, Novak Djokovic and Roger Federer met in the final. We can summarise their results over the past year in a table:
We can see that Roger Federer was more successful on serve in the year leading up to the final, while Djokovic performed better on return. To compute Barnett and Clarke’s estimate, we also need the tour averages:
Putting all this together using Barnett and Clarke’s equation, we find that it predicts serve-winning probabilities of 67.0% for Roger Federer and 68.6% for Novak Djokovic.
Using the i.i.d. model with these estimates
Using Newton and Keller’s equations, we can calculate the probability of each player winning the match. For the best of five format and the calculated serve probabilities, these are 59.4% for Novak Djokovic and 40.6% for Roger Federer. The i.i.d. model allows us to break this probability down by the number of sets:
We see that Djokovic winning in four sets was the most likely prediction, followed by a win in five sets. The match’s final score was 7-6(1) 6-7(10) 6-4 6-3 for Djokovic, so in this case, the model did well.
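As a rough illustration of how two serve-winning probabilities turn into a match prediction, the sketch below builds up from game to tiebreak to set to match under the i.i.d. assumption. It is not Newton and Keller's derivation: it uses a simple recursion over scores, all function names are made up for this example, and it assumes that player A serves first and that a tiebreak is played at 6-6 in every set. The final set at Wimbledon 2015 was an advantage set, so the output need not reproduce the figures quoted above exactly.

```python
from functools import lru_cache
from math import comb


def hold_prob(p):
    """Probability of winning a service game given point-win probability p."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * p**2 / (1 - 2 * p * q)


def two_clear(a_on_serve, b_on_serve):
    """P(A gets two ahead from a level score) when each pair of units
    (points or games) has one served by A and one served by B."""
    a_wins_both = a_on_serve * (1 - b_on_serve)
    b_wins_both = (1 - a_on_serve) * b_on_serve
    return a_wins_both / (a_wins_both + b_wins_both)


def tiebreak_prob(pa, pb):
    """P(A wins a first-to-seven tiebreak), A serving the first point."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        if a == 6 and b == 6:
            return two_clear(pa, pb)
        # Serving order: A, B, B, A, A, B, B, ...
        a_serves = ((a + b + 1) // 2) % 2 == 0
        p_point = pa if a_serves else 1 - pb
        return p_point * win(a + 1, b) + (1 - p_point) * win(a, b + 1)
    return win(0, 0)


def set_prob(pa, pb):
    """P(A wins a tiebreak set), A serving the first game."""
    ha, hb = hold_prob(pa), hold_prob(pb)
    tb = tiebreak_prob(pa, pb)

    @lru_cache(maxsize=None)
    def win(ga, gb):
        if ga == 7 or (ga == 6 and gb <= 4):
            return 1.0
        if gb == 7 or (gb == 6 and ga <= 4):
            return 0.0
        if ga == 6 and gb == 6:
            return tb
        a_serves = (ga + gb) % 2 == 0
        p_game = ha if a_serves else 1 - hb
        return p_game * win(ga + 1, gb) + (1 - p_game) * win(ga, gb + 1)
    return win(0, 0)


def match_breakdown(pa, pb, sets_to_win=3):
    """Probability of each final score in sets, treating sets as independent."""
    s = set_prob(pa, pb)
    result = {}
    for lost in range(sets_to_win):
        # The winner takes the last set; the lost sets fall among the earlier ones.
        ways = comb(sets_to_win - 1 + lost, lost)
        result[f"A {sets_to_win}-{lost}"] = ways * s**sets_to_win * (1 - s)**lost
        result[f"B {sets_to_win}-{lost}"] = ways * (1 - s)**sets_to_win * s**lost
    return result


# Serve-winning probabilities from the text: A = Djokovic, B = Federer.
for score, prob in match_breakdown(0.686, 0.670).items():
    print(f"{score}: {prob:.3f}")
```

Summing the entries for player A gives the match-win probability, and the individual entries give the split by number of sets.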
Next, we can look at the most likely set scores. Newton and Keller’s equations allow the calculation of the probability of reaching any score in a set. For the Wimbledon match, the result using Barnett and Clarke’s estimate is:
We see that the set scores which actually happened in the match were the ones considered most likely by the i.i.d. model.
One final very interesting property of the i.i.d. model is that it can calculate updated winning probabilities at each point in the match. The plot below shows how Novak Djokovic’s probability of winning changed over the course of the match:
We see that Djokovic started with a probability of 59.4%, as calculated. The probability hovers around that number until it drops sharply to around 45% at around point 30: this was when Federer broke to lead 4-2. However, Djokovic broke straight back, effectively levelling the match. The next big change occurs when Djokovic wins the first set tiebreak.
He wins 7-1, causing the sharp increase at around point 70, bringing Djokovic’s probability up to around 75%. The second set is very long and lasts until Federer wins an epic 12-10 tiebreak, tying the match at one set all and bringing Djokovic’s win probability back to pre-match levels. However, Djokovic breaks to lead 3-1 in the third set, bringing his probability back to the mid-seventies.
He goes on to win that set and, after he breaks Federer in the fourth set, his victory is almost certain.
As we have seen, point-based models consist of two parts: a model of a tennis match based on the probability of winning a point on serve (which is assumed to stay constant, or i.i.d., throughout the match), and a model predicting these probabilities.
The best pure point-based model – Barnett and Clarke’s opponent-adjusted model – does not perform as well as Elo for match prediction, but it does offer a wealth of additional predictions.
Point-based models need not perform worse than Elo: in fact, it is possible to find serve-winning probabilities which make the estimates consistent with Elo or any other model, which may be the subject of a future article.
One concern with the i.i.d. models is that, as mentioned in the introduction, their central assumption is only approximately true. A number of papers, including one I co-authored, have looked into how players differ from the assumption of constant probability. Though they generally find only small effects, taking them into account may give a slight edge, particularly for within-match prediction.
Overall, point-based models such as Barnett and Clarke’s model are among the best for tennis match prediction and, as we have seen, are unique in the amount of information they are able to provide about matches. As the example of Federer – Djokovic showed, their predictions can be almost uncannily precise, correctly forecasting set scores and the number of sets.
Of course, the predictions are not always this spot on, but they are competitive with the best models, with the exception of Elo, which makes them an intriguing class of tennis models.