Betting on Sports: How to get the right Data

When you’re betting on sports, you need the right data. Jack Houghton discusses why it matters to your betting, what to prioritise, and how it relates to the current Rugby World Cup.

This latest article is part of the Psychology of Betting series, where Jack Houghton talks about the mental side of punting.

Availability to the public

Michael Lewis’ 2003 book, Moneyball, describes how sabermetrics transformed professional baseball. Nerds, armed with decades of statistics, were able to debunk commonly held misconceptions about what made a good batter, pitcher and fielder, providing scouts with more meaningful data upon which to base their transfer decisions, and coaches with insights that could shape in-game tactics.

Baseball, a largely static game which invites copious data collection, is a natural home for data wonks, where the challenge is not finding information, but deciding which information is useful and which a distraction. In other sports, a similarly analytical approach has been more difficult to implement.

Tennis, for example, which seems a perfect candidate for the statistical nerds to get hold of (after all, it’s dripping with player performance metrics), has long frustrated fans, as much of the data collected by the sport’s governing bodies has not been made publicly available. Instead, the task of recording the game’s metrical minutiae has been left to networks of fans, collaborating in online match-charting projects.

Meaningful Data

In recent years, these efforts have produced enough meaningful data to allow those involved in the sport – whether players, pundits, or punters – to better understand the drivers of performance. We now know, for example, that a player’s second-serve, alongside their ability to return their opponent’s second-serve, is far more significant in determining the outcome of matches than all the other indicators – such as aces, first-serve percentage, first-serve return percentage – that were long obsessed over.

Whereas the challenge in having a more data-informed understanding of tennis has been in releasing known, but incarcerated, metrics, the problem in other sports has been in understanding which metrics to record in the first place. Tennis and baseball are relatively similar in that they involve short bursts of action, interspersed with breaks, and both games are easy to break down into their constituent parts because the actions of opponents are largely separate and distinguishable.

In baseball, for example, a pitcher hurls a ball at a batter, who either hits it or doesn’t, and the ball is picked up by a fielder, who then throws it to one of a narrow range of positions on the field, where someone catches it or doesn’t.

That’s not to say there aren’t intricacies and complications when it comes to tracking certain parts of the games of tennis and baseball – it’s difficult to accurately record the effects of pitching technique and shot spin, for example – but, for the most part, the games are relatively easy to segment when compared to other sports.

Data in Rugby

Take rugby union as a comparison, for example. Whereas most of a tennis match is spent preparing to play a point (only around 15% of the time taken for the average tennis match is actual play), the action in rugby is much more constant (with active play time reaching around 45% in recent years).

Further, it is much harder to clearly distinguish and segment different aspects of a game: rucks may share certain characteristics that make them rucks according to the rules of the game, but one containing ten players is different to one with three, and one on the half-way line is different to one near the try line.

There are other complications, too. In rugby, one phase of play quickly moves into another, and it is difficult to measure the effect one has on another. A try may appear to be the result of a blistering burst of speed from a full back, but it could just as easily have more to do with the cumulative effect of a series of rucks and mauls minutes earlier, which had systematically drawn defenders out of position.

Despite these challenges, though – multiple players, in relative states of action and inaction, constantly shifting in a game with a spectroscopic range of variables – the advent of improved technology, especially filming and GPS tracking, has seen sports analysts begin to develop a variety of metrics that inform their understanding of the sport: allowing them to understand the tactical choices that will bring the most points, or concede the least.

Times are changing

The frustration for rugby punters and fans, though, has been getting hold of this data. Whilst we’ve known for a long time that the best rugby operations in the world have been able to break the complex and seemingly chaotic game into increasingly measurable elements, much of this work is clandestine, jealously guarded by professional outlets aware of the competitive advantages it brings.

Increasingly, however, it’s making its way into the public sphere. Before the professionalisation of the game in 1995, it was hard enough to unearth the most rudimentary of information ahead of a match: if you could uncover the weight of the packs, you were doing well. Now, though, not only are player heights and weights widely known, but so are more microscopic metrics such as offloads made and missed tackles. It’s an increasingly data-rich and data-informed sport.

And there has been a commercial response to this, with several websites offering customers a variety of performance metrics that (they claim) give a greater insight into the game than has been possible before.  Individual players are now rated in tables, showing how they match up according to key performance metrics, with these micro-measures then combined into overall ratings, which allow customers to see – in a seemingly unbiased, quantitative way – who the best players are.

Understanding the Data

To add lustre to what they offer, some companies even claim that their metrics have been developed using the most advanced forms of statistical analysis borrowed from the financial world, and that machine-learning protocols have been used to analyse the output.

As an ardent data wonk, I’ve tracked the development of these statistical measures closely (no self-respecting rugby punter could ignore them, could they?) and explored how a would-be profiteer might use them to inform better punting decisions. The exploration has not been especially fruitful.

In any analysis, it’s important to spend time assessing the information you have, and how suitable it is to your purpose. The data these companies provide is no doubt interesting to fans, informing arguments about who the best players might be, and to coaches, who could use it to inform scouting decisions and track the development of their players. It is not especially useful to punters, though, and falls down on two fronts.

First, the data doesn’t measure what is important to rugby punters. When placing bets, we typically want to know about the relative likelihood of one team or another winning. We can then compare these calculated likelihoods to the odds available in the markets, securing value bets (where the odds we take are bigger than they should be). To do this effectively, we need metrics that highlight the relative strengths of whole teams, not the more microscopic phase- and player-level measurements offered by these companies.
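The comparison between a calculated likelihood and the odds on offer can be sketched in a few lines. This is a minimal illustration, assuming decimal odds and a win probability you have estimated yourself; the function names and the example numbers are hypothetical, and bookmaker margin and exchange commission are ignored.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds (no margin or commission removed)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_value(model_probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, if the model probability is right."""
    return model_probability * decimal_odds - 1.0


def is_value_bet(model_probability: float, decimal_odds: float) -> bool:
    """A bet is 'value' when the odds are bigger than our estimate says they should be."""
    return expected_value(model_probability, decimal_odds) > 0.0


# Hypothetical example: we rate a team a 55% chance and the market offers 2.00.
odds = 2.00
our_estimate = 0.55
print("market implies:", implied_probability(odds))
print("expected value per unit:", expected_value(our_estimate, odds))
print("value bet?", is_value_bet(our_estimate, odds))
```

The whole exercise depends on the quality of the probability estimate, which is why team-level strength matters more here than player-level detail.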

An obvious response to this criticism is to ask why these exciting, new, machine-learned metrics cannot just be aggregated into whole-team data. Well, they can, but understanding how the individual ratings are derived initially highlights why aggregating them is not wise.

Finding something insightful

Player-level data is driven by regression analysis, which involves assessing numerous data points – number of tackles, lineouts won, etcetera – and understanding which of them has the closest relationship to another variable, in this case a team winning or losing. Find the metrics that are most associated with winning, and hey presto: you have the aspects of play that are most important to measure at an individual player level.
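As a rough illustration of the idea, the sketch below ranks metrics by how strongly they are associated with winning. It swaps in a simple Pearson correlation for a full regression, so it is a simplified stand-in and not the method any of these companies actually use; the metric names and the six matches of data are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series tells us nothing about the outcome
    return cov / sqrt(var_x * var_y)


def rank_metrics(metrics, won):
    """Sort metric names by the strength of their association with winning."""
    scores = {name: pearson(values, won) for name, values in metrics.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented figures for six matches (1 = win, 0 = loss), purely illustrative.
won = [1, 1, 0, 1, 0, 0]
metrics = {
    "tackles_made": [120, 135, 110, 128, 105, 112],
    "lineouts_won": [10, 9, 11, 10, 9, 11],
}
for name, r in rank_metrics(metrics, won):
    print(f"{name}: {r:+.2f}")
```

A real regression would consider the metrics together and use far more matches; with a sample this small, any ranking is noise.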

This process highlights the second reason these new forms of data are not especially useful to punters, though. When using regression analysis to identify the more microscopic elements of rugby play that contribute to winning, these companies rightly choose to focus on those variables that have the biggest effect. In other words, they leave out other variables that, whilst contributing to team success or failure, are not significant enough to warrant measuring. The ratings that result are, therefore, undoubtedly insightful but, like all modelled data, represent a simplification of reality, leaving out what is too difficult to compute.

If we then take those simplified (and therefore partially inaccurate) player-level metrics and aggregate them, we compound the effect of that initial simplification, leaving us with data which feels insightful because of all the wizardry that has contributed to its creation, but is, in fact, less accurate than more straightforward forms of analysis that focus only on the team level in the first place.

Prioritising your Data

This conundrum – which sees more complex and intricate approaches yielding less accurate results – is known in academia as Occam’s Razor, but is more commonly understood through the partially misattributed wisdom of Einstein: “Everything should be made as simple as possible, but no simpler.”

In assessing the value of these newly-available rugby union metrics, I’ve spent the last year trying to use them alongside my own, low-tech Elo ratings, but haven’t yet found a way where they can be used to outperform what I already use, or where they can become an added layer of analysis to improve forecasts.
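For readers unfamiliar with Elo ratings, the textbook version can be sketched as follows. This is the generic formula, not the specific model referred to above; the 400-point scale and the K-factor of 20 are conventional defaults assumed here for illustration, and the team ratings are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation for team A against team B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return new (rating_a, rating_b) after a match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    Whatever A gains, B loses, so the total number of points is preserved.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Made-up ratings: a stronger home side against a weaker visitor.
home, away = 1650.0, 1500.0
print("expected home score:", round(expected_score(home, away), 3))
home, away = update_ratings(home, away, score_a=0.0)  # the underdog wins
print("ratings after upset:", round(home, 1), round(away, 1))
```

The appeal of this kind of rating is that it needs nothing more than past results, and its expected score can be read directly as a rough win probability.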

Much of the struggle has revolved around the difficulties in combining the player ratings into any kind of implied percentage chance of victory and, whilst this may become easier as we have more years of data to play with, my initial analyses into how possible this will be have been inauspicious.

That’s not to say that the new metrics are useless for punters. Others might be able to find ways that the information can supplement their existing models, and it’s certainly possible that they could help punters with in-play betting decisions. Understanding the relative merits of a handful of players contesting a line-out near the try-line, for example, might help inform a decision about whether a score is likely.

It’s also possible that these new ratings could act as a sense-check in situations where team line-ups have changed significantly. One obvious criticism of Elo ratings and their ilk is that they are predicated on past team performance. If the teams are no longer the same, then how reliable are the ratings?

It’s a valid question, and having quantifiable measures of the relative quality of individual players could allow punters to assess whether the rating they are relying on is likely to be higher or lower for the team that has experienced significant roster changes.


Whilst I will continue to watch the development of such data in rugby union with a keen interest, then, it doesn’t look like I’ll be incorporating it into my betting models any time soon. The reasons for this reluctance highlight four points that all punters should remember when analysing data:

  1. We likely have a bias for data that seems more intricate and complex, naturally believing that its intricacy and complexity is a guarantee of its worth.  We need to beware this bias in the same way that investors needed to beware the US sub-prime mortgage market.
  2. We need to assess whether the data measures what is useful to us.  Whilst it might be interesting to know who the best second-row is in world rugby, this information doesn’t tell us much about the likelihood of his team winning.
  3. We need to test any new data model.  It might have passed the sense-checks suggested above, but would it have been profitable if applied to past matches, and does it succeed when paper-traded over a period of future matches?
  4. We need to be prepared to “kill our babies”.  It’s a horrible phrase – borrowed from the world of journalism, where it refers to the requirement of writers to be able to edit-out writing that they have become particularly attached to – but it perfectly captures the essential mindset of the data-savvy punter.  Having invested precious time into a new data model, we can become emotionally committed, finding it hard to let it go, but we must recognise that not all our projects will lead to success.
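The testing described in the third point can be sketched as a simple flat-stake backtest. This is a minimal illustration under stated assumptions: the `PastMatch` record, the `backtest` function and the four matches are all hypothetical, commission is ignored, and a real test would need many more matches and odds that were genuinely available at the time.

```python
from dataclasses import dataclass


@dataclass
class PastMatch:
    model_probability: float  # the model's pre-match estimate that the team wins
    decimal_odds: float       # odds that were available on that team
    won: bool                 # what actually happened


def backtest(matches, stake: float = 1.0):
    """Flat-stake every bet the model rated as value; return (profit, bets placed)."""
    profit = 0.0
    bets = 0
    for match in matches:
        if match.model_probability * match.decimal_odds <= 1.0:
            continue  # the model saw no value, so no bet would have been placed
        bets += 1
        if match.won:
            profit += stake * (match.decimal_odds - 1.0)
        else:
            profit -= stake
    return profit, bets


# Invented history, purely to show the mechanics.
history = [
    PastMatch(0.60, 2.00, True),   # value bet that won
    PastMatch(0.40, 2.00, True),   # no value, skipped
    PastMatch(0.55, 2.10, False),  # value bet that lost
    PastMatch(0.30, 4.00, True),   # value bet that won
]
profit, bets = backtest(history)
print(f"bets placed: {bets}, profit: {profit:+.2f} units")
```

Paper-trading is the same calculation run forward: record the model's estimates and the available odds before each match, then fill in the results afterwards.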
