Do you want to enter the World Cup Datathon but aren’t sure where to start?

Our Data Science team are sharing some resources to give you a head start.

Using the popular ELO method, they’ve created a set of ratings based on previous World Cups.

In THIS ATTACHMENT you’ll see two zipped files (download / unzip / open with R).

The simple ELO file uses international matches from the 2010 World Cup through to 2014. It creates a set of team ratings based on those results and plots them against the 2014 World Cup.

Following on from that, it uses the same method to produce ELO rankings for teams ahead of the 2018 World Cup. The MakeIELO model will be a live submission that you’ll see on the 2018 Datathon leaderboard. We recommend you take a look at each R file and see how you can customise them for your own submissions.

For a better understanding of how the Data Science team have created the ELO ratings, please continue reading.


  • The ELO model is continually adjusting a team’s rating
  • Before a team plays any match they start with a score of 1500
  • After each match, a team’s score is updated according to the result (win or loss) and the strength of the team they were playing
  • In essence you’re penalized more for losing to bad teams and rewarded more for beating good teams; conversely, losing to a good team or beating a bad team moves your rating less
  • Thus each team maintains a continuous rating which is a proxy for how good the team is
  • Implicit in the rating system is the probability of victory for a team with ELO score X against a team with score Y
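
To make the update rule above concrete, here is a minimal Python sketch of a textbook ELO update. It is not the code from the R attachments: the 400-point scale, the K-factor of 20 and the function names are all assumptions chosen for illustration, since the article does not state the settings the Data Science team used.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of team A against team B (standard logistic ELO form).

    The 400-point scale is the conventional default, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won and 0.0 if A lost (0.5 is the usual convention
    for a draw). k is an assumed K-factor, not taken from the article.
    """
    expected_a = expected_score(rating_a, rating_b)
    change = k * (score_a - expected_a)
    # Whatever A gains, B loses, so total rating points are conserved.
    return rating_a + change, rating_b - change


# Two new teams both start at 1500, so the winner gains k / 2 = 10 points.
print(update_ratings(1500, 1500, 1.0))  # (1510.0, 1490.0)

# Beating a stronger team is rewarded more than beating a weaker one.
underdog_gain = update_ratings(1400, 1600, 1.0)[0] - 1400
favourite_gain = update_ratings(1600, 1400, 1.0)[0] - 1600
print(underdog_gain > favourite_gain)  # True
```

A larger K-factor makes ratings react faster to recent results, while a smaller one makes them more stable; it is one of the first things you might tune for your own submission.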

Our ELO Approach

  • We set every soccer team’s score to 1500 on the first day after the 2010 WC
  • We run each game between then and the start of the 2014 WC through this scoring system
  • At the end we have a set of ratings for every team, including both those that qualified and those that didn’t qualify for the World Cup
  • Then, when we look at the schedule and we have, say, Argentina (who might have landed at an ELO score of 1810) playing Saudi Arabia (who might have landed at an ELO score of 1630), we can calculate Argentina’s probability of victory
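
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the attached R model: the match list is made up, and the K-factor of 20 and 400-point scale are conventional defaults that the article does not confirm.

```python
from collections import defaultdict

START_RATING = 1500.0  # every team starts here, as described above
K = 20.0               # assumed K-factor; not stated in the article


def win_probability(rating_a, rating_b):
    """Two-way probability that A beats B (standard ELO form, 400 scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_matches(matches):
    """Run matches through the rating system in date order.

    matches: iterable of (team_a, team_b, score_a), where score_a is
    1.0 for an A win, 0.0 for an A loss and 0.5 for a draw.
    """
    ratings = defaultdict(lambda: START_RATING)
    for team_a, team_b, score_a in matches:
        expected_a = win_probability(ratings[team_a], ratings[team_b])
        change = K * (score_a - expected_a)
        ratings[team_a] += change
        ratings[team_b] -= change
    return dict(ratings)


# Made-up results, purely for illustration.
sample = [
    ("Argentina", "Chile", 1.0),
    ("Saudi Arabia", "Argentina", 0.0),
    ("Chile", "Saudi Arabia", 0.5),
]
ratings = rate_matches(sample)
for team, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{team}: {rating:.1f}")

# Probability for the example ratings quoted above.
print(round(win_probability(1810, 1630), 3))
```

With these textbook settings the 1810 v 1630 example comes out at roughly 0.74, close to but not the same as the 72% quoted in this article, so the attached model may differ in its details.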


  • We needed to overcome the problem that our probability formula for two ELO scores doesn’t naturally support drawn results.
  • It will say Argentina will win 72% of the time and Saudi Arabia 28% of the time
  • But we know that a draw is more than possible!
  • So we calculated the historical draw rates for this scenario
  • To be more general, we said that for teams with an ELO point differential of around 200 (1810 v 1630), these matches were drawn 22% of the time
  • Then we just reduced the win and loss probabilities equally (by 11 percentage points each) so that probabilities are allocated to all three possible outcomes
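
The draw adjustment described above can be sketched as follows. The function names, the 50-point window and the sample history are hypothetical and only illustrate the idea; the 72% and 22% figures are the ones quoted in the list.

```python
def historical_draw_rate(history, target_diff, window=50):
    """Share of past matches with a similar rating gap that ended in a draw.

    history: list of (rating_difference, was_draw) pairs.
    window: how close a past gap must be to target_diff (an assumed value).
    """
    nearby = [was_draw for diff, was_draw in history if abs(diff - target_diff) <= window]
    if not nearby:
        raise ValueError("no historical matches near this rating difference")
    return sum(nearby) / len(nearby)


def three_way_probabilities(p_win, draw_rate):
    """Turn a two-way win probability into (win, draw, loss).

    Half of the draw rate is taken from each side, as described above.
    """
    p_loss = 1.0 - p_win
    half = draw_rate / 2.0
    if min(p_win, p_loss) < half:
        raise ValueError("draw rate is too large to split equally")
    return p_win - half, draw_rate, p_loss - half


# Made-up history: two of the four matches near a 200-point gap were draws.
toy_history = [(190, True), (210, False), (180, True), (220, False), (600, False)]
print(historical_draw_rate(toy_history, 200))  # 0.5

# The article's example: 72% two-way win probability and a 22% draw rate.
win, draw, loss = three_way_probabilities(0.72, 0.22)
print(f"win {win:.2f}, draw {draw:.2f}, loss {loss:.2f}")  # win 0.61, draw 0.22, loss 0.17
```

Taking the same amount from both sides is the simplest option, but it can fail for very lopsided matches where the underdog’s probability is smaller than half the draw rate, which is why the sketch raises an error in that case.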


Hopefully you’ve gained some confidence from the article and the two R attachments that Betfair’s Data Science team have shared.

We’ve got one other ELO educational tool on The Hub which may help your understanding: ELO Modeling for Tennis.
