Wednesday 7 August 2013

The problem with predicting football results - you cannot rely on the data


Bloomberg Sports have published their predictions for the forthcoming Premiership season (****see update below for actual results) in the form of the predicted end of season table. Here are some key snippets from their press release:
The table indicates that this season will be a three horse race between Chelsea, Manchester City and Manchester United .... The Bloomberg Sports forecast expects Arsenal to claim the final Champions League place ahead of North London rivals Tottenham Hotspur.... At the bottom of the table, all three newly promoted teams are expected to face the drop...
There is just one problem with this set of 'predictions'. The final table - with very minor adjustments - essentially replicates last season's final positions. The top seven remain the same (the only positional changes are that Chelsea and Man Utd swap positions 1 and 3, and Liverpool and Everton swap positions 6 and 7). And the bottom three are the three promoted teams, so they also 'retain' their positions.

Bloomberg say they are using "mathematically-derived predictions" using "vast amounts of objective data". But herein lies the problem. As we argue in our book, relying on data alone is the classical statistical approach to this kind of prediction. And classical statistics is great at 'predicting the past'. The problem is that we actually want to predict the future, not the past!

Together with my PhD student Anthony Constantinou, I have been applying Bayesian networks and related methods to the problem of football prediction for a number of years. The great thing about Bayesian networks is that they enable you to combine the standard statistical data (most obviously historical and recent match results) with subjective factors. And it is the incorporation of the subjective (expert) factors that is the key to improved prediction, something that 'classical' statisticians just do not seem to get.
 
This combination of data and expert judgement has enabled us to get more accurate predictions than any other published system and has even enabled us to 'beat the bookies' consistently (based on a simple betting strategy) despite the bookies' built-in profit margin. Unlike Bloomberg (and others) we have made our methods, models and results very public (a list of published papers in scholarly journals is below). In fact, for the last two years Anthony has posted the predictions for all matches on his website pi-football the day before they take place. The prediction for each match is summarised as a very simple set of probabilities, namely the probability of a home win, draw and away win. Good betting opportunities occur when one of the probabilities is significantly higher than the equivalent probability implied by the bookies' odds.
Example: Suppose Liverpool are playing at home to Stoke. Because of the historical data the bookies would regard Liverpool as strong favourites. They would typically rate the chances of Stoke winning as very low - say 10% (which in odds terms equates to '9 to 1 against'). They then add their 'mark-up' and publish odds of, say, 8 to 1 against a Stoke win (which in probability terms is 1/9, or about 11%). But suppose there are specific factors that lead our model to predict that the probability of a Stoke win is 20%. Then the model is saying that the bookmakers' odds - even given their mark-up - have significantly underestimated the probability of a Stoke win. Although our model still gives Stoke only a 20% chance of winning, it is worth placing a bet. Imagine 10 match scenarios like this. If our predictions are correct then, on average, you will win on 2 of the 10 occasions. Assuming you bet £1 each time, you will spend £10 and get £18 back (each winning bet returns £8 in winnings plus the £1 stake) - a very healthy 80% profit.
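The arithmetic in this example can be checked with a short Python sketch. It is purely illustrative and is not the pi-football model: the function names (implied_probability, expected_profit) are hypothetical, and the 20% model probability is simply the figure assumed in the example.

```python
from fractions import Fraction


def implied_probability(odds_against):
    """Probability implied by fractional odds of 'odds_against to 1 against'."""
    return Fraction(1, odds_against + 1)


def expected_profit(model_prob, odds_against, stake=1):
    """Average profit per bet if the model probability is correct.

    A win pays odds_against * stake; a loss costs the stake.
    """
    return model_prob * odds_against * stake - (1 - model_prob) * stake


# Figures taken from the Liverpool v Stoke example (assumed, not real odds).
fair_prob = implied_probability(9)        # bookies' own estimate: 1/10
published_prob = implied_probability(8)   # after mark-up: 1/9
model_prob = Fraction(1, 5)               # the model's 20% estimate

# The bet is worth placing only if the model's probability exceeds the
# probability implied by the published odds.
worth_betting = model_prob > published_prob

# Ten £1 bets: total outlay, average amount returned, and profit as a
# fraction of the outlay.
bets = 10
total_staked = bets * 1
total_returned = total_staked + bets * expected_profit(model_prob, 8)
profit_fraction = (total_returned - total_staked) / total_staked

print(float(published_prob), worth_betting)
print(total_staked, total_returned, float(profit_fraction))
```

Exact fractions are used so that 1/9 and 1/5 are compared without rounding error. The result is an average over many such bets; any particular run of 10 matches could produce more or fewer than 2 wins.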
Thanks to Alex on the Spurs-list for the tip-off on the Bloomberg report.

****Update: The actual results for the 2013-14 season were very different from the Bloomberg predictions. The title was a two-horse race between Man City and Liverpool, with the rest far behind. Liverpool, who had been predicted to come 6th, would have won the title but for a late collapse. Man Utd finished 7th. Only one of the newly promoted clubs (Cardiff) was relegated.

References:
  • Constantinou, A., N. E. Fenton and M. Neil (2013). "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems. http://dx.doi.org/10.1016/j.knosys.2013.05.008
  • Constantinou, A. C. and N. E. Fenton (2013). "Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries". Journal of Quantitative Analysis in Sports 9(1): 37-50. http://dx.doi.org/10.1515/jqas-2012-0036
  • Constantinou, A., N. E. Fenton and M. Neil (2012). "pi-football: A Bayesian network model for forecasting Association Football match outcomes". Knowledge-Based Systems 36: 322-339. http://dx.doi.org/10.1016/j.knosys.2012.07.008
  • Constantinou, A. and N. E. Fenton (2012). "Solving the problem of inadequate scoring rules for assessing probabilistic football forecasting models". Journal of Quantitative Analysis in Sports 8(1), Article 1. http://dx.doi.org/10.1515/1559-0410.1418