Monday 6 March 2017

Explaining and predicting football team performance over an entire season


When I was presenting the BBC documentary Climate Change by Numbers and had to explain the idea of a statistical 'attribution study', I used the analogy of determining which factors most affected the performance of Premiership football teams year on year. Because it had to be done in a hurry, my colleague Dr Anthony Constantinou and I did a very crude analysis which focused on a very small number of factors and showed, unsurprisingly, that turnover (i.e. mainly spending on transfers and wages) had the most impact of these.

We weren't happy with the quality of the study and decided to undertake a much more comprehensive analysis as part of the BAYES-KNOWLEDGE project. This project is all about improved decision-making and risk assessment using a probabilistic technique called Bayesian Networks. In particular, the main objective of the project is to produce useful and accurate predictions and assessments in situations where there is not a lot of data available. In such situations the current fad of 'big data' methods using machine learning techniques does not work; instead we use 'smart-data', a method that combines the limited data available with expert causal knowledge and real-world ‘facts’. The idea of predicting Premiership teams' long-term performance and identifying the key factors explaining changes was a perfect opportunity to both develop and validate the BAYES-KNOWLEDGE method, especially as we had previously done extensive work on predicting individual Premiership match results (see links at bottom).

The results of the study have now been published in one of the premier international AI journals, Knowledge-Based Systems.

The Bayesian Network model in the paper enables us to predict, before a season starts, the total league points a team is expected to accumulate throughout the season (each team plays 38 games in a season, with three points per win and one per draw). The model results compare very favourably against a number of other relevant and different types of models, including some which use far more data. As hoped, the results also provide a novel and comprehensive attribution study of the factors most affecting performance (measured in terms of impact on actual points gained or lost per season). For example, although the largest improvements in performance unsurprisingly result from massive increases in spending on new players (an 8.49 points gain), an even greater decrease (up to 16.52 points) results from involvement in the European competitions (especially the Europa League) for teams that have little previous experience in such competitions. Also, something that was very surprising, and that possibly confounds bookies and gives punters good potential for exploiting, is that promoted teams generate (on average) a staggering increase in performance of 8.34 points relative to the relegated teams they are replacing. The results in the study also partly address and explain the widely accepted 'favourite-longshot bias' observed in bookies' odds.
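
To make the points scale concrete, here is a minimal Python sketch of the league scoring arithmetic described above (38 games, three points per win, one per draw). It is only an illustration of the quantity the model predicts, not the Bayesian Network itself; the function and constant names are our own and do not come from the paper.

```python
# Illustrative only: league points arithmetic for one team's season.
# Names below are hypothetical and not taken from the paper.
GAMES_PER_SEASON = 38
POINTS_PER_WIN = 3
POINTS_PER_DRAW = 1


def season_points(wins: int, draws: int, losses: int) -> int:
    """Return total league points for a full 38-game season."""
    if min(wins, draws, losses) < 0:
        raise ValueError("results cannot be negative")
    if wins + draws + losses != GAMES_PER_SEASON:
        raise ValueError("results must add up to 38 games")
    return wins * POINTS_PER_WIN + draws * POINTS_PER_DRAW


# The most points any team can collect in a season (38 wins).
MAX_POINTS = season_points(GAMES_PER_SEASON, 0, 0)
```

On this scale a season can yield at most 114 points, which gives some context for effects of 8 to 16 points: they are equivalent to several extra wins or losses over a season.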

The full reference citation is:
Constantinou, A. C. and Fenton, N. (2017). Towards Smart-Data: Improving predictive accuracy in long-term football team performance. Knowledge-Based Systems, In Press, 2017, http://dx.doi.org/10.1016/j.knosys.2017.03.005
The pre-print version of the paper (pdf) can be found at http://constantinou.info/downloads/papers/smartDataFootball.pdf

We acknowledge the financial support of the European Research Council (ERC) for the research project ERC-2013-AdG339182-BAYES_KNOWLEDGE, and Agena Ltd for software support.

See also: