I have recently been reading quite a lot about the statistical analysis of football. Two sites I’ve found particularly interesting are:
Here they introduce two interesting concepts that I thought I’d try to apply to the World Cup group stage results. I will attempt to describe them briefly below but they are explained better here.
TSR = total shot ratio = shots taken / (shots taken + shots conceded)
TSR is the fraction of total shots that a team takes. A high score is a sign of a strong team; a low score suggests a weaker one. Here I have represented TSR as a percentage, so the mean value is 50.
TSR has been shown to give a good indication of how dominant a team is and to be a reasonable predictor of future performance. When looking at a small sample of games – for example the three games each team plays in the first round of the World Cup – it can give a better idea of the strength of a side than actual goals or points.
This is because, in football, goals are relatively unlikely events – even in this World Cup – so over a small sample size they are considerably affected by luck. Shots are far more frequent and therefore give a better idea of the flow of the game. Of course, it’s a fairly rough measure, as not all shots are created equal. TSR takes no account of location, whether the shot was on target, or what position the defence and goalkeeper were in, but it’s still a useful approximation.
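For anyone who wants to reproduce the calculation, here is a minimal sketch of TSR as a percentage. The function name and the shot counts are made up for illustration; they are not taken from the tournament data.

```python
def total_shot_ratio(shots_for: int, shots_against: int) -> float:
    """TSR as a percentage: shots taken / (shots taken + shots conceded)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when no shots were recorded")
    return 100.0 * shots_for / total


# Hypothetical figures: a team takes 45 shots and concedes 30 over three games.
team = total_shot_ratio(45, 30)
opponents = total_shot_ratio(30, 45)

print(round(team, 1))              # the team's TSR
print(round(team + opponents, 1))  # the two sides always sum to 100
```

Because every shot one side takes is a shot the other side concedes, the two figures always add up to 100, which is why the mean sits at 50.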
PDO = (shot percentage + save percentage)
Unlike TSR, these percentages are calculated from shots on target. A high PDO suggests a luckier team; a low value suggests an unfortunate one. The mean is 100.
This statistic gives a good indication of how lucky a team has been. While the quality of the goalkeeper and strikers may affect a team’s score, PDO has been shown to be largely down to luck – over time it regresses to the mean.
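As a rough sketch of the PDO calculation, I am assuming the usual definitions: shot percentage is goals scored divided by shots on target, and save percentage is the share of the opposition's shots on target that did not result in a goal. The function name and the numbers are hypothetical, and own goals are ignored.

```python
def pdo(goals_for: int, sot_for: int, goals_against: int, sot_against: int) -> float:
    """PDO = shot percentage + save percentage, both from shots on target (sot)."""
    if sot_for == 0 or sot_against == 0:
        raise ValueError("PDO is undefined without shots on target at both ends")
    shot_pct = 100.0 * goals_for / sot_for                  # share of own shots on target scored
    save_pct = 100.0 * (1 - goals_against / sot_against)    # share of opposition shots on target kept out
    return shot_pct + save_pct


# Hypothetical match: Team A scores 4 from 10 on target, Team B scores 2 from 10.
team_a = pdo(goals_for=4, sot_for=10, goals_against=2, sot_against=10)
team_b = pdo(goals_for=2, sot_for=10, goals_against=4, sot_against=10)

print(round(team_a, 1))
print(round(team_b, 1))
print(round(team_a + team_b, 1))  # the two sides of a match sum to 200
```

Under these definitions one side's shot percentage and the other side's save percentage always add up to 100, so the two teams in a match sum to 200. That is where the mean of 100 comes from.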
So from these two statistics we should get a good idea of how well teams actually performed in the group stage and how much luck they got. It may also give us an idea of future performance in the knockout rounds, although the very small sample size and the varied strengths of the groups make this debatable.
Note: The raw data I used to perform these calculations is taken from www.whoscored.com.
This table makes good reading for Brazil fans. The hosts clearly deserved to go through and dominated the group. Mexico look fortunate to have progressed, at the expense of Croatia and Cameroon. All three had a very similar TSR but Mexico had a very high PDO.
The Cameroon figures demonstrate some of the pitfalls in using these statistics. The Africans look to have been pretty unlucky and to have slightly outperformed Mexico, which may puzzle many of those who watched their games.
On closer inspection, it turns out that their percentage of shots on target is particularly poor – only 9% compared to Mexico’s 30%, Croatia’s 38% and Brazil’s excellent 49%. It is likely Cameroon’s TSR was boosted by lots of speculative efforts.
As expected, the statistics show Holland dominated, although they were maybe a little lucky to have scored quite so many goals.
More interestingly, it looks like Spain also performed well and were particularly unlucky, while Chile look to have been very fortunate. Maybe the popular narrative of Spain’s fall and Chile’s rise has been slightly exaggerated.
Before the tournament, this looked one of the most evenly matched groups. Looking at TSR, the final table appears to be upside down. Japan and particularly Ivory Coast were unlucky to be eliminated – it did take a last-minute Greek penalty to dispatch Africa’s perennial underachievers.
Colombia won all three games and are many people’s dark horses for the tournament. On this evidence, I won’t be betting on them.
As an England fan, this makes frustrating reading. The statistics suggest England were the dominant team in the group and were very unlucky. Their numbers are slightly skewed by a high percentage of shots from outside the box (63%) and a dead rubber against group winner Costa Rica.
Even with this in mind, they have a very impressive TSR – the second highest in the whole competition – and a very low PDO – the lowest of any country. The perception among England fans of some good performances despite the poor results appears to be correct.
Finally, a group where the clearly dominant teams won through. France have the highest TSR in the competition, by some distance, and look a good bet to take home the trophy. One caveat is that, apart from the French, this looks a relatively weak group.
Argentina deservedly top Group F, but Bosnia look unlucky to be going home. This agrees with the common perception of the way the games unfolded. The Bosnians were unfortunate to get nothing from their tie with Argentina and got a decidedly dodgy offside call against Nigeria.
The USA look extremely fortunate to be in the second round. They have the lowest TSR of any team, including all those who have been eliminated. Germany, my tip for the tournament, weren’t as dominant as you might expect and appear to have had a reasonable amount of luck too.
Portugal and especially Ghana can count themselves unfortunate.
Belgium are another group leader who are also top on TSR. Russia went out despite a much higher score than Algeria. In the end, the battle between those two for qualification came down to the wire. Perhaps if Capello had been a little more positive, his team would have made more of an impression on the tournament.
In a relatively short tournament, luck plays a large part in the results. Any statistic must be taken with a pinch of salt but it certainly seems that some teams have been more fortunate than others.
When looking at TSR across the whole competition, it is striking how many of the teams with higher scores have gone out. There may be more going on here than is reflected in the statistics, but it does imply a high luck component in the final standings. England, Ivory Coast, Ghana, Spain and Russia all feature near the top, whilst the USA are bottom. The much-favoured Chile and Colombia also appear relatively far down the list.
The nature of the tournament may have led to some in-game score effects. Teams that are losing or going out have to take the initiative and attack more, whilst sides going through can afford to defend.
The usefulness of group game results in predicting the eventual winner is debatable. The sample size of three matches is very small. The groups are independent and vary in quality. In addition, international teams get very little time to play together and so often improve markedly as the tournament progresses.
Still, France look a good bet for World Cup glory. Holland, Belgium and Argentina all seem reasonable second choices, while hosts Brazil are not too far behind.
Do you think these statistics tell us much? Should France be favourites? Please let us know your opinion in the comments section below.