Clean Sheets, Corners and Cards — What’s More Important?

Soccer By the Numbers has a great post on the value of corners that raised an interesting point about the importance of different statistical measures. One of the problems with trying to build regression models for soccer is that few of the variables are independent. For example, if you want to look at the effect corners and shots have on wins, it gets a bit tricky. It’s likely a corner was awarded because of a deflected shot or a goalkeeper tip, and it’s also likely that the corner itself will lead to a shot. As the number of shots goes up, you can expect the number of corners to go up. As the number of corners goes up, you can expect the number of shots to go up. It’s a mess. How can you determine the effect of corners and shots on a team’s success when they are so intertwined?

If you build a regression model, you’ll arrive at a coefficient for each feature in your model and you can look at the incremental effect of each feature.  Using the previous 5 years of data from the EPL, I built a linear regression model to predict the number of points a team earns in a season based on Offensive and Defensive Production.

Points = 64.39+0.06095*Shots+26.16*ConversionRate-0.0797*ShotsConceded-29.73*Opponent’sConversionRate
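
To make the formula concrete, here is a minimal sketch that plugs numbers into it. The season totals are hypothetical and not taken from the data behind the model, and the sketch assumes conversion rate is a fraction (goals per shot), since the units are not stated above.

```python
def predicted_points(shots, conversion_rate, shots_conceded, opp_conversion_rate):
    """Evaluate the fitted season-points formula quoted above.

    Assumption: both conversion rates are fractions (goals per shot).
    """
    return (64.39
            + 0.06095 * shots
            + 26.16 * conversion_rate
            - 0.0797 * shots_conceded
            - 29.73 * opp_conversion_rate)

# Hypothetical season totals, chosen only for illustration.
example = predicted_points(shots=500, conversion_rate=0.10,
                           shots_conceded=500, opp_conversion_rate=0.10)
print(round(example, 2))
```

Holding everything else fixed, each coefficient is the incremental effect of its feature: ten extra shots over a season add about 0.6 points under this formula.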

The R-Squared for this model is 0.9469. Incredible, but it raises some questions. What is more important, offense or defense? Creating chances or finishing chances? You can look at the coefficients of the features and sort of say that the defensive coefficients are larger in magnitude than the offensive ones, so maybe defense wins more games, but what about chances versus finishing? Shot counts and conversion rates are measured on very different scales, so their coefficients can’t be compared directly.

There is a technique called LMG (named for Lindeman, Merenda and Gold) that quantifies each feature’s relative importance in a linear model.
[Figure: Relative Importance of Offensive and Defensive Production]
Using LMG on the model, you see the features are similarly important, with defensive features slightly more important than offensive ones (whether or not it’s a significant difference is another story). Fair enough, but there are lots of factors that contribute to shots and goals, so the next question is: can we create a model that includes some of these, and what will that tell us?
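
For readers who want to see what LMG does mechanically, here is a rough sketch of the idea: add the features to the model one at a time, record how much R-squared increases when each one enters, and average that increase over every possible ordering. This is not the code used for the results in this post. The data is synthetic, the feature names are hypothetical, and the tiny least-squares solver exists only to keep the example self-contained.

```python
from itertools import permutations
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(rows, y, cols):
    """R^2 of a least-squares fit of y on an intercept plus the chosen columns."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(k)] for a in range(k)]
    xty = [sum(d[a] * yi for d, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * dj for bj, dj in zip(beta, d))) ** 2
                 for d, yi in zip(design, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def lmg(rows, y):
    """Average each feature's R^2 gain over all orderings of entry."""
    p = len(rows[0])
    shares = [0.0] * p
    orders = list(permutations(range(p)))
    for order in orders:
        included, prev = [], 0.0
        for j in order:
            included.append(j)
            cur = r_squared(rows, y, included)
            shares[j] += cur - prev
            prev = cur
    return [s / len(orders) for s in shares]

# Synthetic data: "corners" is deliberately correlated with "shots",
# while "cards" is independent and has only a weak effect.
random.seed(0)
rows, y = [], []
for _ in range(100):
    shots = random.gauss(0, 1)
    corners = 0.8 * shots + 0.6 * random.gauss(0, 1)
    cards = random.gauss(0, 1)
    rows.append([shots, corners, cards])
    y.append(2.0 * shots + 1.0 * corners + 0.1 * cards + random.gauss(0, 1))

shares = lmg(rows, y)
for name, share in zip(["shots", "corners", "cards"], shares):
    print(name, round(share, 3))
```

The shares add up to the R-squared of the full model, which is what lets you read them as each feature’s slice of the explained variance. The number of orderings grows factorially with the number of features, so brute force like this only works for small models.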

I created a kitchen sink model with a handful of features that I intuitively thought would impact a team’s success (remember we are looking at points earned over a season and not the results of individual matches). I included (both for the team and its opponents):
Clean Sheets
Shots
Goals
Corners
Yellow Cards
Red Cards

The R-squared of the model is 0.9531. In models with only one feature, all of the coefficients had the expected directionality (positive for the team in question and negative for its opponent, meaning taking a shot is good and conceding a shot is bad).

Goals scored and conceded are the most important features in the model, which makes sense, but what was surprising is that clean sheets are almost as important as goals scored. When you think about it, though, it isn’t as surprising. A clean sheet means a team is guaranteed at least a point, and being held scoreless by an opponent’s clean sheet means a team can earn at most a point. Clean sheets explain 13.35% of a team’s points earned in a season.

Corners? They are almost as important as shots. Soccer By The Numbers looked at the number of goals scored from corners (not a lot), so I was surprised to see corners with such a high relative importance. It might be that there is a feature missing from the model that better explains points earned in a season and is related to corners. If this missing feature were added, LMG would decompose the relationship accordingly and corners would have a lower relative importance.

Cards? Cards are practically insignificant. Yellow cards’ impact on matches is fairly small. You could argue that perhaps a defender on a yellow card is more cautious, but the number of events that are altered because of that is pretty small. Red cards are extremely infrequent, so while their impact on a single match is high, their impact on the entire season is insignificant.

2 comments

  1. kczat says:

    I’m intrigued by this result on corners.

    I’ve recently seen statistics (just like what you mention) showing corners to not be very important. But conventional wisdom holds corners to be important. And certainly the example of Arsenal conceding so many goals to corners comes readily to mind.

    My initial thought was that the data might count fewer things as “corners” than people watching do. But the results here actually do show corners as being important, so maybe that’s not what’s going on….

    It would be great to understand this better.

  2. [...] Clean Sheets, Corners and Cards — What’s More Important?: Sarah Rudd of On Football creates a “kitchen sink model” to determine the relative importance of various factors on team success. [...]