Archive for Statistics

Villarreal 2011-12 – Breaking down a failed season

Villarreal had a disastrous 2011-12 season, ending in an agonizing relegation to the second division in the dying minutes of the season. In 2010-11 Villarreal reached the semifinals of the Europa League and finished 4th in La Liga to qualify for the Champions League. I focus on Villarreal’s performance in the attacking third of the pitch in 2011-12 and compare it with 2010-11. Throughout this post, “2011” refers to the 2011-12 season and “2010” to the 2010-11 season.

Objectives

  • Compare performance in the attacking third in 2011-12 and 2010-11.
  • Identify the possible causes for the decline in performance.

Methodology

  • Divided the final third into 6 zones based on a histogram of passes.
  • Visualized and compared passing in the final third in the 2011 & 2010 seasons by zone, position and individual player.

Data

  • In-game event data for all of Villarreal’s La Liga games in the 2010 and 2011 seasons, from OptaPro.

Tools & techniques

  • Hexagonal binning using R.
  • Tableau Public

Assumptions

  • Excluded short passes from short corners near the corner flag, to get a more granular picture of passing across the rest of the final third.
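The exclusion rule can be made concrete with a minimal Python sketch of the filtering step. It assumes Opta-style coordinates (0-100 on each axis, attacking toward x = 100) and uses hypothetical thresholds for “near the corner flag” and “short”. The post does not give the values actually used, and the `Pass` class and function names are illustrative only. The 0-100 scale stretches the pitch differently along each axis, so the distances are only approximate.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds: the post does not say which values were used.
FINAL_THIRD_X = 100 * 2 / 3  # Opta-style x runs 0-100 toward the opponent's goal
CORNER_RADIUS = 5.0          # "near the corner flag", in 0-100 pitch units
SHORT_PASS_MAX = 15.0        # longest pass still treated as "short", in pitch units


@dataclass
class Pass:
    x: float          # start position
    y: float
    end_x: float      # end position
    end_y: float
    completed: bool


def near_corner_flag(x: float, y: float) -> bool:
    """True if (x, y) is within CORNER_RADIUS of either attacking corner flag."""
    return any(math.hypot(x - 100, y - cy) <= CORNER_RADIUS for cy in (0, 100))


def is_short_corner_pass(p: Pass) -> bool:
    """A short pass that starts next to a corner flag (e.g. a short corner)."""
    length = math.hypot(p.end_x - p.x, p.end_y - p.y)
    return near_corner_flag(p.x, p.y) and length <= SHORT_PASS_MAX


def final_third_passes(passes):
    """Keep passes that start in the final third, minus short-corner passes."""
    return [p for p in passes if p.x >= FINAL_THIRD_X and not is_short_corner_pass(p)]
```

A long corner still counts, because only passes that are both next to the flag and short are dropped.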

Analysis

Figure – 1: Passing summary in the final third, 2010-11 vs. 2011-12


The above diagram visualizes Villarreal’s passing in the final third in 2010 & 2011 seasons.

  • The size of the circle is based on the # of pass attempts made in that zone.
  • The number in the middle of the circle is the # of pass attempts made in that zone.
  • The color gradient is based on pass completion % in that zone.
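The summary in Figure – 1 comes down to two numbers per zone: attempts (circle size) and completion % (color). Below is a minimal sketch. It assumes each pass has already been assigned to a zone, because the zone boundaries are not given in the post. `zone_summary` is a hypothetical helper and the data is a toy example.

```python
from collections import defaultdict


def zone_summary(passes):
    """passes: iterable of (zone, completed) pairs, one per attempted pass.

    Returns {zone: (attempts, completion_pct)}: attempts drive the circle size,
    completion % drives the color gradient.
    """
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for zone, ok in passes:
        attempts[zone] += 1
        completed[zone] += int(ok)
    return {
        zone: (attempts[zone], round(100 * completed[zone] / attempts[zone], 1))
        for zone in attempts
    }


# Toy data, not Villarreal's numbers.
toy = [("Z1", True), ("Z1", False), ("Z2", True), ("Z2", True), ("Z2", False)]
print(zone_summary(toy))  # {'Z1': (2, 50.0), 'Z2': (3, 66.7)}
```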

Findings

  • Passing completion % in the final third went down from 72.6% in 2010 to 69.6% in 2011.
  • The pass completion % is significantly down in all the zones of the final third (except Z1).
  • The penalty box (18y-box), the left wing (Z5) and the central zone (Z2) have seen the biggest drops in pass completion % (8.1%, 5.6% and 5.1% respectively).
  • 21% – drop in passes attempted in the central zone Z2.
  • 7% – decline in pass completion in the central zone Z2.
  • 22% – increase in the passes attempted in Z1.
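These findings mix completion rates with changes in those rates. The post does not say whether the per-zone drops are percentage points or relative changes, and the two readings can differ noticeably. This small sketch (hypothetical helper names) applies both readings to the overall final-third figures from the first bullet:

```python
def pct_point_change(before: float, after: float) -> float:
    """Change in percentage points, e.g. 72.6% -> 69.6% is -3.0 points."""
    return round(after - before, 1)


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, as a percentage."""
    return round(100 * (after - before) / before, 1)


# Overall final-third completion from the findings: 72.6% (2010) -> 69.6% (2011)
print(pct_point_change(72.6, 69.6))  # -3.0
print(relative_change(72.6, 69.6))   # -4.1
```

The same decline reads as 3.0 percentage points or as a relative drop of about 4.1%. Keep this in mind when comparing the zone-level numbers.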

Figure – 2: Hexbin of all completed passes in the final third of the 2010 and 2011 seasons.

Darker cells indicate more completed passes within the area covered by the cell. The “Counts” legend gives the # of completed passes represented by each shade in the gradient.
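The figures were produced with hexagonal binning in R. To illustrate the idea only (this is not the R package’s implementation), here is a Python sketch. It assigns each pass location to a pointy-top hexagon using axial coordinates and cube rounding, then counts passes per cell. The darker shades in the plot correspond to higher counts. The function names and the cell `size` parameter are assumptions.

```python
import math
from collections import Counter


def hex_round(q: float, r: float):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the coordinate with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hexbin_counts(points, size: float) -> Counter:
    """Count points per pointy-top hexagon of circumradius `size`, keyed by axial (q, r)."""
    counts = Counter()
    for px, py in points:
        q = (math.sqrt(3) / 3 * px - py / 3) / size
        r = (2 / 3 * py) / size
        counts[hex_round(q, r)] += 1
    return counts
```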

Findings:

  • Villarreal’s attack appears more balanced in 2010 than in 2011.
  • There is a huge 21% drop in # of passes attempted in the central zone Z2.
  • The hypothesis is that this is due to a combination of the following:
    1. Lack of penetration through the middle, forcing the midfield players to pass it sideways early in the attack.
    2. No good alternative options to fill the gaps left by Cazorla (sold to Malaga) and Rossi (out injured for 4/5ths of the season).

Figure – 3: 2010 vs. 2011 difference plot

How to read an erosion difference plot?

  • In the erosion difference plot
    1. The green cells indicate areas with similar amounts of passes in 2010 & 2011
    2. The red cells indicate areas where there were more passes in 2010 than in 2011
    3. The cyan cells indicate areas where there were more passes in 2011 than in 2010
    4. The white cells indicate the median position of passes.
    5. The arrow indicates the shift of the median from 2010 season to 2011 season.
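A simplified version of the cell coloring can be sketched by comparing each season’s share of completed passes, cell by cell. The sketch makes three assumptions that do not come from the post. It skips the erosion step that gives the plot its name. It normalizes by each season’s total so that seasons of different volume are comparable. It uses an arbitrary relative tolerance (`tol`) for “similar”.

```python
from collections import Counter


def classify_cells(bins_2010, bins_2011, tol=0.25):
    """bins_*: Counter mapping hex cell -> completed passes for that season.

    Each season is normalized to shares of its own total, then each cell is
    labelled 'similar' (green), 'more_2010' (red) or 'more_2011' (cyan).
    """
    total_2010 = sum(bins_2010.values())
    total_2011 = sum(bins_2011.values())
    labels = {}
    for cell in set(bins_2010) | set(bins_2011):
        share_2010 = bins_2010[cell] / total_2010  # Counter gives 0 for missing cells
        share_2011 = bins_2011[cell] / total_2011
        if abs(share_2010 - share_2011) <= tol * max(share_2010, share_2011):
            labels[cell] = "similar"
        elif share_2010 > share_2011:
            labels[cell] = "more_2010"
        else:
            labels[cell] = "more_2011"
    return labels
```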

Findings:

  • The red cells in the middle indicate the decline in completed passes through the middle in 2011.
  • The cyan cells on the right and left indicate an increase in completed passes on the wings.

The rest of the post breaks down these numbers by positions and players.

Midfield

I compared the top 4 midfield starters (by minutes played) in 2010 with those in 2011.

2010 – Borja Valero, Bruno Soriano, Santi Cazorla and Cani

2011 – Borja Valero, Bruno Soriano, Marcos Senna and Cani

Figure – 4: 2010 vs. 2011 Midfield

Findings

  • Villarreal midfielders had trouble completing passes in the central zone Z2 in 2011
  • It seems like the midfielders were forced to pass sideways early in the attacks to Z1 & Z3.

Figure – 5: Difference erosion plot – Midfield

Findings

  • This plot reinforces what we saw in Figure – 4. The red cells show that passing through the middle suffered.
  • The overall median of completed passes shifted from left to right as indicated by the arrow.

Now let us drill down into the data for the players who make up the midfield.

Santi Cazorla

Figure – 6: Santi Cazorla’s 2010 passing in the final third.

Findings:

  • The two-footed Cazorla had a strong influence in the center (Z2) in 2010.
  • His absence on the field was felt in 2011: Villarreal’s passing through the middle suffered in quantity (down by 21%) and completion % (down by 7%).

Borja Valero

Figure – 7: Completed passes in the final third, 2010 vs. 2011 – Borja Valero

Findings:

  • Borja Valero’s passing was more balanced across Z1, Z2 & Z3 in 2010 compared to 2011.
  • In 2011 Borja Valero was more active in Z1 & Z3 and less active in the central Z2.
    1. This indicates a lack of penetration through the middle. Opponents seem to have forced Borja to pass to the right or left as soon as he got the ball in the final third.
    2. Borja might not have found outlets early enough to pass the ball through the center and was probably forced to pass it sideways to keep possession.

Figure – 8:Difference erosion plot – Borja Valero

Findings:

  • The plot highlights earlier findings about increased passing in 2011 on the left & right (Z1 & Z3) and decreased passing in Z2 compared to 2010.
  • The plot shows that the median of Borja’s passes has shifted to the right.

Bruno Soriano

Figure – 10: Completed passes in the final third, 2010 vs. 2011 – Bruno Soriano

Findings:

  • Bruno’s zone of influence seems to be the left midfield.
  • He had more influence in the final third in 2011 compared to 2010.
  • Bruno has been a lot more adventurous in 2011. The # of hexagons in and around the 18y-box is higher in 2011 than in 2010. Bruno scored his first career La Liga goal and 3 goals in total during the 2011 season.

Marcos Senna
Figure – 11: Completed passes in the final third, 2010 vs. 2011 – Marcos Senna

Findings:

  • Marcos Senna was injured a lot in 2010 and didn’t play much.
  • As a right central midfielder in a double-pivot, his influence is on the center-right side of the pitch.
  • Along with Bruno, he has been the bedrock of this shaky and inconsistent Villarreal side.

Cani

Figure – 12: Completed passes in the final third, 2010 vs. 2011 – Cani

Findings:

  • The plot shows Cani’s influence is predominantly on the left side.
  • His passing through the middle Z2 and the right zone Z3 seems to have suffered in 2011.
  • Dribbling and running at opposition defenders is a key aspect of Cani’s game. In that respect, his involvement near the 18y-box seems to have decreased in 2011.

Figure – 13: Cani difference erosion plot

Findings:

  • The median of Cani’s completed passes has moved slightly left and backwards (away from the opponent’s goal).
  • The red cells closer to 18y-box imply that his influence in the vicinity of the 18y-box has decreased in 2011.
    1. This points to lack of creativity and penetration through the middle.
    2. Cani is probably forced to dribble from wide positions too early in the attack, making it easier for defenders to defend him.

Forwards

2010 – Nilmar, Rossi, Ruben

2011 – Ruben, Nilmar, Martinuccio, Joselu, Rossi

Figure – 15: Completed passes in the final third, 2010 vs. 2011 – Forwards

Findings

  • More completed passes by forwards in the final third in 2010 vs. 2011, especially through the middle.
  • Villarreal forwards didn’t get much service through the middle in 2011

Figure – 16: Forwards – Difference erosion plot

Findings

  • The median of completed passes for forwards shifted backwards (away from the opponent’s goal) by about 5 meters.
    1. This could be a pointer to something deeper, like a lack of penetration or creativity in the final third, forcing the forwards to come deeper to receive the ball.
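Expressing a median shift in meters requires rescaling the coordinates to a real pitch. Below is a rough sketch. It assumes Opta-style 0-100 coordinates, a 105 x 68 m pitch and a simple coordinate-wise median. The plots may compute their median differently, so this is only an approximation, and the function names are hypothetical.

```python
import math
from statistics import median

# Assumed pitch dimensions for converting 0-100 coordinates to meters.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


def median_position(points):
    """Coordinate-wise median (x, y) of pass locations in 0-100 units."""
    xs, ys = zip(*points)
    return median(xs), median(ys)


def median_shift_m(points_2010, points_2011):
    """(dx, dy, distance) in meters from the 2010 median to the 2011 median.

    Negative dx means the median moved away from the opponent's goal.
    """
    x0, y0 = median_position(points_2010)
    x1, y1 = median_position(points_2011)
    dx = (x1 - x0) * PITCH_LENGTH_M / 100
    dy = (y1 - y0) * PITCH_WIDTH_M / 100
    return dx, dy, math.hypot(dx, dy)


# Toy locations: the median x drops from 82 to 78 units.
print(median_shift_m([(80, 50), (82, 50), (84, 50)], [(76, 50), (78, 50), (80, 50)]))
# (-4.2, 0.0, 4.2)
```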

Giuseppe Rossi

Figure – 17: Completed passes in the final third, 2010 vs. 2011 – Giuseppe Rossi

Findings:

  • Rossi plays across all zones in the final third and is especially strong in the central zone Z2.
  • In 2011 Rossi suffered a season-ending cruciate ligament injury in week 8 of La Liga.
  • Rossi’s absence was felt in the central zone of the final third in 2011. Villarreal’s passing through the middle suffered in quantity (down by 21%) and completion % (down by 7%).

Figure – 18: Rossi difference erosion plot

Findings:

The eroded difference plot gives some insight into shifts in Rossi’s positioning from 2010 to 2011. Please note that we are comparing a much smaller 2011 dataset (8 games) with 35 games in 2010.

  • The median of Rossi’s passes has moved about 5 meters backwards (away from the opposition goal).
    1. This is a pointer to something deeper like lack of penetration or creativity in the final third, forcing Rossi to come deeper to receive the ball.
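With 8 games against 35, raw counts are not directly comparable, and the smaller sample is noisier. One common way to handle both issues is to compare rates per 90 minutes together with a rough Poisson standard error. The sketch assumes that events occur independently at a constant rate. The pass counts in the example are made up purely to show the effect of sample size.

```python
import math


def per90_with_se(count, minutes):
    """Events per 90 minutes and a rough Poisson standard error for that rate.

    Assumes events occur independently at a constant rate (a simplification).
    """
    rate = count / minutes * 90
    se = math.sqrt(count) / minutes * 90
    return rate, se


# Hypothetical counts, both equal to 10 passes per 90:
full_season = per90_with_se(350, 35 * 90)   # 35 full games
short_season = per90_with_se(80, 8 * 90)    # 8 full games
print(full_season, short_season)
```

Both hypothetical seasons give 10 passes per 90. However, the 8-game standard error (about 1.1) is roughly twice the 35-game one (about 0.53), so small shifts measured on the shorter season deserve extra caution.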

Nilmar

Figure – 19: Completed passes in the final third, 2010 vs. 2011 – Nilmar

Findings:

  • Nilmar missed a lot of the 2011 season, either through injury or through the coach’s decision not to play him amid rumors of a January transfer.
  • When he played, he wasn’t effective.
  • The low # of completed passes in 2011 could be due to:
    1. Lack of service
    2. Villarreal played in a single striker formation in 25 of the 38 games

Figure – 20: Nilmar difference and erosion plot

Findings:

  • Nilmar’s median of passing shifted backwards (away from the opponent’s goal) and towards the center.
    1. This indicates lack of supply to Nilmar in advanced positions forcing the forward to come deeper to receive service.

Marco Ruben

Figure – 21: Completed passes in the final third, 2010 vs. 2011 – Marco Ruben

Findings:

  • Ruben wasn’t a regular starter in 2010. This partly explains his bigger influence in 2011 compared to 2010.
  • Villarreal played with a lone striker a lot more often in 2011 (27 of 38 games) than in 2010.
  • The dark hexagons in Z3 (right) and Z2 (middle) could be areas where he came deep to receive long passes.

Figure – 22: Ruben difference and erosion plot

Findings:

  • The erosion difference plot shows that the median of Ruben’s passes moved about 7-8 meters deeper (away from the opponent’s goal) and shifted to the right from a more central position
    1. This implies Ruben had to come deeper to receive the ball.
    2. Moving away from the center also implies lack of service when Ruben was in advanced positions.

Wingbacks

Figure – 23: Completed passes in the final third, 2010 vs. 2011 – Right backs

Findings:

  • More attacking from the right wing back position in 2011 compared to 2010, especially in Zone 3.

Figure – 24: Completed passes in the final third, 2010 vs. 2011 – Left backs

Findings:

  • The plot shows more attacking from the left back position in 2011 than 2010.
  • Joan Oriol, who tends to go forward often, featured a lot at LB in 2011.


Summary of Findings

  • Final third passing data indicates that Villarreal’s attacking in the final third shifted from the center to right and left wings in 2011.
  • Data indicates that the absence of Santi Cazorla and Giuseppe Rossi contributed to the weakness of Villarreal through the middle.
  • Borja Valero’s overall influence increased in 2011. However, his passing through the middle declined in 2011 while his passing on the right and left wings increased.
    1. This is probably due to Villarreal’s attacks being pushed out wide early (and quite far away from the 18y-box) reducing their effectiveness.
  • The median of passing for Rossi, Cani, Marco Ruben and Nilmar has moved backwards (away from the opponent’s goal).
    1. This could be a sign of forwards being starved of service, forcing them to come deeper to get the ball.
    2. The passing median also shifted right or left for all midfielders and forwards further reinforcing the premise of lack of penetration through the middle.
  • The wingbacks seem to have supported the attack better in 2011. But the fact that the ball has been pushed wide early meant that the opponents were able to defend Villarreal’s attacks with greater success and ease.
  • Lack of penetration through the middle could also be due to slow build-up of attacks because of the lack of outlets upfront.
    1. The team missed Rossi’s runs off the ball to create space to receive the through ball.
    2. Villarreal also often used 1-striker formations in 2011 (25 out of 38 games) instead of their more common 4-4-2, due to a variety of personnel and tactical issues (3 different coaches in one season).


Visualizing Completed Passes by Position

I’m always on the lookout for new ways to visualize data in the hopes that it might lead to a better understanding of the data.  In the first leg of the tie between Real Salt Lake and Seattle Sounders FC, the Sounders midfield was completely MIA for large portions of the game while RSL enjoyed large periods of maintaining possession.  I wanted to come up with a generic way to visualize similar situations.  I decided to use a stacked time series, broken down by position.  In the examples below I looked at completed passes by position.  Any metric could be used and you could also use different variables to slice the data.  Another thing to look at could be which third of the pitch the event occurs in.  I like the idea of the stacked time series because it allows you to look at the team total as well as some finer detail at the same time.
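The data behind a stacked time series is a count per position per time bucket, plus running totals for the layer heights. Below is a minimal sketch. It assumes each event carries a minute, a position label and a completion flag. The 5-minute bucket width and all names are illustrative.

```python
from collections import defaultdict


def stacked_counts(events, bucket_min=5):
    """events: (minute, position, completed) tuples.

    Returns {bucket_start: {position: completed_passes}}, sorted by bucket;
    buckets with no completed passes are omitted.
    """
    table = defaultdict(lambda: defaultdict(int))
    for minute, position, completed in events:
        if completed:
            bucket = (minute // bucket_min) * bucket_min
            table[bucket][position] += 1
    return {b: dict(by_pos) for b, by_pos in sorted(table.items())}


def stack_tops(table, order):
    """Cumulative layer heights per bucket in `order`; the last value is the team total."""
    tops = {}
    for bucket, counts in table.items():
        running, layers = 0, []
        for position in order:
            running += counts.get(position, 0)
            layers.append(running)
        tops[bucket] = layers
    return tops
```

The top layer gives the team total, and each band shows one position’s contribution. This is why the chart shows both levels of detail at once.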

Goal Glut in the Premier League?

There has been lots of talk about the goal glut that is happening in the Premier League right now.  Are pricey strikers to blame or is it the death of quality defense?  Decision Technology’s Ian Graham has already taken a look at debunking the Guardian’s piece on the “goal glut”.  I thought I’d add my two cents.


Statistical Breakdown of Real Salt Lake – Seattle Sounders

It was rough being a Sounders fan last night.  Amidst discussions of a CONCACAF Champions League curse, playing at altitude and missing one of their best players of the season in Mauro Rosales, the Sounders had a tough playoff matchup against Real Salt Lake.  While most fans would have been surprised if the Sounders had come away with a first leg lead, going down 3-0 was a bit of a shock.  Not only did they concede 3 goals for only the third time all season, but they just looked awful.  Using Opta’s chalkboards, let’s take a look at what went wrong.

If you chat with me about the statistical analysis of soccer, one of the first phrases out of my mouth is probably “I hate passing percentage”.  I still do (because often the numbers are quoted without context and used to “prove” one team is superior to another), but I am going to use some passing stats here to illustrate some points.

Passing Momentum

Total Attempted Passes for each team over time


For the first 30 minutes, Seattle clearly struggled to control the ball and allowed Real Salt Lake to maintain possession and pass the ball around.  Why is this important?  Seattle is a team that has been competing in 3 tournaments and is playing at an altitude it isn’t accustomed to.  Chasing the ball for 30 minutes to start the game is sure to be taxing on already tired legs.  It wasn’t until around the 75th minute that Seattle started to see a sustained advantage in passes completed; however, that wasn’t so much because of their improved play as because RSL shut it down and tried to protect their two-goal lead.
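A “sustained advantage” can be made precise in two steps. First, count attempted passes per interval. Then find the first interval from which one team out-passes the other in every remaining interval. The sketch below assumes (minute, team) pairs as input and 5-minute intervals, and it folds stoppage time into the last interval. The function names are hypothetical.

```python
from collections import Counter


def passes_per_interval(events, interval=5, end_minute=90):
    """events: (minute, team) pairs, one per attempted pass.

    Returns [(interval_start, {team: attempts}), ...] covering 0..end_minute;
    minutes past end_minute (stoppage time) are folded into the last interval.
    """
    counts = Counter((min(m, end_minute - 1) // interval, team) for m, team in events)
    teams = sorted({team for _, team in events})
    return [
        (i * interval, {t: counts[(i, t)] for t in teams})
        for i in range(end_minute // interval)
    ]


def sustained_lead_start(series, team, other):
    """First interval start from which `team` out-passes `other` in every later interval."""
    start = None
    for minute, counts in series:
        if counts[team] > counts[other]:
            if start is None:
                start = minute
        else:
            start = None  # the lead was broken, so any earlier start no longer counts
    return start
```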


Total Attempted Passes in the final third

Looking at passes just in the final third, again Seattle was the inferior team, failing to get much penetration early on while having to absorb lots of pressure from Real Salt Lake.  Seattle had some opportunities towards the end of the first half, but failed to capitalize.  Towards the end of the match, Seattle was again getting opportunities in the final third, but their inability to complete a pass really let them down.

Passing Distance

Distribution of passing distances for Seattle Sounders and Real Salt Lake

Why did Seattle have such a hard time completing passes? Whether Real Salt Lake did a good job of closing down the passing channels or Seattle failed to move off the ball and provide options for their teammates is hard to say without going back and rewatching (something I can’t stomach).  What is apparent is that Seattle had to resort to attempting much longer passes than Real Salt Lake.  The above graph shows the quartiles of attempted pass distances for each team in 15 minute increments.  Throughout the game, but in particular early on, Seattle’s passes were much longer than Real Salt Lake’s.  Seattle definitely struggled playing out of the back, with defenders often trying to play the ball down field to alleviate pressure, but failing to connect with a teammate.
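Python’s standard library can compute quartiles of pass distance per 15-minute period. Below is a minimal sketch. It assumes each attempted pass is recorded as (minute, team, distance) and that stoppage time belongs to the last period. `statistics.quantiles` uses its default “exclusive” method here, so other tools may interpolate slightly differently.

```python
from collections import defaultdict
from statistics import quantiles


def distance_quartiles(passes, period=15, last_minute=89):
    """passes: (minute, team, distance) tuples for attempted passes.

    Returns {(team, period_start): (q1, median, q3)}; stoppage time is folded
    into the final period and groups with fewer than 2 passes are skipped.
    """
    groups = defaultdict(list)
    for minute, team, dist in passes:
        start = min(minute, last_minute) // period * period
        groups[(team, start)].append(dist)
    return {
        key: tuple(quantiles(dists, n=4))
        for key, dists in sorted(groups.items())
        if len(dists) >= 2
    }
```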

Passing Out of the Back

Passing completion in the defensive third. Weight of the line is the average distance of the passes.

There’s a lot going on in the graph above, but basically for the defensive third it shows passing completion and the average distance of complete/incomplete passes.  Seattle’s passing completion out of the back is very low with the incomplete passes tending to be much longer than the completed passes.
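The numbers in this chart come down to a completion rate plus two averages. Below is a sketch. It assumes Opta-style x coordinates, with 0 at the team’s own goal line, so the defensive third is x < 100/3. It also assumes passes are recorded as (start_x, distance, completed). The helper name is hypothetical.

```python
from statistics import mean


def defensive_third_summary(passes, third_x=100 / 3):
    """passes: (start_x, distance, completed) tuples.

    Returns completion % and the mean distance of completed and incomplete
    passes that start in the defensive third (None when there are no passes).
    """
    own = [(d, ok) for x, d, ok in passes if x < third_x]
    done = [d for d, ok in own if ok]
    missed = [d for d, ok in own if not ok]
    return {
        "completion_pct": 100 * len(done) / len(own) if own else None,
        "avg_completed": mean(done) if done else None,
        "avg_incomplete": mean(missed) if missed else None,
    }
```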

Midfield Battle

Pass selection for Seattle midfielders

Pass selection for Real Salt Lake midfield

Not surprisingly, RSL’s midfielders were able to complete a high number of short passes while Seattle’s midfield attempted longer passes with little success.  Of particular note is that there were long stretches of time where Alvaro Fernandez, Brad Evans and Lamar Neagle failed to complete a pass (it’s hard to tell in the graph, but if there isn’t a dot on the line, no pass was attempted in that time period and the software just connects the points where there was data).  It’s not just that the Sounders midfield didn’t complete as many passes as Real Salt Lake, it’s that they didn’t see enough of the ball.

Shots

Shot Distances by Type of Shot

While Seattle only managed 5 Shots On Target, they were pretty even with Real Salt Lake in terms of shots taken from 18 yards or less.  RSL’s dominance in Shots On Target comes mostly from long distance shots.  Seattle was a little unlucky with the goals they conceded and had they been a little more clinical, the scoreline could have been a little more favorable.  I like that Seattle was selective in the shots they took and waited for good opportunities while (for the most part) restricting RSL to shooting from the outside.

Summary

The passing stats for the Sounders are atrocious.  They allowed Real Salt Lake to dominate possession early on, causing themselves to chase the ball and wear themselves out.  Long passes out of the back caused them to bypass the midfield and more often than not return the ball to Real Salt Lake.  Seattle was able to absorb a lot of the RSL pressure and keep them shooting from the outside.  The absence of the two first-choice center backs for RSL plus the possible return of Mauro Rosales bodes well for Seattle.  The Sounders are no strangers to scoring three but will find it tough since Real Salt Lake can put 11 behind the ball and protect their 3 goal lead.

The Curse of CONCACAF Champions League and Squad Management

Brek Shea of FC Dallas and Michael Seamon of the Seattle Sounders

During tonight’s MLS Playoff match between the New York Red Bulls and FC Dallas, the “Curse of CONCACAF Champions League” was brought up.  FC Dallas has had to play more matches than NYRB this season and came into the match looking a bit fatigued.  Since the CONCACAF version isn’t as lucrative as the European version, it is getting a reputation as a drain on teams.  This raises the question: are teams that participate in the Concachampions at a disadvantage when it comes to the MLS playoffs?  A Beautiful Numbers Game has a post on the factors that correlate with winning playoff series.  Not surprisingly, number of matches played is important.  What hasn’t been discussed is how managers deal with squad rotation and what effect that has on success. Major League Soccer is a parity league, so unlike in Europe, where more successful teams can go out and buy new players if they qualify for additional tournaments, MLS teams have similar resources.  There is some unknown quantity of allocation money that teams get when they qualify for CCL, but the number of roster spots is fixed.  Are teams using their resources differently?


NESSIS Videos Posted

As I’ve previously posted, I had the chance to speak at the New England Symposium on Statistics in Sports.  They’ve now posted the videos and slides from all the presentations.  I’ve posted my video below as well as the slides and original blog post so that all the content is in one place.  Originally I wanted to title my talk “Cool Shit You Can Do With Markov Chains in Soccer” but toned it down a bit to “A framework for tactical analysis and individual offensive production assessment in soccer using Markov chains“.



NESSIS Wrap-up and Slides

This weekend I had the privilege of speaking at the New England Symposium on Statistics in Sports.  It is a much more technical conference than the Sloan Sports Analytics Conference, so I felt a bit like a fish out of water given my background in computer science rather than hardcore statistical methods (and these guys were hardcore!).  Originally I had planned to do a write-up, similar to the one I did for SSAC, but there was too much going on for me to take adequate notes.  I really enjoyed chatting with a lot of people who are similarly passionate about their respective sports and take the time to sit down and produce cool stuff.  The panel discussion was also fascinating.  Some of the themes that were discussed during SSAC carried over, such as:

New England Symposium on Statistics in Sports

I am thrilled to announce that I will be speaking at this year’s New England Symposium on Statistics in Sports (NESSIS) on September 24th. Earlier this year, StatDNA announced a Soccer Analytics research competition and my paper was selected as the winning entry.  I’ll be giving a talk titled “A framework for tactical analysis and individual offensive production assessment in soccer using Markov chains”.  Catchy, right? Well, if that didn’t grab your attention, Chris Stride from the University of Sheffield will be giving a talk called “Cheating in football: Team culture, player behavior, or question of circumstance?” and there are several soccer related posters as well. If you’re attending NESSIS, drop me a line at srudd@onfooty.com or come say hi after my talk.  I’ll also be attending the post conference drink-up at Porter Square’s Tavern in the Square.  For those that can’t attend the conference, below is my abstract.  You can find the others here.

A FRAMEWORK FOR TACTICAL ANALYSIS AND INDIVIDUAL OFFENSIVE PRODUCTION ASSESSMENT IN SOCCER USING MARKOV CHAINS

Markov Chains are an effective way to model transitions between states. Assuming that the next state depends only on the current state and not on the states before it, Markov Chains can be used to model the set of state transitions that make up a possession in soccer. The transitions are used to determine the probability that a possession ends in one of two final states: scoring a goal or relinquishing possession to the opposing team. Once the final probabilities are known for each state, they can be used to determine game situations from which goals are more likely to develop, team strengths and weaknesses, and metrics for assessing the offensive contributions of players.
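A toy Python sketch makes the absorbing-chain idea concrete. The two states and their transition probabilities are invented for illustration and are not taken from the paper. The code finds the goal probabilities by fixed-point iteration of p(s) = sum over t of P(s → t) · p(t), rather than by solving the linear system directly. Both approaches give the same absorption probabilities for a chain that is eventually absorbed.

```python
# Absorbing states: a possession ends in a goal or by losing the ball.
ABSORBING = {"GOAL": 1.0, "LOST": 0.0}


def goal_probabilities(transitions, tol=1e-12, max_iter=10_000):
    """transitions: {state: {next_state: probability}} for every non-absorbing state.

    Returns {state: probability the possession eventually ends in GOAL}, using
    fixed-point iteration of p(s) = sum_t P(s -> t) * p(t).
    """
    p = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new = {
            s: sum(prob * (ABSORBING[t] if t in ABSORBING else p[t])
                   for t, prob in nxt.items())
            for s, nxt in transitions.items()
        }
        if max(abs(new[s] - p[s]) for s in p) < tol:
            return new
        p = new
    return p


# Toy chain with made-up probabilities (each row sums to 1).
toy = {
    "OWN_HALF": {"FINAL_THIRD": 0.5, "LOST": 0.5},
    "FINAL_THIRD": {"GOAL": 0.2, "OWN_HALF": 0.2, "LOST": 0.6},
}
print(goal_probabilities(toy))  # about 0.111 for OWN_HALF, 0.222 for FINAL_THIRD
```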

Using this framework on the sample data set, we found that teams are more likely to score from taking long corners than short corners, with the notable exception of Tottenham Hotspur who excel at short corners. The top 3 teams most likely to score from a long corner are: Arsenal, Newcastle and Stoke. The top 3 teams most likely to concede from a long corner are: Everton, Arsenal and Newcastle. The framework can also be used to look at various game situations like building from the back, counter-attacks, free kicks, and entries into the final third, for example.

Additionally, the transition probabilities can be used to determine which individuals are best at receiving the ball in situations with a high probability of scoring and which individuals are best at moving the ball to an improved state with a higher probability of scoring than their current state. The top 3 players for increasing the probability of scoring are Tim Cahill, Yaya Toure and Cesc Fabregas. The 3 most wasteful players who decrease their team’s probability of scoring the most are Darren Bent, Peter Odemwingie and Gael Clichy. The top 3 players who receive the ball in the most advantageous states are Dimitar Berbatov, Nile Ranger and Benjani Mwaruwari.
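The player metrics can be read as changes in scoring probability. The sketch below shows one plausible formulation, not necessarily the paper’s exact definition. Each player is credited with P(goal | state after) minus P(goal | state before) for every ball movement they make. Receivers are rated by the average P(goal) of the states in which they receive the ball. Player labels and probabilities are hypothetical.

```python
from collections import defaultdict


def added_value(actions, goal_prob):
    """actions: (player, state_before, state_after) for each ball movement.

    Credits the player with P(goal | after) - P(goal | before).
    """
    probs = {"GOAL": 1.0, "LOST": 0.0, **goal_prob}
    totals = defaultdict(float)
    for player, before, after in actions:
        totals[player] += probs[after] - probs[before]
    return dict(totals)


def reception_quality(receptions, goal_prob):
    """receptions: (player, state) pairs; returns each player's mean P(goal) on receipt."""
    sums, counts = defaultdict(float), defaultdict(int)
    for player, state in receptions:
        sums[player] += goal_prob[state]
        counts[player] += 1
    return {player: sums[player] / counts[player] for player in sums}
```

Under this reading, a “wasteful” player is one whose total comes out negative, because their actions tend to move the ball into states with a lower scoring probability.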

Where Does Talent Come From? — La Liga Edition

Map of birthplaces for players currently in La Liga (excluding Spain)

This is our second post in our series on the origins of players; previously we looked at the Premier League.  La Liga is a little different from the Premier League, both in terms of infrastructure (“B” squads can compete in lower divisions) and culturally (Athletic Bilbao has a policy to only sign players from the Basque region).  The methodology for obtaining this data is discussed in our previous post. And now, the highlights:


Football Factories: Where does talent come from?

Going global: Birthplaces of Premier League players (excluding UK).

Another summer is on its way out with Arsenal barely making a splash in the transfer market. Once again it looks like Arsenal will be relying on youth this season. It got me thinking — are Arsenal really good at producing players from their youth academy who are capable of playing in the Premier League?
