This weekend I had the privilege of speaking at the New England Symposium on Statistics in Sports. It is a much more technical conference than the Sloan Sports Analytics Conference, so I felt a bit like a duck out of water given that my background is in computer science and not hardcore statistical methods (and these guys were hardcore!). Originally I had planned to do a write-up, similar to the one I did for SSAC, but there was too much going on for me to take adequate notes. I really enjoyed chatting with a lot of people who are similarly passionate about their respective sports and take the time to sit down and produce cool stuff. The panel discussion was also fascinating. Some of the themes discussed during SSAC carried over, such as:
- How do you quantify intangibles like heart, energy, being a winner, etc.?
- Don’t try to solve all problems at once; most coaches are smart guys and generally have things right. Instead, look for incremental improvements.
- Ranking systems are fun for fans but not that useful to teams.
- Good communication is essential, as are timing and learning style.
One of my favorite parts of the panel was that Kenny Atkinson (assistant coach for the NY Knicks) and Roland Beech (Director of Quantitative Analysis, a.k.a. “The Stats Coach”, for the Dallas Mavericks) are both part of the coaching staffs for their teams. It was fantastic to hear how analytics can be integrated into the coaching process, and it sounds like another example of what Bill Gerrard calls “Evidence Based Coaching”.
The videos of all talks will be posted on the NESSIS site, but unfortunately I don’t have a time frame (you can see the 2009 videos here). In the meantime, if you are interested in my talk, you can read the blog post here and you can download my PowerPoint slides here.
Special thanks again to Jaeson and the rest of StatDNA for setting up this opportunity for me to speak. It was a great experience and I really enjoyed meeting everyone and exchanging good ideas.


Hi. First of all, I want to congratulate you on an excellent idea for analysing football. It’s an amazing idea and I really want to say thanks! For me it’s a paradigm shift in how I think about analysing football with stats.
I had a couple of points I hope to be able to clarify or discuss with you. Firstly let me just say that I don’t fully know the data you’re using to do this nor do I fully understand the procedure behind it, so I’m a little hazy on the implications.
I guess the key takeaway from this methodology is that it generates the probabilities on its own, i.e. it weights the action performed based on actual facts and you don’t have to assign the weights yourself. That is a brilliant thing. So the inputs to this system are the states one puts in: the better you define the states, the better the results it will give. The definition of the states is limited by what data exists and by having a statistically significant number of data points for each state. These factors can of course be improved, and perhaps even driven by the needs of such a system. I think that’s a fantastic achievement.
But the things I want to clarify are:
1. I’m not sure if the states defined take into account how the opposition defense is positioned (or if they do it correctly). What I got was that it takes into account “defensive pressure and defensive shape”. It would be helpful if you could explain what these mean.
2. I think what it doesn’t take into account is your team’s positioning. For example, if the player on the ball has an open player ahead of or beside him to pass to and he takes a shot, it is quite different to when he doesn’t have anyone to pass to.
I think this is key and it may be biasing your results.
In your analysis, the players who did better were all from good-ish teams and played deeper or had passing options. So they are generally able to pass to a player with a better P(goal), which gives them credit. However, players who had fewer passing options would suffer even if they took the best option available to them, say shooting in this case, and did not score a goal. So their decision making was correct but they could not score (skill not that good or something).
So what I’m saying is this: it would be great if a system could separate decision making and skill.
How I think it may work is this: say Van Persie has the ball and has a P(goal) of .4, but Walcott is in a position with a P(goal) of .75 and the probability of a successful pass is 95%. In real life, think of it as Persie being on the edge of the box on the left side with a defender on him, while Walcott is making a run unmarked, parallel to him towards the right.
In real life we can say he should pass to Walcott and not shoot, but can we statistically say that a goal is more likely to result if he passes to Walcott than if he shoots? If yes, then if he shoots and scores he should be given credit for having the skill to score from there but should be penalised in his credit for decision making.
Also the reverse may be true, as we so often see pundits criticise Arsenal for “over-passing”.
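Using the numbers from the Van Persie and Walcott example, the comparison can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not part of the model from the talk: the function names are made up, and it assumes a failed pass never leads to a goal and that Walcott shoots as soon as he receives the ball.

```python
def goal_chance_via_pass(p_pass_success: float, p_goal_receiver: float) -> float:
    """Chance of a goal if the player passes.

    Simplifying assumption: a failed pass contributes nothing.
    """
    return p_pass_success * p_goal_receiver


def better_option(p_goal_shot: float, p_pass_success: float,
                  p_goal_receiver: float) -> str:
    """Return 'pass' or 'shoot', whichever gives the higher chance of a goal."""
    via_pass = goal_chance_via_pass(p_pass_success, p_goal_receiver)
    return "pass" if via_pass > p_goal_shot else "shoot"


# Numbers from the example: shooting scores 40% of the time, the pass
# succeeds 95% of the time, and the receiver then scores 75% of the time.
via_pass = goal_chance_via_pass(0.95, 0.75)  # roughly 0.71, well above 0.40
decision = better_option(0.40, 0.95, 0.75)

# The reverse case: if the pass is risky (say 30% success), shooting wins.
risky_decision = better_option(0.40, 0.30, 0.75)
```

Under these assumptions, shooting is only the better choice when the shooter's own P(goal) exceeds the pass success probability multiplied by the receiver's P(goal), which also gives a rough per-situation answer to the question of when a shot is worth taking.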
It may be that the statistically better option is to shoot rather than pass. It may be either that no player might be in a better position or that a pass may not exist (little probability of a successful pass).
Another scenario I can think of is that a typical No. 9 has gotten the ball upfield and has little statistical probability of scoring, and hence should not take a shot, but all the passing options that exist for him have a lower P(goal). So even though those passes may contribute to a better state in the next few moves, and even to a goal, he will get penalised for reducing P(goal). The question of course is not whether this is possible, which I’m sure you appreciate it is, but whether it’s happening a lot and biasing your output, because technically the player is taking the correct decision.
Also, can we roughly say above which P(goal) it is good to take a shot?
I think this method is absolutely delightful and should lead to interesting discussions and developments as we try to refine it. Excellent, excellent work!
Excellent comments and you are spot on. To address your questions:
1) Defensive pressure refers to how much pressure is being applied to the individual in possession of the ball. Are they completely open, totally marked, or somewhere in between? Defensive shape refers to the number of defensive lines between the ball and the goal. Do you have the whole team packed in behind the ball, or are you facing just the back line? There is no info about how each individual defensive player is positioned or whether they are in good spots.
2) Information about where the offensive teammates are is the one piece of data I really wanted. Right now, with this model, we can say whether or not the player moved the ball into a better state, but not whether or not the player made the best decision unless we go back and actually look at the film.
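To make "moved the ball into a better state" concrete, here is a minimal Python sketch of the general idea behind an absorbing Markov chain. It is not the software or data used for the talk: the state names and transition probabilities below are invented purely for illustration, and the real model's states also encode things like defensive pressure and shape.

```python
# Hypothetical states and made-up transition probabilities (illustration only).
# Each row gives the chance of the possession moving to the next state.
TRANSITIONS = {
    "midfield":    {"midfield": 0.5, "final_third": 0.3, "turnover": 0.2},
    "final_third": {"midfield": 0.2, "final_third": 0.1, "box": 0.3, "turnover": 0.4},
    "box":         {"goal": 0.15, "final_third": 0.15, "turnover": 0.7},
}

# Absorbing states end the possession: a goal is worth 1, losing the ball 0.
ABSORBING = {"goal": 1.0, "turnover": 0.0}


def goal_probabilities(transitions, absorbing, tol=1e-12, max_iter=10_000):
    """Estimate P(goal) for every state by repeatedly averaging over next states."""
    values = {state: 0.0 for state in transitions}
    values.update(absorbing)
    for _ in range(max_iter):
        biggest_change = 0.0
        for state, row in transitions.items():
            new_value = sum(p * values[nxt] for nxt, p in row.items())
            biggest_change = max(biggest_change, abs(new_value - values[state]))
            values[state] = new_value
        if biggest_change < tol:
            break
    return values


def credit(values, from_state, to_state):
    """Credit for an action = change in P(goal) between the two states."""
    return values[to_state] - values[from_state]


P_GOAL = goal_probabilities(TRANSITIONS, ABSORBING)
pass_into_box = credit(P_GOAL, "final_third", "box")   # positive: better state
lost_in_box = credit(P_GOAL, "box", "turnover")        # negative: possession lost
```

A sketch like this only scores where the ball went relative to where it was. It cannot say whether a better option was available at that moment, which is exactly the gap described above.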
Number 9s are really interesting in this model for the reasons you point out. They tend to do really poorly in the model for the reason that they often receive the ball with limited options (back to goal, hopefully can lay off to a teammate, or 50-50 ball in the air) and therefore get penalized. What I think is more interesting than looking at the overall number, is looking at which aspects of the player’s game are contributing to their score. A lot of info about a player’s style of play and how he fits into the team’s system can be derived from digging into the components. For example, Darren Bent and Kevin Davies are two of the worst performers overall. Darren Bent’s low score comes from many high probability shots resulting in the ball going back to the other team (either goal kick, or keeper save). Kevin Davies’ low score comes from 100s of headers that often go to the other team or a slightly worse state. Completely different!
And yes, this is just a first iteration of the model with the given data. Lots and lots of refinement is needed.
Thanks for the reply! It clears up a lot of what I was wondering.
Another amazing thing about the Markov Chain approach is that we can use any measure instead of P(goal), which gives the flexibility to analyse any aspect of the game someone might want to look at. (All I can think of right now is perhaps defensive contribution, so – P(opponent scoring goal), but it has so many possibilities!)
Also I think if there is enough data we can even get different P(goal) for two different players in the same state or against two different oppositions. That is, it can weight the quality of the player and opposition itself as well. Mindblowing stuff.
Yeah, exactly! Why I get so excited by this isn’t because of the results it produced, but where you can take it. Loads of good ideas.
Great talk & analysis – well done .. I was wondering what software you used to perform your analysis?
Thanks. I wrote custom software to parse the data and do the analysis. Programming is a useful skill.