Wednesday, March 30, 2011

An Indictment of the MLS Playoff Structure, Part 4: Quantifying the Impact of Game Differential on First Round Odds

In my last post I used binary logistic regression (BLR) to show the impact of various EPL match attributes on the likelihood of winning a match.  Soon after beginning the work I thought of another good use for a BLR model - identifying the factors that truly matter in predicting (as compared to retrospectively assessing) first round MLS playoff success.  Regular readers know I am no fan of MLS's playoff system, which bucks the trend of most leagues using a top-of-the-table format to determine their champion.  I took a pass at explaining the phenomenon of teams with fewer games played winning a disproportionate share of first round series since 2003, but at the time could not prove that it was one of the few statistically significant predictors from the many identified by Climbing the Ladder (CTL).  Since then I have used CTL's lineup database to construct a BLR model of the major factors identified by CTL and isolated the statistically significant factors.  The results suggest that MLS still has a lot of changes to make to the playoff format it seems to love so much.

The Data Set and a Few Summary Statistics

I utilized CTL's lineup database to create the following statistics:

  • Difference in the number of games played
  • Overall goal difference
  • Difference in coach experience (MLS games only)
  • Difference in seeds
  • Home record difference
  • Away record difference
  • Regular season goal differential between the two teams
Here are a few interesting summary statistics regarding several of the attributes above for the 2003 through 2010 seasons:
  • Games played difference: The maximum difference came in the 2008 NY/Columbus series where the Crew played 14 more games than the Red Bulls.  This was due to the Crew playing in both the old and new format for CONCACAF Champions League (CCL).  The median differential is 2 games.
  • Overall goal difference: 2005 through 2007 saw some of the highest overall goal differentials between first round playoff participants - one series each with 22 and 23 goals, and two series with 27.  The median goal differential was 9 goals.
  • Difference in coach experience: Sigi Schmid is the king of mismatches in manager experience with the largest gap of 196 games realized when his 2008 Columbus Crew defeated the Kansas City Wizards.  The median difference is 68 matches.
  • Regular season goal differential between the two teams: The peak of regular season goal differential between two teams was witnessed in the 2006 through 2008 seasons, when an unbalanced schedule saw teams play each other up to four times during the regular season.  Chicago's 2008 first round series versus New England witnessed the largest such goal differential - 8 from four regular season games.  Chicago went on to win that first round series, while the median value for the first round series is 2. 

The Results of the BLR

After compiling the data for every first round playoff series from 2003 through 2010, a BLR was constructed to predict the likelihood of a team winning the series.  Dummy variables for each year were constructed to ensure no special causes were missed.  Terms having a p-value greater than 0.05 were successively eliminated until only two terms remained: manager experience differential and games played differential.  All other terms - overall goal difference, difference in seeds, home record difference, away record difference, and regular season goal difference - were not significant by a mile (most p-values were equal to or greater than 0.40).

Plots of the changing odds with various manager experience and game differentials are shown below.

The plot of manager experience differential is counterintuitive, but is borne out by the statistics.  Apparently having a less experienced manager actually bodes well for a team in the first round.  Perhaps it is that long term managers have more data points in the MLS playoffs, increasing the likelihood that the random nature of the playoffs will lead to their increased number of losses.  Perhaps newer, less experienced managers can take a team by surprise, or are managing teams that did not perform well the prior season and thus align with the teams with fewer matches.  Whatever the cause, it seems that fewer games for a manager, relative to the opposition team's manager, bodes well for the team with the less experienced manager.

More interesting is the intuitive relationship between the game played differential and the likelihood of winning the first round playoff series.  The equation provides a very clear relationship between the game played differential and resultant odds.  The relationship is largely linear from -5 to 5, meaning that it's about a 7.5% change in odds for each incremental game difference.  As game differential approaches more than five matches, the incremental benefit of increased or reduced game differential is minimal.
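The shape described above (roughly linear in the middle, flat at the extremes) is what a logistic curve produces. Here is a minimal Python sketch, assuming a zero intercept and a hypothetical coefficient of 0.30 per game. That coefficient is not the fitted value from the model; it is chosen only because a logistic curve's slope at its midpoint equals the coefficient divided by four, which reproduces the roughly 7.5 points per game quoted above.

```python
import math


def win_probability(game_diff, coef=0.30, intercept=0.0):
    """Logistic curve: P(win) = 1 / (1 + exp(-(intercept + coef * game_diff))).

    coef and intercept are hypothetical, not the fitted values from the post.
    Assumed sign convention: positive game_diff favours the team in question.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + coef * game_diff)))


def marginal_change(game_diff, coef=0.30, intercept=0.0):
    """Change in win probability from one more game of differential."""
    return (win_probability(game_diff + 1, coef, intercept)
            - win_probability(game_diff, coef, intercept))


if __name__ == "__main__":
    # Print the probability and the per-game change across the range.
    for diff in (-10, -5, -1, 0, 1, 5, 10):
        print(diff, round(win_probability(diff), 3), round(marginal_change(diff), 3))
```

Under these assumed values the per-game change is largest near a differential of zero and shrinks steadily toward the tails, which matches the diminishing benefit beyond about five games.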

What does this mean for the 2011 playoffs?  The most direct assessment is that MLS missed a golden opportunity to correct this imbalance by not going to a two game series in the first round of the playoffs.  I commented here how I would have liked to have seen such a re-balancing prior to MLS's announcement of the 2011 format, as well as my reaction to the announcement that the first round would be a single match while the second round will have the usual two-match series.  If history is any indication, MLS essentially gave the lower seeds a 7.5% advantage in odds of advancing to the conference final by not making them play an extra game in the opening round.

We'll see how things play out in the 2011 playoffs and will update this study once they're complete, but I am not holding out hope.  With my Sounders having one of the league's more experienced managers and their desire to go deep in the US Open Cup and CCL, I think it will be another year without an MLS Cup.

Perhaps MLS really does like this parity that borders on complete unpredictability.  Allowing MLS clubs to buy championships like Chelsea is not what I want, but the fact that no prior rational metric correlates to first round playoff success suggests MLS has some major adjustments to make.  It seems as if MLS has swung the pendulum so far to the side of parity that no club's supporters know what to expect from regular to post-season, let alone year-in and year-out.  Ultimately, this holds back the professional game's success and growth in this country.

Monday, March 28, 2011

Quantifying the Impact of the Bias of Arsenal's Referees, Part 1

Special thanks to Dog Face for the data (he and I will be collaborating on the second post in this series), and to Chris from Soccer By Numbers for help in dissecting the stats.

A little over a month ago I completed a post that quantified the different treatment Arsenal appeared to receive from various referees in the Premier League.  In that post I used statistics from Tim at 7AM Kickoff - shots taken, the ratio of shots-on-goal to shots taken, and Premier League fantasy points for yellow and red cards - to show that Webb, Dean, and Dowd are the least favorable referees for Arsenal while Foy and Atkinson are the most favorable.  What I was unable to do at the time was to show how these different match statistics impacted the outcome of the match.  Luckily, a writer with Untold Arsenal who goes by the name of Dog Face contacted me and supplied data for every Premier League match going back to the 2005/06 season.  This data allowed for the analysis of the impact of such calls on match outcome.

The Data and Statistical Methods Used

Dog Face's data set contains key match statistics from every Premier League match from the 2005/06 season through the latest matches of this season.  To eliminate any error associated with using data from the incomplete 2010/11 season, I focused on the following attributes for the 2005/06 through 2009/10 seasons:

  • Venue (home/away)
  • Shots
  • Shots-on-goal
  • Corners
  • Fouls
  • Yellow cards
  • Red cards
The data came to me paired - each row showing the data for both the home and away teams - so I broke it into unpaired team data.  I then calculated the differential for each statistic except venue, which was coded as a binary statistic (1 = home team, 0 = away team).
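The unpairing step can be sketched in a few lines of Python. The key names (`home_shots`, `away_shots`, and so on) are hypothetical, since the actual column layout of Dog Face's data is not shown here, and the differential is assumed to be the team's own count minus the opponent's.

```python
# Hypothetical statistic names; the real data set's columns are not shown in the post.
STATS = ("shots", "shots_on_goal", "corners", "fouls", "yellows", "reds")


def unpair_match(row):
    """Split one paired match row into a home-team row and an away-team row.

    Each output row holds the venue flag (1 = home, 0 = away) and, for every
    statistic, the differential (own count minus opponent's count, an assumption).
    """
    home = {"venue": 1}
    away = {"venue": 0}
    for stat in STATS:
        diff = row["home_" + stat] - row["away_" + stat]
        home[stat + "_diff"] = diff
        away[stat + "_diff"] = -diff
    return home, away


if __name__ == "__main__":
    # Made-up match, for illustration only.
    example = {
        "home_shots": 15, "away_shots": 10,
        "home_shots_on_goal": 7, "away_shots_on_goal": 4,
        "home_corners": 6, "away_corners": 3,
        "home_fouls": 11, "away_fouls": 14,
        "home_yellows": 1, "away_yellows": 3,
        "home_reds": 0, "away_reds": 1,
    }
    for team_row in unpair_match(example):
        print(team_row)
```

Each match yields two mirrored rows, so five 380-match seasons give 3,800 team rows.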

In attempting to assess the impact of play on the pitch and referee decisions, we have several options.  We could try to determine a relationship between goal differential and the inputs listed above, but this is problematic given the relative paucity of goals and resultant goal differential.  I've done enough analysis of soccer match data to know this is a fool's errand.  The better method is to determine the likelihood of winning a match given the differentials achieved by a team or dealt out by a referee.  To do this a binary logistic regression analysis was performed using all of the match statistics.  A set of dummy variables based upon the season the data point came from was created to observe the effects of any overlooked variables in the analysis.  Such a regression analysis allowed the construction of a mathematical model to predict the likelihood of winning (earning 3 points), with (1-likelihood of winning) being equal to the likelihood of not winning (earning 1 point for a tie or 0 points for a loss).  Unfortunately, as its name suggests, binary logistic regression's output is binary in nature and thus cannot differentiate between a soccer match's three possible outcomes.  This is a compromise that must be made to use the analysis.
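For readers who want to see the mechanics, here is a minimal Python sketch of fitting a binary logistic regression by plain gradient ascent on the log-likelihood. It is not the statistical package used for this analysis, and the twelve toy rows (venue flag and shots-on-goal differential, with 1 = win and 0 = draw or loss) are invented for illustration.

```python
import math


def sigmoid(z):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept and coefficients by batch gradient ascent.

    Returns a list of weights where weights[0] is the intercept.
    """
    n_features = len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
            err = target - sigmoid(z)  # observed outcome minus predicted probability
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w + lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def predict(weights, row):
    """Likelihood of winning; 1 minus this is the likelihood of not winning."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


# Invented toy data: (venue, shots-on-goal differential) and win = 1, draw/loss = 0.
TOY_X = [(1, 3), (1, 1), (1, -1), (1, 2), (1, 0), (1, 1),
         (0, 2), (0, -2), (0, 0), (0, -3), (0, 1), (0, -1)]
TOY_Y = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    fitted = fit_logistic(TOY_X, TOY_Y)
    print("intercept, venue, shots-on-goal:", [round(w, 3) for w in fitted])
    print("P(win | home, +2 shots on goal):", round(predict(fitted, (1, 2)), 3))
```

A real analysis would also need standard errors and p-values for each term, which a statistics package reports and this sketch does not.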

Like any other statistical analysis, binary logistic regression analysis allows statistical significance to be tested.  In this analysis, the general rule of thumb of p <= 0.05 was used to determine which terms in the analysis were significant (with allowances for slightly higher p-values in team data given the lower sample size).  Based upon these criteria, the following factors were significant in impacting the likelihood of winning a match:
  • Venue (home/away)
  • Shots-on-goal differential
  • Yellow card differential
  • Red card differential
The same factors also proved significant when isolating only the Arsenal data within the wider data set.  This allows for a comparison of the impact of various match attributes on the average Premier League team, and how Arsenal is impacted to a greater or lesser degree for the same match statistic.

The Effect of Yellow & Red Cards

A comparison of the effects of various match statistics could be completed once binary logistic models were created for the league and Arsenal over the five seasons.  Two of these - yellow card and red card differential - are of most interest as the referee directly controls when a foul is simply a foul and when it is serious enough to warrant a card.  As noted by Chris at Soccer By The Numbers, binary logistic regression predictions present some challenges when trying to provide two dimensional plots of the likelihood of an event (in this case winning) versus a single variable (in this case yellow or red cards).  With Dog Face's data I used an approach of splitting the analyses into home and away games, and then set all other variables to their averages for each venue while sweeping through the max and min values of the variable of interest (either yellow or red cards).  The output generated by each sweep came in three forms: the nominal odds, the lower 95th percentile, and the upper 95th percentile.  Such an approach allows us to observe how sample size and the variability of outcome as the data set approaches its extremes (yellow card differentials of 7 or red card differentials of 2) impact the confidence in the model.
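The sweep described above can be sketched as follows. Everything numeric here is hypothetical: the coefficients, the averages, and the standard error function are placeholders rather than fitted values. The band is built by taking the linear predictor plus or minus 1.96 standard errors and passing both ends through the logistic function, which is one common way to get such bounds and an assumption about how the plotted bounds were produced.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def sweep(coefs, intercept, averages, variable, values, se_linear=None):
    """Predicted win likelihood as `variable` sweeps over `values`,
    with every other input held at its average.

    se_linear, if given, maps a value of `variable` to the standard error of
    the linear predictor there. Rows are then (value, lower, nominal, upper).
    """
    rows = []
    for v in values:
        inputs = dict(averages, **{variable: v})
        z = intercept + sum(coefs[name] * inputs[name] for name in coefs)
        if se_linear is None:
            rows.append((v, logistic(z)))
        else:
            half_width = 1.96 * se_linear(v)
            rows.append((v, logistic(z - half_width), logistic(z), logistic(z + half_width)))
    return rows


if __name__ == "__main__":
    # Hypothetical home-match model; differentials are own minus opponent's.
    coefs = {"sog_diff": 0.25, "yellow_diff": -0.15, "red_diff": -0.75}
    averages = {"sog_diff": 1.5, "yellow_diff": 0.0, "red_diff": 0.0}
    # Hypothetical standard error that grows toward the sparse extremes.
    for row in sweep(coefs, 0.3, averages, "yellow_diff", range(-7, 8),
                     se_linear=lambda v: 0.1 + 0.05 * abs(v)):
        print(row[0], [round(p, 3) for p in row[1:]])
```

With a standard error that grows away from zero, the band widens at the extremes of the sweep, which is the pattern the dashed lines show where data is thin.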

The plots below show the impact that yellow cards have on match outcome.  The first graph shows the impact at home, while the second graph shows the impact away.  The black lines represent the likelihoods based upon the full league data over the five seasons.  The red lines represent the likelihoods based upon Arsenal's data over the same five seasons.  Solid lines, and their associated equations, represent the nominal predictions from the model, while the dashed lines represent the upper and lower 95th percentile lines.

A few things are clear from the graphs above.
  1. Playing at home clearly has its advantages.  Even with a six yellow card advantage at an away match while achieving their average away number of shots on goal, Arsenal's likelihood of winning an away match is only slightly better than at a home match when they are even on yellow cards and playing to their average home form.
  2. Clearly the reduction in data points for Arsenal (190 matches) versus the league wide data (1900 matches) contributes to the wide variation shown via the 95th percentile lines.  The relative scarcity of Arsenal matches with an absolute yellow card differential greater than 2 creates the uncertainty at the extremes - 88% of all Arsenal matches ended with an absolute yellow card differential of 2 or less.
  3. A yellow card at home is only slightly less costly than a yellow card away - each yellow card away results in a 0.4% lower likelihood of winning versus a yellow card at home.  Clearly, the difference in home and away likelihoods of winning can't be chalked up to a difference in the impact of yellow cards when the yellow card differential home and away is even.
  4. The non-parallel nature of the Arsenal and league average lines in both graphs indicates that the impact of yellow cards on Arsenal is more severe.  To be exact, it's nearly three times as severe.
Similar odds can be calculated for red cards.  The graphs below show such odds over the range of red cards in the data set, and follow the same conventions as the yellow card graphs above.

A few more conclusions can be drawn based upon the graphs above:
  1. Playing at home has even bigger advantages when it comes to red cards.  In the case of Arsenal, even when they get a red card away their likelihood of winning with average away form is only 0.6, which is still 0.09 (or 15%) lower than the average home performance with no red card advantage or disadvantage.
  2. While the Arsenal data set still shows greater variation than the league wide data due to decreased sample size, it does show greater separation in the data sets (especially at home).  It could be declared that the separation at home between the two data sets for 0 and +1 red card differential shows that Arsenal's improved chances of winning are statistically significant when compared with the league average.
  3. Red cards are certainly a greater detriment to a team's likelihood of winning.  For an average Premier League team, they're 5 times as costly at home and away versus yellow cards.  For Arsenal, they're 4 times as costly at home and nearly 5 times as costly away.
The graphs above indicate the change in the likelihood of winning with each passing yellow or red card in a match, assuming Arsenal is playing at their average form for shots on goal.  They're very useful for illustrative purposes, but not very useful in assessing the impact of the referees identified in my previous posts.  For such an analysis, the individual likelihoods of winning each match are constructed from the match data, and a comparison between the referees is made.

The Impact of Referee's Decisions in Arsenal's Matches

From the graphs above, the impact of Arsenal's yellow and red cards is not the same as the impact on the average Premier League team.  Arsenal pays a much bigger penalty for their red and yellow cards compared to the average Premier League team, and thus the differentiation in referee statistics shown in my last post has a much bigger effect on Arsenal.

Now that a binary logistic regression has been created to predict the effects of various match statistics on the likelihood of an Arsenal win, the contribution from each statistic for each match can be measured.  In studying the referees, the match statistics have been broken into three categories:
  1. Things neither team nor the ref can control - venue
  2. Things the referee tangentially controls - shots on goal, corners, fouls, etc.
  3. Things the referee directly controls - yellow and red cards
There certainly is some interplay between all three - a home team may sense a more lenient ref (see Scorecasting) and will likely achieve a higher number of fouls before a yellow card is thrown their way. Luckily, from a statistical point of view very few of these interactions matter.  The results from the binary logistic regression indicate a precious few variables are statistically significant: venue, shots-on-goal differential, yellow card differential, and red card differential.

To calculate the impact of each referee, a comparison was made between
  1. Each match's likelihood of winning given the match statistics as called versus
  2. How the likelihood of winning would have changed had Arsenal experienced their average number of cards (adjusted for whether the match was home or away).
A general linear model was then constructed with this data to observe the impacts that season and referee had on the difference to the expected average.  The results from the general linear model are presented below via the main effects and interaction effects plots.
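The comparison in the two steps above can be sketched in Python. This is a simplified stand-in: it averages the per-match differences by referee rather than fitting a general linear model with season effects, and the coefficients, field names, and averages are all hypothetical.

```python
import math
from collections import defaultdict


def win_likelihood(match, coefs, intercept):
    """Logistic prediction from the match's venue flag and differentials."""
    z = intercept + sum(coefs[name] * match[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def referee_impact(matches, coefs, intercept, avg_cards):
    """Mean of (likelihood as called) minus (likelihood with average cards), per referee.

    avg_cards maps the venue flag (1 = home, 0 = away) to the team's average
    yellow and red card differentials at that venue.
    """
    diffs = defaultdict(list)
    for match in matches:
        as_called = win_likelihood(match, coefs, intercept)
        venue_avg = avg_cards[match["venue"]]
        counterfactual = dict(match,
                              yellow_diff=venue_avg["yellow_diff"],
                              red_diff=venue_avg["red_diff"])
        expected = win_likelihood(counterfactual, coefs, intercept)
        diffs[match["referee"]].append(as_called - expected)
    return {ref: sum(values) / len(values) for ref, values in diffs.items()}


if __name__ == "__main__":
    # Hypothetical coefficients and matches; differentials are own minus opponent's.
    coefs = {"venue": 0.5, "sog_diff": 0.2, "yellow_diff": -0.15, "red_diff": -0.75}
    avg_cards = {1: {"yellow_diff": 0.0, "red_diff": 0.0},
                 0: {"yellow_diff": 0.5, "red_diff": 0.0}}
    matches = [
        {"referee": "Ref A", "venue": 1, "sog_diff": 2, "yellow_diff": 2, "red_diff": 0},
        {"referee": "Ref A", "venue": 0, "sog_diff": 1, "yellow_diff": 1, "red_diff": 1},
        {"referee": "Ref B", "venue": 1, "sog_diff": 2, "yellow_diff": -1, "red_diff": 0},
    ]
    print(referee_impact(matches, coefs, -0.2, avg_cards))
```

A negative value means the cards shown under that referee lowered the team's modeled chance of winning relative to an average-card match.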

The graphs above confirm that Phil Dowd provides the highest differential against Arsenal from their expected mean.  On average, he costs them 4% per match against their odds of winning a match if they had experienced their average card differential - equivalent to a little more than a yellow card per match officiated.  As mentioned in the previous post on this topic, this is especially odd given the high proportion of home matches that he has officiated (home matches should result in a lower number of cards and thus higher proportion of winning).  Howard Webb is the only other official of the eight with a negative differential.  Four of the remaining six officials are right at the average differential of zero, while Chris Foy and Mark Halsey provide the most beneficial treatment of Arsenal.

All of this demonstrates that of the referees who officiate the greatest number of Arsenal matches, Dowd and Webb are the most biased against the Gunners.  Is this due to them actually being biased against Arsenal, or are they simply "tougher" officials when it comes to every team?  The calculations to determine one theory over the other are a good bit more involved, and will have to wait until the second post in this series...

Friday, March 25, 2011

Friday Night Links

Here's your weekly dose of my favorite links, with a slight change in format.

New Blogs, Magazines and Books

Starting this week I will be highlighting my favorite new blogs, magazines, newspapers and other great soccer writing that I've found.  They're good enough to go into my RSS feed, pay for a subscription, or pay for a copy of the book.  They may not be new to the writing world at large, but they're new to me as I've just now stumbled upon them.  I can't promise new finds every week (the limits of my RSS feed and reading time constrain me), but when I do make new finds I will post them here.

  • The Blizzard: If you enjoy the format of The Atlantic (sorry EPL Talk, I am a lifelong subscriber and can't help ripping you off) and the quality of writing you find at the Guardian, The Tomkins Times, or Zonal Marking, this magazine is for you.  It's going to be released on a quarterly basis, and clock in between 150 and 200 pages.  It's like a fine drink - enjoyed slowly over an extended period of time as opposed to the rushed consumption of a blog.  Issue 0 is out now, and if you enjoy it Issue 1 will be coming out this summer.  Pay your three quid or more (it's a pay-what-you-like plan), and judge the content for yourself.
  • Overlapping Run: I've been reading content from this blog when its Take Me Out To The Ballgame series of posts would show up on the MLS Talk blog.  It seems now that the author has brought those posts back to his blog exclusively, and will keep all content exclusive to his site.  Great MLS attendance analysis, great comparisons between the leagues, and great quick posts.  A must have for the US soccer fan.
Favorite Links of the Week
Enjoy the soccer-filled weekend!  I'll be watching my Sounders live on Friday night, and will try to squeeze in the US Men's National Team match on Saturday afternoon before I go out to dinner.  See you on the backside...

Wednesday, March 23, 2011

Soccer Analytics and the Impact of the Glocal

"In a series of studies... [Roland] Robertson successfully challenged simplistic views of globalization... [He] offered a much more nuanced analysis in which he demonstrated that global and local actually merge into a new entity that he coined 'glocal'... Robertson observed an intensification of consciousness of the world as a whole. But he also argued that global cultural flows can reinvigorate local particularisms."
Gaming the World, pp 43-44
The authors of Gaming the World do a good job of extending the glocal concept, introduced by Roland Robertson, to the world of sports in general and soccer in particular. It's this concept - the intensification of the local and the subset of the population that uses that intensification to lash out at the global - that came to mind when I read this post by Chris at Soccer By The Numbers. Chris offers up some rational criticisms of his recent guest post at the NY Times' Goal blog, and answers each of them with a balanced response. I've certainly run into a number of these criticisms myself throughout my brief year of blogging about soccer statistics, and have answered them in much the same way. Practicing statistics in a professional environment, I am also sensitive to not succumbing to "Superman syndrome" - swooping into a situation that would benefit from statistical analysis, imposing my statistical knowledge on a subject where I have limited technical understanding, and then leaving as soon as the challenge is solved and getting most of the recognition. Thus I try to strike the right balance in my own analysis and writing, recognizing my own limitations of understanding the game of soccer.

While I am sensitive to such concerns, I think there's actually something deeper at work in a number of persons critical of Chris's work and that of others engaged in similar endeavors. Before I launch into such analysis, I must make a disclaimer.

Chris and I know each other through our common interest in blogging about soccer statistics. We've exchanged data, co-posted on each other's sites, and send each other emails on a semi-regular basis. We've never met face to face, but we are friendly on internet terms. I will also say that Chris has not spoken to me once about his experience regarding the reactions to his guest post, nor has he asked me to write anything about it. However, I've seen reactions like those documented by Chris too many times not to comment on the topic.

I believe what Chris has seen is the intersection of two "glocalizations" - the internet and the global soccer community. And he's not the only one. Whether it's Paul Tomkins getting flamed to the point of quitting Twitter or Tim from 7AM Kickoff having to defend himself against/embarrass those who criticize his ability to write about the Premier League because he's not English, it seems attacking successful soccer writers is in vogue with a subset of readers. I see this coming from the intersection of three themes:
  1. An intense dislike for the globalization of what was England's game, especially when it gets "Americanized" via applied statistics
  2. An intense dislike for the intelligence required to blog about soccer analytics. There's always some tension when the nerds from high school end up becoming the "it" thing in a sport formerly dominated by former practitioners of the sport who may have been the jocks that disdained such nerds earlier in life. Who's to tell the successful practitioners that they may be missing something key on the pitch by ignoring some obscure and convoluted statistic?
  3. An intense dislike of anyone successful in the Internet age. It seems like we love to make ourselves feel more secure by tearing down those who are more popular, more successful (however you want to define success), or have more money than us. In the Internet age, everyone is special yet we're not content to actually respect or recognize greatness.
All of these have a similar theme that was touched on by Chris's response to the first major criticism he faced ("You Can't Quantify Soccer") - it's all about power. Lashing out based upon any of the three themes above stems from insecurity: the critics feel their power base is threatened by the actions of those they criticize. Instead of recognizing the benefits of globalization, the improved match play that comes from a team built on statistical player evaluation, and great internet content worth paying for, those threatened by such emerging trends lash out and attempt to delegitimize such contributions. All in the false hope of stopping an emerging trend that will likely contribute positively to the game we all love, and at the same time empower and reward a subset of people within the game.

To be fair, a good bit of criticism we bloggers face is based on facts and of honest intent. I am also the first to criticize unsound statistical analysis, and also subscribe to the theory that "no one ever liked the smartest kid in class." Garbage analysis is garbage analysis, and empowering people who peddle garbage analysis or are simply jumping on the soccer analytics bandwagon only serves to cheapen the game. I am not intending to paint anyone who disagrees with the globalization of soccer, the increased use of statistics within it, or finding ways to be monetarily compensated for soccer writing as having ill intentions. Anyone who reads my Friday Night Links knows that I often point out blog posts with views contrary to mine that are often non-statistical in nature, especially when they are well thought out and well written. What I am saying is that a subset of those who have such disagreements are motivated by what can best be described as soccer and internet nativism - a philosophy that says analysis and commentary is garbage if it's statistically based, not from England/Europe, or someone has the audacity to charge for their content. The game and the words written about it would benefit from such motivations of criticism being marginalized.

Ultimately, I approach sports much like I approach life: live-and-let-live. If Gaming the World taught me anything about global sports culture, it's that the concept of "glocalism" means such a philosophy is not very prevalent. Such visceral reactions as those documented above won't ever go away because of glocalism. I just hope that such reactions are further marginalized as the global soccer community, and the role filled by online soccer commentary via statistical analysis, both continue to grow. Otherwise, we may just continue to live under the tyranny of goals and miss out on the other, possibly more important, contributions from players during the remaining 88 minutes of match time.

Monday, March 21, 2011

Assessing Premier League Club and Manager Performance Against Their Starting XI Transfer Cost

Note: This is the third and final post in a series examining the effects of the transfer cost of a club’s starting XI on finish position in the English Premier League.

In the first two posts in this series, I was able to demonstrate the following:

  • Utilization rate, as measured by a team’s average cost of their starting XI divided by the average cost of their squad, is declining by about 1% every three years.
  • A regression model can be used to correlate a team’s average finish within the league with their average multiple of the league average starting XI cost.
  • The prediction intervals (PI) from such a model can be used to predict the odds of finishing in a certain table position based upon a team’s costs (both squad and starting XI).
In this final post in the series, the 50% prediction interval will be used to identify teams that have over and under performed versus their financial expenditures (similar to this post on squad costs).

Under and Over Performance of Clubs

Recall the graph below from my most recent post on teams' M£XI.

Note that the upper and lower bounds of the 50% PI are denoted by dashed red and green lines, respectively. A team that falls above the red line has, on average, finished in the lower 25% in terms of table position of teams that would have had similar expenditures. This represents under performance. On the other end, a team that falls below the green line has, on average, finished in the upper 25% in terms of table position of teams that would have had similar expenditures. This represents over performance. The teams that fall on or between these lines represent the expected 50% of teams that scatter around but close to the regression line. They are considered pushes.

Just like the similar MSq£ analysis, it’s not good enough just to have an above average finish. Consistency is what matters – in this case, consistency is measured via the standard deviation of the team’s residuals to the regression model.
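The classification and the consistency measure can be sketched in Python. The prediction interval bounds are taken as given inputs rather than computed, and every number in the example is hypothetical. Table position is treated as "lower is better", so a finish above the upper bound counts as under performance.

```python
import statistics


def classify(avg_finish, pi_lower, pi_upper):
    """Label a team against the 50% prediction interval for its spending.

    pi_lower is the better (numerically smaller) bound, the green line;
    pi_upper is the worse (numerically larger) bound, the red line.
    """
    if avg_finish > pi_upper:
        return "under performance"
    if avg_finish < pi_lower:
        return "over performance"
    return "push"


def residual_summary(finishes, predicted):
    """Mean and sample standard deviation of (actual - predicted) finish by season."""
    residuals = [actual - expected for actual, expected in zip(finishes, predicted)]
    return statistics.mean(residuals), statistics.stdev(residuals)


if __name__ == "__main__":
    # Hypothetical team: averaged 15th against an interval of 11th to 17th.
    print(classify(15, 11, 17))
    # Hypothetical three-season record against the model's predictions.
    print(residual_summary([4, 6, 5], [5, 5, 5]))
```

A smaller standard deviation of residuals means the team lands a similar distance from the model's expectation season after season.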

The table below represents just such a ranking (click on the table to enlarge). It is sorted first in order of performance against the model (over performance, push, under performance), then by standard deviation of residuals in decreasing order within each group, and then by the average residual. The table also shows the change in each team’s position (a negative score indicates improvement, while a positive score indicates degradation) from a similar table that looked at performance versus the MSq£ model. Just like the similar MSq£ analysis, only the 33 teams that have played three or more seasons in the Premier League have been included in this analysis.

It turns out the average absolute movement in the table is 1.76 positions, and the median value is 1. Nearly 85% of the teams moved two or fewer positions between the MSq£ and the M£XI tables. This should come as no surprise given the correlation between the MSq£ and M£XI metrics that was noted in an earlier post. Of the three teams who moved four or more positions, here is an explanation why each of them moved.

Liverpool’s movement into the top spot is due to their eking out a position in the over performance group that they just missed when looking at their performance vs. the MSq£ model. In the MSq£ model, Liverpool actually finished in the top spot of teams in the push category which consigned them to 8th position in that table. They have a nearly identical M£XI (1.64) as MSq£ (1.69), but because of the difference in slope terms in the two models (noted here) we know that multiples of the league average don’t go as far in the starting XI as they do in the squad. Thus, Liverpool ends up moving from a push to an outperform when looking at the cost of talent that made it on to the pitch and their second lowest standard deviation of residuals to the M£XI model (only the top under performer, West Bromwich Albion, “outperforms” them) carries them to the top spot in the rankings. Clearly no one gets more consistent over performance versus the financial expectations on the pitch than Liverpool. Supporters’ expectations are another matter…

The overall biggest improvement over their MSq£ performance is West Ham United. West Ham’s utilization rate has been extremely low – an average of 45.7% to rank them in the bottom sixth of all 43 teams that had competed in the Premier League through the 2009-2010 season. The teams that have ranked lower than them – Bradford, Derby, Hull, Reading, Southampton, Stoke, and Watford – spent an average of four seasons in the league. The fact that West Ham has spent 14 seasons in the league – missing only the inaugural campaign and two seasons of relegation from 2003 to 2005 – is a testament to their ability to squeeze a good bit out of a meager transfer budget that has a lower than normal utilization rate. West Ham has averaged a 5% lower utilization rate than the league average each season, with only three of their fourteen seasons seeing above league average utilization. See the graph below for the details of West Ham’s utilization rate each season versus the league average. Such a low utilization rate translates into an M£XI that is 12.5% lower than their MSq£, thus leading West Ham to move out of the push category and into the over performance category with a nearly identical standard deviation in residuals to the two models.
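The utilization comparison can be sketched in Python using the definition from earlier in this series (average starting XI cost divided by average squad cost). The season figures below are hypothetical placeholders, not West Ham's actual numbers.

```python
def utilization_rate(avg_xi_cost, avg_squad_cost):
    """Average cost of the starting XI divided by the average cost of the squad."""
    return avg_xi_cost / avg_squad_cost


def compare_to_league(team_rates, league_rates):
    """Average gap to the league rate and the count of above-league seasons.

    Both arguments are per-season utilization rates in the same season order.
    """
    gaps = [team - league for team, league in zip(team_rates, league_rates)]
    average_gap = sum(gaps) / len(gaps)
    seasons_above = sum(1 for gap in gaps if gap > 0)
    return average_gap, seasons_above


if __name__ == "__main__":
    # Hypothetical per-season rates for a club and for the league.
    team = [0.44, 0.47, 0.52, 0.41]
    league = [0.50, 0.51, 0.50, 0.49]
    print(compare_to_league(team, league))
```

Tracking the gap season by season, rather than only the overall average, shows whether a low rate is persistent or driven by a few unusual years.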

With the addition of West Ham and Liverpool to the over perform category, and none of the seven teams that fell into the same category in the MSq£ analysis falling out in the M£XI analysis, the overall number of teams over performing on an M£XI basis has increased to nine.

At the other end of the table we find the team with the next largest movement – Nottingham Forest. During their five seasons in the league between the 1992/93 and 1998/99 seasons they averaged a 15th place finish when their M£XI costs would indicate an expectation of averaging a 14th place finish. This places them within the 50% prediction interval for their starting XI costs, which moves them from the bottom of the under performance category to the bottom of the push category. Like West Ham, this is due to the fact that their average difference to the league average utilization rate was -5.9%. This led to a 12% lower M£XI compared to their MSq£. Ironically, the club had its second best performance in terms of utilization their last year in the league, but it wasn’t enough to save the club from relegation.

Under and Over Performance of Managers

If movement amongst the clubs in the M£XI is minimal, what about the managers? How much do they move when one takes into account the talent they can get on the pitch and not just the players they can buy and sell for their squad, and which managers over and under perform the most when compared to the financial expectations of the players on the pitch? The table below summarizes just such managerial performance versus the M£XI for the 40 managers who have presided over at least three full seasons of Premier League play (click on table to enlarge). As in the similar table for team performance versus the M£XI model, the column on the far right shows a manager’s change in ranking from the MSq£ table, with a positive change indicating a backwards slide while a negative rating means the manager ranks higher in this table than the MSq£ table.

Unlike the club table, there is a fair bit more movement on the managerial side. The average absolute movement in the managerial table is 3.2 positions, while the median movement is two positions. Nearly 77% of the managers experienced a movement of three positions or less versus their MSq£ rankings. Let's dig into the details of a few of the managers that sit atop the table, and a few others that experienced some of the biggest movement.

Similar to the MSq£ rankings, Chris Coleman sits atop the M£XI table benefiting from an M£XI that is 4% lower than his MSq£. However, John Gregory has fallen seven positions to the ninth spot in the table. A comparison of MSq£ and M£XI explains why - Gregory spent an average of 1.11 times the league average on squad transfer cost, but his starting XI average cost was 1.24 times the league average. This was due to above average utilization over his three years at Aston Villa, capped by a phenomenal 70.3% utilization rate in his third year at the club (M£XI = 1.57, CTTP = £75.5M). In fact, no one put a greater proportion of their talent on the pitch that year. What’s more telling is the fact that six teams (Manchester United, Arsenal, Liverpool, Chelsea, Aston Villa, and Newcastle United) fielded average starting XI’s that were more costly that season. Aston Villa’s eighth place finish was behind all but one of those clubs (Newcastle finished eleventh), and behind Leeds United, Ipswich Town, and Sunderland sides that fielded average starting XI’s that cost less to various degrees - £73.9M, £21.9M, and £31.9M respectively. While Gregory still outperformed the model on average, it could be said that his last full season at Villa was one where he had the most tools at his disposal on the pitch. It should be noted that the model only predicts a marginally better seventh place finish for the squad in that season, indicating that perhaps Aston Villa management’s expectations were still a bit too high given the financial resources they were willing to commit. No matter the reason, Gregory’s steadily increasing utilization rates over his three year term – 48.4%, 52%, 70.3% – contribute to an increasing M£XI and increased variability versus the model that shows up in Gregory’s standard deviation of residuals. This is ultimately what lowers his ranking by seven positions – putting more talent cost-wise on the pitch each passing season and seeing a relatively consistent sixth to eighth place finish.

Rafael Benitez and Martin O’Neill leapfrog Evans and Houllier in the M£XI rankings for one main reason – extremely consistent M£XI, utilization, and table position statistics.

In fact, if Benitez hadn’t had such a poor last season at Liverpool he would have outranked Chris Coleman in terms of standard deviation of the residual to the M£XI model. Then again, if he hadn't had such a poor 2009/10 he might also still be managing there this year. Recall from the first post in this series that Liverpool has hovered around 50% utilization from the 2005/06 to 2009/10 seasons – indeed, Benitez’s lowest utilization was his first year (2004/05) at 41.8%. He consistently beat his squad’s M£XI expectations by at least two table positions, and twice nearly beat it by four table positions. Only in his final year did he fail to meet M£XI expectations, finishing 0.6 positions off the expected pace. Say what you like about Rafa, but he made the most of the talent he had on the pitch.

Along with being one of the most consistent over performers, Martin O’Neill also has one of the highest average over performances versus the M£XI model – only Sam Allardyce (-5.9) and Gerry Francis (-4.6) outperformed him (but with much less consistency). O'Neill's first stint in the Premier League was with Leicester City, where he averaged six places better than his meager transfer budget would have predicted (average MSq£ = 0.44, average M£XI = 0.46) and guided them to three League Cup finals (winning two of them) over a four year period. After departing for Celtic for several years, O'Neill returned to the Premier League via Aston Villa before the 2006/07 season. Over his four seasons at Villa, he guided them from an initial 11th place finish to a sixth place finish each of the following three years. He averaged nearly three places better than his M£XI suggested. While his cup success at Leicester didn't translate to similar success at Aston Villa, O'Neill did put Villa back in the top third of the table and was consistently threatening for European play. Ironically, it is widely understood that O'Neill left Aston Villa after the 2009/10 season because of his disagreement with Villa's ownership over his desire to spend more money to improve their chances of finishing higher in the table. Perhaps Martin O'Neill and John Gregory should have a discussion about how such Villa transfer budget limitations, and the unrealistic expectations that are attached to them, can wreck an over performing team.

Further down the list we find the two managers who fell the most from their MSq£ rankings - Peter Reid (13 positions from low overperform to low push) and Claudio Ranieri (14 positions from high push to under perform).

Reid spent one year at Manchester City in the league's inaugural year, and then spent over four years at Sunderland - one full year in 1996/97 and three straight from 1999/00 to 2001/02. Sunderland's MSq£ and M£XI numbers were relatively consistent during Reid's tenure, but their finishes were not as the first and last years saw 18th place finishes that led to relegation while the middle two saw them finish 7th. That variability produced huge swings in his residuals, meaning that Reid's standard deviation in residuals is only surpassed by 10% of the managers in the table. Combine this with Reid's high utilization rates that boosted his M£XI by 15% versus his MSq£, and it is clear why he shifted from over performance to a push.
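The effect of those swings can be illustrated with a minimal sketch. It assumes, purely for illustration, a constant expected finish across Reid's four Sunderland seasons (the value of 12 is hypothetical and not taken from the model) and uses the sample standard deviation, which may not be the exact calculation behind the table. Because the expectation is held constant, the result depends only on the spread of the actual finishes.

```python
from statistics import stdev

# Sunderland's finishes under Reid as described above: 18th, 7th, 7th, 18th.
actual_finishes = [18, 7, 7, 18]

# Hypothetical constant expected finish; the text only says the MSq£ and
# M£XI numbers were "relatively consistent", so the exact value is assumed.
expected_finish = 12.0

# Residual = actual finish minus expected finish for each season.
residuals = [actual - expected_finish for actual in actual_finishes]

# Sample standard deviation of the residuals (an assumption about the method).
residual_sd = stdev(residuals)
print(f"Standard deviation of residuals: {residual_sd:.2f}")
```

Changing `expected_finish` shifts every residual by the same amount and leaves the standard deviation untouched, which is why inconsistent finishes alone are enough to push a manager down the rankings.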

Ranieri's three years in the league saw him average a fourth place finish when club expenditures indicated that he should have averaged a second place finish. Even before Abramovich bought the team, Ranieri was seeing huge advantages in terms of the cost of the talent he could put on the pitch (2001/02 M£XI = 1.89, 2002/03 M£XI = 2.02). With Abramovich's purchase of the team and infusion of transfers in 2003/04 Chelsea became the first team to break the 3.0 barrier on the M£XI metric, but were unable to win the Premiership due to Arsenal's Invincibles' run of perfection. Ranieri's three year run put his average M£XI nearly 10% higher than the average MSq£, and as has happened so frequently in the table, this greatly lowered Ranieri's ranking from one category (push) to the next lowest (under perform). Regardless of the metric, Roman Abramovich felt Ranieri was under performing and replaced him with Jose Mourinho who brought them two Premier League Championships in three years.


A clear connection can be drawn between the cost of the talent a club can put on the pitch in the English Premier League and the table position that club can expect. The model doesn't explain every team's position, but average transfer expenditure on the pitch does explain nearly 70% of the variation in average finish position. The other 30% is noise from factors that aren't quantified in the model. The model also doesn't explain match-to-match variation, where squad and starting XI transfer cost is likely far less deterministic.

What the model does do is set clear expectations for long-term success and failure. While a number of the concepts in this series are a bit advanced, they illustrate a key point for supporters and management: set your expectations for a manager's long-term average table position based upon how much they're allowed to spend in the transfer market.

While no hard and fast rules can be drawn, here are a few concepts one could apply based upon the MSq£ and M£XI analyses:
  • A club that wants to avoid relegation year-in and year-out must spend at least the league average in terms of squad transfer costs. For 2010/11, this was nearly £116M.
  • Managers who can't achieve at least 50% utilization are likely going to under perform. Managers lower in the table in terms of squad transfer costs need to coax a larger utilization percentage out of their playing staff to remain competitive.
  • Don't judge a manager on less than three years' performance, and certainly don't sack him unless it's an emergency move to avoid relegation. Managers need time and money to succeed, and at least three years are needed to build a team that reflects the priorities and tactics of the current manager and not the last one.
  • Along those lines, don't expect a single, expensive transfer to move a team out of mid-table mediocrity into European qualification in one season. Soccer relies on eleven starters, several substitutes, and a number of back up players for a team to be successful. Those players require a system to play within, and the manager needs time to get players to instinctively play within the system. Soccernomics was right in one regard - transfer purchases made in one window show little correlation to success or failure in the current or next season. Building a team via transfers is a long-term investment that requires patience - on the part of supporters and management.
  • Pay attention to why a manager's utilization rate may be under 50%. If it's due to poor investments that didn't work out on the pitch, clearly the only option is to move on. But if it's one year of bad luck with key injuries to costly players, the manager should be given the benefit of the doubt.
  • If the goal is a Premier League championship, be prepared to spend big. Recall this table from my last post on the M£XI topic, which shows a club must be willing to spend at least 2.40 times the league average in squad and starting XI transfer cost to have even odds at winning the Premier League title. Such certainty will likely decrease in coming years, as the Big Six and a few other teams continue to flood the transfer market with pounds. This will drive the cost of a championship higher, while at the same time dilute the power of a single team being able to "buy a championship".
  • Recognizing the reality of the spending required of a championship, perhaps management and supporters should set more realistic expectations and aspire for European qualification as their ultimate goal. Some are already advocating this approach for at least one of the more financially limited clubs amongst the Big Six. More clubs should adopt this approach to keep finances manageable and expectations achievable.
  • Finally, if Liverpool are in the market for a new manager after Kenny Dalglish's caretaker term runs out, I would think they would seriously consider Martin O'Neill for the job. Clearly the previous ownership group made a massive mistake in going with Roy Hodgson over O'Neill last summer. I don't pretend to know the thoughts of senior Fenway Sports Group managers, but I do know they're smart and use analytics to help guide their decisions. If O'Neill's tactics and transfer strategies are right for the club, I could think of few managers who would likely over achieve to a greater degree given Liverpool's storied history yet somewhat limited financial means.
With that I will be taking a break from posting about Premier League economic matters. The two series on the MSq£ and M£XI have built upon the excellent foundation laid by Pay As You Play, and they now provide a direct method for evaluating club and manager performance versus financial expenditures. I am deeply grateful that Paul Tomkins and Graeme Riley shared the data with me, and served as regular editors and sounding boards for ideas I had. I hope that readers have derived as much insight and enjoyment from the two series as I have.

I already know what my next financial posts will focus on once the current Premier League season wraps up and the mood to write about transfer markets strikes me again - a detailed dissection of Arsene Wenger's moves in the transfer market. Yes, I am a bit biased, but the data clearly shows that Wenger is the longest serving over achiever in the English Premier League. For Gooners like me, reconciling this over performance with the lack of trophies the last six seasons is the ultimate test of what I preach: setting realistic table position expectations based upon transfer expenditures. In doing such a detailed study, I hope to provide a better understanding of his successes, failures, and what types of expenditures might put him over the top yet allow Arsenal to win a much less expensive trophy. As they say in the investment industry, "can I eat my own home cooking?"

Friday, March 18, 2011

Friday Night Links

After several weeks off, Friday Night Links comes back with my favorite links from the last three weeks.

First, I must highlight a new soccer analytics aggregator of which I am a contributor.  It's called Soccer Analysts, and contains material from some of my favorite blogs.  I am honored to be a contributor, and I recommend you add it to your RSS feed.

And now, your weekly links:

I will conclude by saying "what a difference three weeks makes!"  My Gunners were riding high off a win over Barcelona.  Since then they have bowed out of three competitions and are limping into the final 10 games.  It starts this weekend against West Bromwich Albion.  Here's hoping Arsene Wenger can do the near impossible, and win the league with a starting XI cost (in terms of transfer fees) less than the league average.

The beginning of the MLS season didn't go any better for my Sounders, with a loss to the Los Angeles Galaxy coming on Tuesday night.  It's off to New York this weekend to play Thierry Henry and the Red Bulls.  Nothing like stacking the start of the season with the best competition the league has to offer!

Wednesday, March 16, 2011

Book Review: Scorecasting

I just finished the wonderful book Scorecasting: The Hidden Influences Behind How Sports are Played and Games Are Won.  It's full of great insight into the psychology behind different sports, backed up by statistical measures that demonstrate the impact to the games' outcomes.  I would recommend it to anyone interested in setting realistic expectations when it comes to the likely outcome of sports events (on a macro or micro scale), and especially sports statistics bloggers interested in improving their coverage of key factors in their analysis.  Special note to readers only interested in soccer statistics: this book is written from an American sports perspective, so it is heavy on football, baseball, and basketball.  It does have several great chapters that utilize soccer as an example, especially the chapter that identifies the true source of home field advantage, but don't expect anything close to the coverage found in Soccernomics or Gaming the World.

Herewith are some of my favorite parts of the book:
  • The first chapter on officials' propensity to swallow their whistle at key moments provides great insight into the idea of sins of omission versus commission.  The concept is pretty simple - as the penalty of making the wrong call goes up, refs are more likely to swallow their whistle and let players determine the outcome of the game.  What's amazing is the statistical data that backs this up - shrinking and widening strike zones based upon the pitch count and game's score, the way loose ball calls will go late in a basketball game, and the dreaded make up call.  It doesn't make the officiating less biased to understand why it's happening and how big the bias is, but perhaps it will be a bit less maddening via my new understanding.
  • A brief chapter on the Pittsburgh Steelers (a dominant NFL team) and the Pittsburgh Pirates (a pitiful MLB team) highlights the impact economics can have on the competitiveness of professional sports, a topic readers of my blog know all too well.
  • A chapter on the value of blocked shots in basketball is a great example of how sports analytics can turn MVP awards (or in this case, the NBA's Defensive Player of the Year) on their head based upon how shallow or deep the metrics within the evaluation are.  In this specific chapter, Dwight Howard's league-leading blocked shot total turns out to be of the least value because his blocks predominantly go out-of-bounds or to a player on the shooting team.  Tim Duncan, on the other hand, has a much higher percentage of his blocked shots turn into turnovers that benefit his team.  The key lesson: statistics are meaningless if what you're measuring doesn't actually correlate to team success.
  • The statistical treatment of the sources of home field advantage is outstanding, caveats notwithstanding.  One chapter deals with dispelling the myths related to home field advantage - crowd support bolstering play on the field, the impacts of travel on the away team, or better knowledge of the peculiarities of their home stadium and field.  The subsequent chapter shows a pretty compelling source of the home field advantage - referee bias due to their subconscious desire to comport to the crowd's expectations.  Anyone doing analysis on the effects of match outcome, especially referee bias, must take venue into account to avoid confounding it with other factors that aren't the true sources of variation.
  • Finally, I loved the final chapter that dealt with the hapless Chicago Cubs.  The authors, Cubs fans themselves, show that the Cubs aren't unlucky. In fact, they seem to be quite average when it comes to luck.  What's really going on with the Cubs is that their players and management are responding completely rationally to economic incentives.  Why should they strive to put a higher quality product on the field when it seems as if Cubs fans will keep showing up to the ballpark regardless of the team's record?  The authors provide copious data from other baseball teams to demonstrate a number of the more successful clubs have fan bases that are actually sensitive to the on-field product and its success (or lack thereof).  This may be viewed as "bandwagon" fan behavior by many of a team's die hard supporters, but it turns out such sensitivity by the fan base ensures management is equally as sensitive to the performance of the team.  I've always been a proponent of such feedback - I walked out of the Sounders' 0-4 drubbing at the hands of the LA Galaxy to send a clear message to the Sounders' management team.  It worked, as they refunded the price of that game's tickets.  Now I have the data to back up such a position, and have a leg to stand on the next time I am accused of "not supporting my team".

Monday, March 14, 2011

Random Thoughts on the Upcoming MLS Season

Sorry, Colorado fans. This image won't repeat itself in 2011.

The MLS season kicks off Tuesday night (I'll be there as part of the toughest crowd in MLS), so I thought it would be appropriate to share a few random, largely non-statistical thoughts about the upcoming season.
  • I don't believe Colorado will repeat as champions.  Whether it's the parity in the league, or a Real Salt Lake side coming back stronger than ever, or the fact that Colorado will be competing in the CONCACAF Champions League and will likely have the dreaded "four more games curse" come playoff time, I just think the cards are stacked against them.  No offense to Colorado fans, but I just don't think the team is good enough given the Western Conference being stacked again this year.
  • Recall that in this post I quantified the relationship between table position and the percentage of points available.  That regression equation predicts that the Supporters' Shield winner should earn 68 points this year, nine more points than LA earned last year with four fewer games than this year.  I think this is a bit of a high prediction that is the result of the bias in the model due to the steady increase in the number of teams over the last few seasons, but it does provide a useful starting point.  I am going to say that it will take at least 64 points (62.7% of available points) to win the Shield.  This represents a slightly higher-than-average percentage of points for a Shield winner (the average is around 61%), which I'll chalk up to each team getting four additional matches against the two expansion franchises this year where the top teams should be able to pick up 7 to 9 points more than they had last year.  This also accounts for LA's second highest point percentage ever last season (65.6%), and that I don't think we're going to see such a high point total again this season.
  • At the opposite end, I see the projected points for the last playoff spot (10th overall) correlating to the percentage projected in this post: 44.3% or 46 points.  The regression model has been relatively accurate in the lower places, so I am confident in using its predicted number.
  • Seattle fans better prepare themselves for a disappointing season.  I am such a fan, but I am forcing myself to be a realist.  For all the talk of this season being make or break for the Sounders, I don't think they've done enough to compete for an MLS Cup - no significant additions and some pretty substantial losses in Nyassi amongst others.  Add in yet another CCL run if they get past the play-in stage again, and the annual deep run in the US Open Cup, and I think they won't have enough gas left come playoff time.  If that is the case, it will be interesting to see what off season moves, besides the retirement of Keller and Nkufo, are made and how the fans react to them.  Seattle is no longer the cool new kid on the block with minimal expectations, and it will be interesting to see how the team and fans react if the high pressure expectations aren't met.
  • The same might be said of the Galaxy.  Unlike last year, they're automatically into the group stage of the CCL.  This gives them six more matches than every other MLS team other than Colorado (and perhaps Seattle and Dallas), and all of those games are in the second half of the season.  Throw in a few more games from the US Open Cup, and LA begins to build a resume that includes a much greater number of games than their likely first or second round opposition in the MLS Cup playoffs.
  • I am very excited for the addition of the Vancouver and Portland teams, but I get the feeling that Vancouver is the red-headed stepchild of the three teams in the Northwest.  It seems that the Seattle-Portland rivalry is the most intense of the three.  The speed with which the recent Sounders/Timbers friendly and the regular season match at Qwest Field sold out, while the Sounders and Whitecaps struggled to fill their Cascadia Summit match, speaks volumes as to where the intensity lies in the Cascadia Cup.  Beyond the rivalry on the pitch, Seattle and Portland generally dislike each other in every other aspect as well.  The whole Cascadia Cup rivalry will be great, but I suspect it will be the most intense between Portland and Seattle.
  • In the East, it will be interesting to watch how Philadelphia and New York perform.  In their second year, New York's two key DP's - Thierry Henry and Rafael Marquez - are expected to deliver with outstanding young talent like Juan Agudelo surrounding them.  Last season was one where adjustment was expected and tolerated, but not this season.  In Philly, this was a team that had a respectable 35 goals scored - it doesn't sound like a lot but it does represent the 10th highest total in 2010.  Get a few more goals this year (especially from an outstanding Le Toux), let in fewer than the 49 they let in last year (especially on the road), and a 10th place finish and qualification for the playoffs isn't impossible.
  • Finally, it will be interesting to watch how DC United rebound from their worst season ever - 22 points/-26 GD, futility only surpassed the last six seasons by New York in 2009 and Real Salt Lake's first season in 2005.  Beyond the quality of play on the pitch, they need to get a soccer-specific stadium built in the next few years or risk being permanently relegated to the lower half of the league in terms of public perception.  Past glory won't do, and DC United's leadership knows this.
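The point percentages quoted in the Shield discussion above can be checked with a short sketch. It assumes three points per win, a 34-game schedule in 2011, and a 30-game schedule in 2010 (the text only says last year had four fewer games); LA's 59 points follows from the statement that 68 is nine more than they earned. The function name is illustrative.

```python
POINTS_PER_WIN = 3

def share_of_points(points: int, games: int) -> float:
    """Fraction of the available points that a team earned."""
    return points / (games * POINTS_PER_WIN)

# Assumed schedule lengths: 34 games in 2011, 30 games in 2010.
shield_2011 = share_of_points(64, 34)  # projected Shield-winning total
la_2010 = share_of_points(59, 30)      # LA's total last season

print(f"2011 Shield projection: {shield_2011:.1%}")
print(f"LA in 2010: {la_2010:.1%}")
```

Under those assumptions the two shares round to the 62.7% and 65.6% figures quoted in the list.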
It'll be interesting to see how things play out.  There are so many stories at the outset of this season - those are only the few I was able to write about.  I'll check back in on a quarterly basis to show which teams are under and over performing historical expectations.  It's going to be a fun season.  Enjoy it!

Friday, March 11, 2011

Premier League Three Quarter Season Report

As promised, here is the Premier League three-quarter season report. Just like the half season report, the table below shows which teams are over and under performing versus the table position expectations set by their transfer expenditures, goal differential, and points. Teams' table positions are determined by taking their current points and goal differential, multiplying them by the inverse of the proportion of matches they've played in relation to the full season's 38 matches, and then sorting points first and goal differential second in descending order. A team's metric colored in red indicates underperformance, while a metric colored in green indicates over performance. Click on the table to enlarge it.
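The projection described above can be sketched as follows. This is a minimal illustration with hypothetical team names and records, not the actual data behind the table: each team's points and goal differential are scaled by 38 divided by matches played, then sorted by projected points and projected goal differential in descending order.

```python
FULL_SEASON_MATCHES = 38

def project_table(records):
    """Scale points and goal differential to a 38-match season, then rank.

    Each record is a (name, matches_played, points, goal_diff) tuple.
    Returns (name, projected_points, projected_goal_diff) tuples, best first.
    """
    projected = []
    for name, played, points, goal_diff in records:
        scale = FULL_SEASON_MATCHES / played  # inverse of proportion played
        projected.append((name, points * scale, goal_diff * scale))
    # Sort on projected points first, projected goal differential second.
    return sorted(projected, key=lambda row: (row[1], row[2]), reverse=True)

# Hypothetical records for illustration only.
sample = [
    ("Team C", 29, 45, 10),
    ("Team A", 29, 60, 30),
    ("Team B", 28, 57, 32),
]

for name, pts, gd in project_table(sample):
    print(f"{name}: {pts:.0f} pts, {gd:+.0f} GD")
```

A team with a game in hand gets a slightly larger scaling factor, which is why the projected gaps can differ from the gaps in the current table.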

The following observations can be made based upon the table above:

  • Manchester United's projected point total has fallen from 80 points at mid-season to 79 points today. This would represent the lowest point total for a champion in 12 years, and the 4th lowest point total of a champion in the history of the Premier League. This drop off coincides with Manchester United's losses against Liverpool, Chelsea and Wolverhampton. If they had converted any one of those to one of the many draws they've achieved this season, they would still be on pace for 80 points.
  • In second place, Arsenal has closed the projected gap from six points to two over the last quarter of the season. This sets up a tantalizing May tie between the two teams at the Emirates. If they keep on the same pace of acquiring points, the winner of that match is the Premier League champion.
  • The projected goal differential between Manchester United and Arsenal has also been narrowed to two goals. In a season with the top two teams so close in points, the championship may be decided by goal differential. Since the half season report, Manchester United has been averaging a 1.0 GD per game while Arsenal has averaged a 1.3 GD per game.  Meanwhile the competition they face the rest of the season is pretty even - Manchester United's opponents have averaged a 0.14 GD this year while Arsenal's opposition has averaged a 0.6 GD. Again, one team beating the other in May gives the winning team a reasonable leg up. A draw, and it all comes down to who can score more and maintain winning form.
  • As mentioned here, Arsene Wenger is trying to do the highly improbable in winning the Premiership with an average starting XI cost below the league average. No team in the history of the Premier League has won it with an M£XI less than 1.26, and only three have won it with one lower than 1.85.
  • While the Premier League championship looks to be a two horse race, the spots for Champions League and Europa League look a bit up in the air. Tottenham Hotspur is only four points back of Chelsea and Manchester City. Liverpool is too far back to make things interesting for the fifth position, but how disappointed would Chelsea or Manchester City be to not make Champions League next year?
  • Fulham has moved up the most of any team in the league, going from 18th to 11th.  Liverpool has also continued their upward movement, climbing three spots under Kenny Dalglish and playing much better soccer.
  • Blackpool has backslid the most, dropping from 7th to 16th and is now at risk of being relegated.  Their GD continued to worsen in the second half of the season, and they now only outperform expectations when it comes to their squad cost.
  • The race to avoid the drop will be just as entertaining as the one at the top of the table.  There are seven teams projected to be within five points of each other in positions 13 through 19.  Included in that group is Aston Villa, who are at risk of losing their special place as one of the seven clubs never relegated from the Premier League.  They should be doing much better given their transfer expenditures suggest they should be competing for 10th.

This is the business end of the season that is just too much fun - the title and relegation races turn on the outcome of every match.  Enjoy the next ten weeks of soccer, and I'll check back in at the end when the dust has settled!

Thursday, March 10, 2011

Using M£XI To Predict Premier League Table Position Odds

Note: This is the second post in a series examining the effects of the transfer cost of a squad's starting XI in the English Premier League.

In the first post in this series, the rising cost of a squad's starting XI was quantified, the decreasing utilization rate amongst teams was explored, and the behavior of the Big Six clubs when it came to starting XI transfer costs was presented.  But what about a more general model, one that uses linear regression and prediction intervals to quantify expected table position based upon a squad's starting XI cost?  How could such a model be translated into predictions for the odds of finishing in various positions in the Premier League table based upon starting XI cost?  Those topics are explored in this post, with special attention paid to the clubs that represent outliers.

A Regression Model for Table Position vs. Starting XI Cost

Similar to this post on squad transfer cost, a linear regression model with various prediction intervals can be constructed for average table position and average starting XI cost.  Such a regression provides a good indication of how much the talent on the pitch should cost over the long-term to provide long-term success in table position.

There is a slight difference in the M£XI graph below compared to the one in the MSq£ post: the regression line, 50th percentile, and 95th percentile prediction interval lines all appear on one graph.  This consolidates what were multiple graphs into a single graph where the full range of under and over performance can be viewed.

The dashed black lines - representing the bounds of the 95th percentile prediction intervals - indicate the bounds of reasonably expected individual values.  Data points that fall outside of these lines indicate gross under performance (above the upper line) or outstanding over performance (below the lower line) versus the expected finish position given the average cost of the starting XI the team put on the pitch.

The dashed red line represents the upper limit of the 50th percentile prediction interval.  Falling above this line indicates under performance versus the model.  Conversely, the dashed green line represents the lower limit of the 50th percentile prediction interval.  Falling below this line indicates over performance versus the model.
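
For readers who want to reproduce this kind of plot, here is a minimal sketch of a least-squares line with 50% and 95% prediction intervals.  It is not the code behind the graph: the data points are made up, the function names are my own, and a normal quantile stands in for the Student-t quantile, which is only a reasonable approximation for larger samples.

```python
import math
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_se = math.sqrt(sse / (n - 2))
    return {"intercept": intercept, "slope": slope, "resid_se": resid_se,
            "x_bar": x_bar, "sxx": sxx, "n": n}


def prediction_interval(fit, x0, level):
    """Return (lower, centre, upper) for a single new observation at x0.

    Assumption: uses a normal quantile rather than the Student-t quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    spread = fit["resid_se"] * math.sqrt(
        1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])
    centre = fit["intercept"] + fit["slope"] * x0
    return centre - z * spread, centre, centre + z * spread


# Made-up (cost multiple, average finishing position) pairs, illustration only.
multiples = [0.2, 0.4, 0.6, 0.9, 1.1, 1.5, 2.0, 2.6]
positions = [17.0, 14.5, 15.0, 11.0, 12.0, 7.0, 5.0, 1.5]

fit = fit_line(multiples, positions)
low95, centre, high95 = prediction_interval(fit, 1.0, 0.95)
low50, _, high50 = prediction_interval(fit, 1.0, 0.50)
```

A team whose finishing position sits above `high50` would count as an under performer in the sense used here, and one above `high95` as a gross under performer; the mirror-image reading applies to `low50` and `low95`.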


It's interesting to note the similarities and differences between the graph above and a similar regression plot for MSq£ from this post.

  • The constant term in each regression equation - M£XI = 18.04 while MSq£ = 18.32 - indicates teams with relatively low multiples of the league average starting XI and squad transfer costs will be at similar risk for relegation.
  • The difference in the slope terms - M£XI = -6.9195 while MSq£ = -7.2221 - indicates that each one-unit increase in a club's multiple of the league average squad cost improves expected finishing position by about 0.30 places more than the same increase in its multiple of the league average starting XI cost.
  • However, the reality is that paying for talent that actually makes it on to the pitch is still the best way to improve one's chances of finishing top of the table (quite intuitive, isn't it?).  Even though the slope of the MSq£ regression equation indicates a 4.4% advantage in table position improvement vs. the M£XI equation when increasing multiples of the league averages are utilized, the fact remains that the average squad cost is more than double the average starting XI cost (2.12:1 to be exact).  Thus, signing talent and making sure they play all 38 games in a Premier League season is nearly twice as effective at increasing one's multiple to the league average £XI compared to simply breaking the bank and trying to increase one's squad transfer cost versus the league average Sq£.
  • The bounds on the 95th percentile and 50th percentile lines in both regressions are relatively close.  What has changed is several individual teams' proximity to those lines.
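
The arithmetic behind the slope comparison in the bullets above can be checked with a few lines of Python.  This is only a sketch using the coefficients and the 2.12:1 cost ratio quoted in the bullets; the "per pound" comparison treats starting XI spend and squad spend as two separate levers, which is a simplifying assumption of mine.

```python
# Coefficients quoted in the bullets above.
SLOPE_XI = -6.9195             # slope of the M£XI regression
SLOPE_SQ = -7.2221             # slope of the MSq£ regression
SQUAD_TO_XI_COST_RATIO = 2.12  # average squad cost : average starting XI cost

# Extra places gained per unit of multiple in the squad cost model.
slope_gap = abs(SLOPE_SQ) - abs(SLOPE_XI)

# The same gap expressed relative to the starting XI slope.
relative_gap = slope_gap / abs(SLOPE_XI)

# One pound moves the starting XI multiple about 2.12 times as far as it moves
# the squad multiple, because the league average it is divided by is smaller.
# Places gained per pound, starting XI relative to squad:
per_pound_ratio = (abs(SLOPE_XI) / abs(SLOPE_SQ)) * SQUAD_TO_XI_COST_RATIO

print(f"slope gap: {slope_gap:.4f} places per unit of multiple")
print(f"relative gap: {relative_gap:.1%}")
print(f"starting XI vs squad, places per pound: {per_pound_ratio:.2f}x")
```

The relative gap works out to roughly 4.4%, and the per-pound ratio comes out at about two to one in favor of the starting XI.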

A detailed discussion of over and under performance vs. the M£XI model will come in the next post, but a few words should be spent on the data points outside of, or close to, the 95th percentile lines.

The two teams outside of the upper 95th percentile line - Swindon Town and Oldham Athletic - were previously discussed in this post.  The only other team close to the line is Crystal Palace, who spent three campaigns in the EPL between the 1992-93 and 1997-98 seasons and were relegated after each single season they spent in the league.  Since that last season in the Premier League the club has gone through several owners and has bounced between The Championship and League One.

On the other end of the 95th percentile distribution stand three teams that have outperformed all other teams when adjusting for their financial resources - Queens Park Rangers, Reading, and Stoke City - although two of the three are likely not examples other Premier League teams would ultimately like to follow.

QPR, as an inaugural member of the Premier League, finished fifth in their first season in the league.  Mid-table finishes the next two seasons were followed by a 19th place finish in 1995-96 that saw them relegated to the Championship.  Their average M£XI of 0.42 was simply too small to avoid such a fate.  They eventually were relegated further to League One, and subsequently passed into administration.  A reconstituted QPR has found itself a mid-level team in the Championship in recent years.

Reading made a brief two season appearance in the Premier League from 2006 to 2008, and their average M£XI of 0.11 ranks as the second lowest in the history of the Premier League (Watford's 0.10 barely beats them).  Good form in the 2006-07 season, which saw them finish eighth, was followed by a season with a disastrous second half and relegation back to the Championship.  The team nearly regained their spot in the Premier League the following season, but lost in the Championship's promotion playoff.

Stoke City's one and only year in the league (2009-10) saw them finish twelfth with an M£XI of 0.25.  As of this writing, Stoke is on track for another 12th place finish, but is at risk for relegation with only three points separating them from the drop at 18th position in the table.  Surviving for a third year would mark a milestone few teams with such a meager transfer budget on the pitch attain.  Only one other club (Birmingham) has spent as little on transfers and remained in the Premier League more than two years.

Ultimately, that's what this analysis and the one related to MSq£ prove - gross under and over performance is only found at very low multiples of the league starting XI and squad transfer costs.  In both cases, such under and over performing teams don't seem to last long in the Premier League as their meager transfer budgets are no match for the teams spending more than them.  There are only six teams in the Premier League who can spend the money to compete for a Champions League position each year, and only twelve teams in the history of the Premier League have managed to spend the league average or better (seven of which are the teams never relegated).  The interplay with the teams in the Championship looking for promotion the subsequent season shouldn't be underestimated either.  While these lower spending teams certainly outperformed expectations in the Premier League, they often occupy the middling tier of teams that could just as easily find their transfer expenditures (and subsequent place) in the upper half of the Championship.

The Impact of M£XI On The Odds of Various Table Positions in Premier League

If the odds seem to be stacked against such thrifty teams, what about those who choose to spend more?  How are their odds impacted by greater expenditures, and how do they know they've spent enough to have a good chance at their goal - a spot in UEFA competitions or the Premier League title?  Luckily, ever-expanding prediction intervals can quantify such odds.  The following series of tables does just that, quantifying the squad and starting XI transfer costs and multiples required to achieve such odds per the regression model.

A reference point for average values must first be defined before translating the predicted multiples into absolute values.  The average Sq£ at the beginning of the 2010-11 season was £115.7M, while the projected £XI for 2010-11 is £54.7M (based upon the average from 2009-10 and projected growth of £585.5k per year via the regression model).
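
Translating a multiple into an absolute figure is just a multiplication by the relevant league average.  A small sketch, using the two averages quoted above and an illustrative multiple of 2.0 that is my own choice rather than a value from the tables:

```python
AVG_SQUAD_COST_M = 115.7  # average Sq£ at the start of 2010-11, in £M
AVG_XI_COST_M = 54.7      # projected average £XI for 2010-11, in £M


def multiple_to_pounds(multiple, league_average_m):
    """Convert a multiple of the league average into an absolute cost in £M."""
    return multiple * league_average_m


# Illustrative input: a club at twice the league average on both measures.
xi_cost_m = multiple_to_pounds(2.0, AVG_XI_COST_M)
squad_cost_m = multiple_to_pounds(2.0, AVG_SQUAD_COST_M)

print(f"M£XI of 2.0 -> £{xi_cost_m:.1f}M starting XI")  # £109.4M
print(f"MSq£ of 2.0 -> £{squad_cost_m:.1f}M squad")     # £231.4M

# Cross-check against the squad-to-starting-XI cost ratio of 2.12:1
# quoted earlier in this post.
implied_ratio = AVG_SQUAD_COST_M / AVG_XI_COST_M
print(f"implied squad : starting XI ratio = {implied_ratio:.2f}")
```

The two averages imply a ratio of about 2.12, in line with the 2.12:1 figure used in the slope comparison.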

The table below shows the squad and starting XI expenditures required to realize various odds of finishing top of the table in the Premier League.  The regression model is pretty accurate for the lower odds based upon the expenditures witnessed over the years.  Of the thirteen teams who had an M£XI of 2.46 or more, six have won the Premiership, and a similar outcome is seen for teams with an MSq£ of 2.40 or greater.  The accuracy of the model starts to break down just a bit the higher one goes in the odds - history shows that only four of the ten teams who have had an M£XI of 3.05 or more went on to win the Premiership.
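
As a rough cross-check on the 2.46 threshold, the M£XI regression line quoted earlier (constant 18.04, slope -6.9195) can be inverted to find the multiple at which the predicted position is first place.  Reading that point as roughly even odds of winning the title is my interpretation, and the odds function below is only a sketch: it assumes normally distributed scatter around the line, ignores the extra widening of a true prediction interval, and uses a residual standard deviation that is purely hypothetical because the post does not report one.

```python
from statistics import NormalDist

INTERCEPT = 18.04  # constant term of the M£XI regression quoted earlier
SLOPE = -6.9195    # slope of the M£XI regression quoted earlier

# Hypothetical value, NOT from the post: scatter of positions around the line.
HYPOTHETICAL_RESID_SD = 3.0


def predicted_position(multiple):
    """Expected table position for a given M£XI."""
    return INTERCEPT + SLOPE * multiple


def multiple_for_position(target_position):
    """M£XI at which the regression line reaches target_position."""
    return (target_position - INTERCEPT) / SLOPE


def odds_of_finishing_at_or_above(multiple, target_position, resid_sd):
    """Approximate P(position <= target), treating position as continuous."""
    scatter = NormalDist(predicted_position(multiple), resid_sd)
    return scatter.cdf(target_position)


even_odds_multiple = multiple_for_position(1.0)
print(f"M£XI where the line predicts first place: {even_odds_multiple:.2f}")

for m in (1.85, even_odds_multiple, 3.05):
    p = odds_of_finishing_at_or_above(m, 1.0, HYPOTHETICAL_RESID_SD)
    print(f"M£XI {m:.2f}: about {p:.0%} under the assumed scatter")
```

The inverted line lands at an M£XI of about 2.46, which squares with six of thirteen such teams winning the title.  The percentages printed for the other multiples depend entirely on the assumed scatter and should not be read as the values in the table.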

Arsenal fans should take special note: Arsene Wenger is trying to do what appears to be impossible.  All but three of the Premier League's champions have had an M£XI of 1.85 or more (corresponding MSq£ of 1.72 or more), and the Premier League champion with the lowest transfer expenditures ever (Manchester United's 1996-97 squad) still had an M£XI of 1.26 (MSq£ of 1.34).  After letting their M£XI drop to 1.05 in the 2008-09 season, Arsenal saw a slight rebound last year to 1.20.  However, as of this writing they had regressed to a 2010-11 M£XI of 0.96.  Arsenal being in second place in the Premier League table may be a testament to Arsene Wenger's ability to get more out of his meager transfer expenditures than any other manager could, but it may be too much to ask of him to expect perennial championship contention with such a historically low transfer multiple.

What about the required expenditures to improve a club's odds for making the Champions League given the Premier League's four spots?  The table below summarizes those odds.

This is really where the model's tendency to overpredict the financial resources required of clubs comes into play.  Of the 25 teams who have had an M£XI of 2.03 or more, only 3 have failed to finish fourth or better.  Manchester City's 2009-10 and Newcastle United's 2003-04 campaigns saw both finish fifth, while Newcastle set a new standard for under achievement with an 11th place finish with an M£XI of 2.41 in 1999-2000.  Ninety-five percent of teams that finished fourth or better have had an M£XI of 1.05 or better, with Arsenal's annual over achievement versus their transfer expenditures adding to the low M£XI totals.


A regression model that predicts table position based upon a club's multiple of the league average starting XI transfer cost has been constructed, and its resultant prediction intervals have been used to identify gross under and over performers.  Those under and over performers seem to be concentrated at the low end of the M£XI distribution.  Additionally, odds of finishing in the upper 20% of the league have been identified, with varying degrees of agreement with the historical data.

An analysis of team and club under and overperformance versus the 50th percentile prediction interval, similar to the one conducted for MSq£, can now be conducted.  That topic will be the subject of the third-and-final post in this series.