What Politicians Are in Favor of Marijuana Legalization?

We all know that marijuana legalization is an inevitable process. But there are hardly any politicians who have clearly voiced their support so far.

Although no politicians are ready to openly support marijuana for recreational purposes, there are a few who would rather have the public use New York medical marijuana for medicinal purposes.

On the other hand, most politicians are willing to discuss the drastic damage that criminalizing marijuana has done through the American criminal justice system. You can go to og news source to learn all about how marijuana has affected so many lives in America.

Only time will tell whether marijuana will be fully decriminalized, but here are a few of the eminent politicians who have voiced their opinions on the legalization of marijuana.

1. Barack Obama, former president of the United States

“Middle-class kids don’t get locked up for smoking pot, and poor kids do… We should not be locking up kids or individual users for long stretches of jail time when some of the folks who are writing those laws have probably done the same thing. It’s important for [the legalization of marijuana in Colorado and Washington] to go forward because it’s important for society not to have a situation in which a large portion of people have at one time or another broken the law and only a select few get punished.” – the New Yorker

2. Rick Perry, United States Secretary of Energy

“What I can do as the governor of the second-largest state in the nation is to implement policies that start us toward a decriminalization and keeps people from going to prison and destroying their lives, and that’s what we’ve done over the last decade.” – speech at the World Economic Forum

3. Bill Clinton, former president of the United States

“I think that most small amounts of marijuana have been decriminalized in some places, and should be. We really need a reexamination of our entire policy on imprisonment.” – Rolling Stone

4. Gavin Newsom, Governor of California

“These [recreational marijuana users] are incredibly upstanding citizens: leaders in our community, and exceptional people. Increasingly, people are willing to share how they use it and not be ashamed of it…These laws just don’t make sense anymore. It’s time for politicians to come out of the closet on this.” – the New York Times

5. Elizabeth Warren, United States Senator

“You know, I held my father’s hand while he died of cancer, and it’s really painful when you do something like that up close and personal… And it puts me in a position of saying, if there’s something a physician can prescribe that can help someone who’s suffering, I’m in favor of that. Now, I want to make sure they’ve got the right restrictions. It should be like any other prescription drug. That there’s careful control over it. But I think it’s really hard to watch somebody suffer that you love.” – Boston’s WTKK-FM

6. Rahm Emanuel, Mayor of Chicago

“We have police officers arresting people for 10 grams, 11 grams, 12 grams. A huge amount of time dedicated to that. Then, they go to court. That means they’re not on the street fighting gangs, fighting gun violence… This is a healthy discussion to have because we’re making a change. I think it’s a smart change because I want our police officers focused on serious violent crime.” – on Chicago’s decriminalization of small amounts of marijuana, the Chicago Sun-Times

7. Jared Polis, Governor of Colorado

“I am optimistic that we will reach a day when America has the smart, sensible marijuana policy that we deserve. … We are at a tipping point, on the unprecedented cusp of legalization. The progress at the state level has led the way, but it won’t come nationally until it happens in a critical mass of states. Then there comes much more pressure on Congress to legalize and regulate at the national level. Our streets will be safer and our economy stronger.” – speech at NORML conference

8. Harry Reid, Former United States Senator

“If you’d asked me this question a dozen years ago, it would have been easy to answer – I would have said no, because [marijuana] leads to other stuff. But I can’t say that anymore… I think we need to take a real close look at this. I think that there are some medical reasons for marijuana… I guarantee you one thing. We waste a lot of time and law enforcement going after these guys that are smoking marijuana.” – Las Vegas Sun

9. Denny Heck, United States Representative

“I actually think that having marijuana as a Schedule 1 drug is the height of silliness. Meth is a Schedule 2 drug. I mean, this just makes no sense. It’s nuts. I’ve also always supported allowing marijuana to be used medically… in a prescribed way.” – the Olympian

Payroll doesn’t win MLS Supporters’ Shields


For my international readers, I must take a few sentences to explain the Americanized version of the world’s game that MLS plays. We don’t award our championship to the team that wins the table.

Instead, we give them what we call a Supporters’ Shield which qualifies them for the CONCACAF Champions League, and then we break them and the next seven teams into a playoff.

That’s right – our domestic league uses a knockout round system to determine its champion. This bows to the very American way of determining championships in every other major sport – hockey, basketball, football, and baseball. In all of those other sports except football, playoff teams at least have to win a 5 or 7 game series to advance to the next round – that is, they have to win 3 or 4 games over their opponent.

MLS and our football league – the NFL – have largely decided on a single game format. One match between two sides decides who advances to the next round of the playoffs. Pull off one above average performance, and you can easily send a team that consistently outperformed you all season into the off season wondering what all that hard work was for.

Not content to be like every other US sports league, MLS does throw in a home-and-away format in the first round of the playoffs. It's not clear to me why they do this in the first round but not the second round or the MLS Cup; perhaps it is an attempt to prevent a large number of first-round upsets.

To keep the playoffs interesting, MLS puts eight teams into the playoff tournament every year. This presents some interesting challenges for a growing league. The total number of teams in MLS over the past few seasons is as follows:

• 2005: 12
• 2006: 12
• 2007: 13
• 2008: 14
• 2009: 15
• 2010: 16

The teams added from 2007 through 2010 were not promoted from lower divisions – they are new franchises created for the explicit purpose of playing in MLS. This is another key difference from the rest of the world's game. Because of the rapid growth of the last four years, this will be the first year that MLS has not placed the majority of its teams into the playoff tournament.

All of these factors have a huge impact on the American game. Given that one just needs to make it into the postseason tournament to have a shot at a championship, a number of teams with losing records make it in every year.

Indeed, last year’s champion, Real Salt Lake, had a losing record and happened to flip a switch at the end of the season to make it into the playoffs and pull off an impressive run of wins once they were in the tournament. Also, given that there are a number of new franchises the last few years looking to make a splash, spending is way up for them yet they still struggle with the usual “expansion team” performance challenges.

Finally, there is only the motivation of playoff seeding to push teams to compete for the top spots in the table. All these could potentially affect the drive of teams to spend money and resources to finish first rather than fourth or fifth in the table, thus making a relationship between payroll and performance harder to prove.

The Inputs

For this analysis I used 2005 through 2009 player salary data and MLS final table standings data.

As MLS is divided into two conferences (East and West) for the playoff format, I had to combine the two conferences for each season and assign finishing positions based upon each team's total points for the season. Where ties in points existed, I awarded the tied teams the same position in the table and then skipped ahead to the next finishing position for the first team below them. Once each team in each season had a finishing position assigned, I compiled the average finishing position for each team.

I used the player payroll data to calculate each team’s payroll as a multiple of the league average for the season. The team payroll multiple from each season was then compiled to make an average value for each team in MLS.
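The two compilation steps just described can be sketched in a few lines of Python. The team names and figures below are hypothetical, and the tie handling follows the "same position, then skip ahead" rule described above (standard competition ranking):

```python
def finishing_positions(points_by_team):
    """Assign table positions from total points; tied teams share a
    position and the next team skips ahead (e.g. 1, 2, 2, 4)."""
    ordered = sorted(points_by_team.items(), key=lambda kv: -kv[1])
    positions = {}
    for idx, (team, pts) in enumerate(ordered, start=1):
        # A team level on points with the one above inherits its position.
        if idx > 1 and pts == ordered[idx - 2][1]:
            positions[team] = positions[ordered[idx - 2][0]]
        else:
            positions[team] = idx
    return positions

def payroll_multiples(payroll_by_team):
    """Express each team's payroll as a multiple of the league average."""
    avg = sum(payroll_by_team.values()) / len(payroll_by_team)
    return {team: p / avg for team, p in payroll_by_team.items()}

# Hypothetical single-season inputs
positions = finishing_positions({"A": 50, "B": 47, "C": 47, "D": 40})
multiples = payroll_multiples({"A": 3.0, "B": 2.0, "C": 1.0})
```

Running the per-season positions through a simple average per team then yields the compiled values plotted in Figure 1.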

The results of these two compilations can be seen in Figure 1 below.

Just like the Soccernomics analysis, the data above is non-normal and must be transformed to perform any correlation studies and regression analyses. To do this, I initially tried the Soccernomics transformations of translating finishing positions to percentages as well as using natural logs and found that they worked. See Figure 2 below, where the p-value is greater than 0.05 and the assumption of normality is a safe one.

The team payroll data was also transformed by a natural logarithm, and we can now explore if there is any relationship between the data.
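A minimal sketch of those two transforms, assuming the "percentage" is simply finishing position divided by the number of teams (the positions and payroll multiples below are hypothetical):

```python
import math

# Hypothetical 15-team season: convert finishing positions to a
# percentage of the table, then take natural logs of both variables.
n_teams = 15
finish_pct = [pos / n_teams for pos in (1, 4, 8, 15)]   # 0.067 .. 1.0
ln_finish = [math.log(p) for p in finish_pct]

payroll_multiple = [2.1, 1.3, 0.9, 0.6]                 # x league average
ln_payroll = [math.log(m) for m in payroll_multiple]
```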

Correlation Test Results

As in my previous post on regression, the first attribute to check is the Pearson coefficient statistic before doing any regression analyses. Doing so will tell us if there is a statistically significant correlation between the two data sets. Figure 3 shows the results of the tests.

As Figure 3 indicates, there seems to be little chance of a statistically significant correlation between team payroll and where the team finishes in the table, as the p-values are not less than 0.05. What's interesting is that if the DP salaries are excluded (Team% no DP), the correlation statistic actually improves.

I did try a number of other transforms on the data to see if any would generate an improved fit. Unfortunately, none of them improved the correlation statistics. Thus, I conclude there is no relationship between team payroll and table finishing position in MLS.
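For readers who want to replicate the check, the Pearson coefficient itself is straightforward to compute by hand. The data below is hypothetical stand-in data, not the actual MLS figures:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical transformed data: ln(payroll multiple) vs ln(finish %)
ln_payroll = [0.74, 0.26, -0.11, -0.51, 0.10]
ln_finish = [-1.10, -0.41, -0.22, 0.00, -0.69]
r = pearson_r(ln_payroll, ln_finish)
```

A weak |r| like the one this toy data produces would fall well short of the small-sample critical values discussed below.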

Reasons for the Lack of Correlation

Given the prominence of the Soccernomics analysis and the different conclusion drawn for MLS, here are some explanations why we might see such a difference in outcomes between the leagues.

The poor cost/benefit equation of the DP: While the DP sucks up a ton of available payroll (MLS salary cap guidelines notwithstanding), it represents only a single player on the pitch. As we saw in the correlation statistic comparison, the statistical score actually improves when the DP's salary is removed. This is especially true of the LA Galaxy, whose blowout purchase of David Beckham and his $5M+ annual salary has resulted in two bottom-table finishes followed by an appearance in the MLS Cup in 2009.

The volatility in the league’s makeup: There have been three expansion franchises added to the league in the last three years of the data used in the analysis. Two out of the three have tried to make a big splash by signing DP’s, with one experiencing wild success in table position (Seattle Sounders FC) while the other has been in the league basement (Toronto FC). The Soccernomics study used a relatively stable list of teams that fought for positions in a mature league structure, which would provide less “special cause” variation seen in MLS’s results the last few years.
Low sample size: As with all statistical tests, sample size is key. The greater the number of samples, the more forgiving the test is and the lower the threshold for concluding that a statistically significant relationship exists. See Figure 4 below for an example of how the number of samples affects the critical test statistic. The highlighted column indicates the critical correlation values that a test must meet or exceed to ensure less than a 1% chance of error in assuming a correlation exists between two data sets. In the case of the MLS data I used, n=15, so one must observe a correlation statistic of 0.5923 or greater.

As I stated in my regression post, the Soccernomics study had 58 teams included and thus only needed to observe a correlation statistic between 0.2948 and 0.3218 to make a conclusion of correlation.
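Critical values like those in Figure 4 appear to come from the t-distribution at a one-tailed 1% level: r_crit = t / sqrt(t^2 + n - 2), where t is the 99th-percentile t value on n - 2 degrees of freedom. Here is a stdlib-only sketch, with numerical integration standing in for a stats library's t quantile:

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_quantile(p, df, hi=50.0, steps=2000):
    """One-sided t quantile via bisection on a trapezoid-rule CDF."""
    def cdf(x):
        h = x / steps
        area = sum(t_pdf(i * h, df) + t_pdf((i + 1) * h, df)
                   for i in range(steps)) * h / 2
        return 0.5 + area  # symmetric density, so CDF(0) = 0.5
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_r(n, p=0.99):
    """Minimum |r| needed for one-tailed significance at level 1 - p."""
    t = t_quantile(p, n - 2)
    return t / math.sqrt(t * t + n - 2)
```

With this, critical_r(15) lands near the 0.5923 quoted above, and critical_r(58) falls within the 0.2948-0.3218 band cited for the Soccernomics sample.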

  • The league’s salary cap structure: Outside of the league’s DP rule, MLS does try to maintain some form of a salary cap like most other American leagues. While America tries to pride itself as one of the most capitalistic societies, it’s exactly the opposite in its sports leagues. In some ways, it makes sense.
  • Capitalism fosters a system of cutthroat competition that eventually leads to a few winners and many losers. This can be counterproductive to providing a healthy, competitive league of 20-30 teams. Providing a ceiling on team payroll may help league parity, but it does make it difficult to rationalize expenditures in hopes of future success.
Ultimately, as MLS moves towards a stable 20-team league in the next few years (increased sample size) and the use of DP's becomes more rational with experience (improved cost/benefit equation), we may see the correlation statistic improve.

Next Steps

If the main goal in MLS is not to win the table but to win the playoff tournament and the MLS Cup, what attributes could be considered in understanding the likelihood of winning the Cup? I will explore this topic in my next post. Until then, enjoy some soccer this weekend!

Statistics are just numbers

I spent the weekend in Portland at a friend's wedding, and one of the side benefits was catching up with two friends and their newborn son.

The husband and I are always chattering about soccer whenever we get together (especially Arsenal as we are both supporters), and the topic of my blog came up.

He’s a regular reader, and while he enjoys the blog he did feel compelled to mention the change in tone just prior to my recent trip to Europe.

He singled out this post for comment, inquiring why I moved away from statistical commentary and spoke with a different voice in asking for readers’ recommended books. I think such inquiry provides a perfect opportunity for me to clarify my purpose for this blog and comment on one of those recommended books.

First, this blog is an outgrowth of my belief that statistics can illuminate unseen relationships, but they cannot be dealt with independently of the wider context of the subject area they are being used within.

As my Portland friend commented earlier in that visit, we Americans are obsessed with statistics compared to other nations. Our obsession often leads us to look for patterns or significance where there are none. The mark of a good statistician is one who, in the normal course of studying a subject, stumbles upon a question that may be best addressed with statistical analysis – that is, the use of statistics should be a rare occurrence in most fields. Soccer should be no different.

The challenge comes in building a blog around soccer statistics when my stated goal is to not be in search of patterns that don’t actually exist. To me, this blog is a journey in understanding the many facets of the world’s most popular sport. Statistics will greatly aid in that journey – to make sense of some larger truths in the sport.

However, statistics often only describe the most likely outcome of an average event and not the actual outcome of a specific one, which is why we play each and every match to determine the actual outcome. In the same manner, I see statistics only being part of my journey through the world of soccer.

Statistics related to the latest happenings in the soccer world will make up the bulk of my posts, but I also don’t want to lose touch with the human element of the game. Soccer shapes our human experience, and to a greater degree our human experience shapes the game of soccer.

I want to understand the humanity that produces the numbers I study. With that explanation, I now dive into one of those recommended books from my previous post.

At the urging of a regular reader/re-tweeter I picked up Brilliant Orange: The Neurotic Genius of Dutch Soccer by David Winner. It was a late addition to my reading material for my trip, and I was a bit sad I couldn’t get it on the Kindle as it would have kept my bag weight down.

Nonetheless, it came highly recommended and had the side benefit of helping describe the culture from which many of my co-workers come. After completing this book, I only wish I had read it earlier. It would have made my nearly three years of working with the Dutch much easier.

The book, more than anything, is a fascinating study of how a small nation that outperforms expectations copes with the inevitable defeats it faces in major tournaments.

This over performance has been quantified in the Soccernomics model, where the Dutch score a half goal more per match than the model predicts and they sit 9th out of 49 European teams in this category used to gauge over-performance (see Figure 14.4 in Soccernomics).

The book starts at the beginning of post-war Dutch soccer to explain how coping with over-achievement became a Dutch soccer challenge.

To younger fans like me, Dutch soccer can be found everywhere today. Even Brilliant Orange acknowledges this, singling out Arsene Wenger’s Arsenal squads as one of the professional embodiments of Dutch “Total Football” outside of the Eredivisie.

This wasn’t the case in the early 1970’s. At that time, the Dutch were perfecting their version of soccer and unleashed it upon the world in Germany in 1974. At that World Cup the Netherlands began its run of over achievement. The heartbreaking loss in 1974 and the expected loss of 1978 set up the Dutch soccer story and was dealt with in the Dutch psyche as something to be accepted, largely tolerated, and in some ways celebrated.

The book weaves compelling stories of Dutch cultural impact on their style of soccer. Chapter 14 explains how Dutch land constraints and land use led to a different visualization of the soccer pitch, while Chapter 25 explains how their strategies of multiple uses of the same space on the pitch are also reflected in the unique layout and use of space at Schiphol International Airport.

In Chapter 18, the author explains how Dutch collaborative democracy is a handicap for their soccer team, and how the atheistic Dutch are still shamed about outstanding achievement based upon their Calvinist cultural mores.

Such belief structures make accepting failure to win a championship all that much easier, and Chapter 6 explains how such democratic tendencies doomed the great national team of the 70’s.

Chapter 15 details the Dutch struggle with their role in the Holocaust, and how the adoption of the Jews by Ajax as a way to cope with that past has bred modern anti-Semitic chants from rival clubs.

Chapter 13 explains how anti-German feelings, which were largely absent in the Netherlands until the late-70’s, rose and fell in the 80’s and 90’s via the heated battles between the two nations’ soccer clubs.

One of the final chapters plays right into one of the themes from Soccernomics. In the chapter entitled “5 out of 6: Frank, Patrick, Frank, Jaap, Patrick, Paul… and Gyuri” (read the book to understand the chapter numbering system), author David Winner takes us behind the scenes of a debate within Dutch soccer: to win, or not to win, by penalty kicks. It seems that in their pursuit of playing their beautiful game, Dutch teams of the 80’s and 90’s felt winning by penalty kicks was beneath them.

If they couldn’t win in 90 or 120 minutes playing their game, they felt it was better to leave winning up to the chance of penalty kicks rather than a system for taking them.

While other teams were analyzing goalie behavior before matches and devising systems for taking penalty kicks, the Dutch weren’t even practicing penalty kicks let alone doing any preparation like the other teams.

They were convinced of the Soccernomics conclusion that penalty kicks didn’t change the likely outcome of the match based upon certain predictors before Soccernomics was ever published. The problem is that they came to the wrong conclusion.

We know that the conclusion in Chapter 6 of Soccernomics is that penalty kicks don’t have a statistically significant impact on the outcome of a soccer match vs. the predicted outcome from the Soccernomics model that looks at home pitch advantage, GDP, and population size of the two countries playing each other.

The simplification behind that statement is that the statistical test says penalty kicks have no impact on the average outcome – it doesn’t say much about specific outcomes.

In the case of the Dutch team, we already know they punch well above their weight when it comes to international competitions.

In their case, each game won on penalty kicks would have been another notch in their belt of over-achievement. The fact that they have likely conceded a number of matches due to lack of practice or respect for winning via penalty kicks means their over-achievement is likely higher than measured in Soccernomics.

I’d be interested to see if the Dutch team itself would have shown a statistically significant shift in wins or losses based upon matches that went to penalty kicks. Luckily, David Winner outlines the ongoing battle by many in the Dutch soccer program to emphasize penalty kick practice and strategies.

In summary, I’d highly recommend this book to anyone. It’s well written, very conversational, and strikes an outstanding balance between soccer and cultural material. To read this book is to begin to understand both Holland and its soccer team.…

Alternative EPL Data Visualizations: The Cann Table

In all of my writing about the visualization of this season’s run-in, I thought it would be good to remind myself and everyone else of one of the more classic alternative visualizations: the Cann Table.

It’s a relatively straight forward and non-complex way to view how teams are performing relative to each other. It uses all the same data as a normal league table, but instead of arranging the positioning of clubs based upon table position it arranges them based upon point total.

This provides another perspective on the data that makes relative performance between clubs more visually clear.

Talking about table positions makes sense. It's a way to provide a consistent comparison between seasons, when the point totals required to secure a specific table position can change greatly from year to year. It's also how confederations grant positions to clubs in their super-league tournaments, and it's often used by fans to judge whether their club is making any progress season over season.

The challenge is that the traditional table is not the best tool for viewing what's going on within a single season, and how much of a gap there really is between one's favorite club and the clubs around them. Enter the "Cann Table".

The Cann Table is named after Jenny Cann, who popularized it until her untimely death in 2003. In its simplest form it gives one line of text to each point total, and then lists the clubs next to their corresponding point totals.

If a point total has no corresponding clubs assigned to it, the column where a club would go is left blank. Consecutive blank rows communicate the size of the gap between two teams in a very effective and straightforward manner.

To aid in identifying tie breakers or differences in the total number of games played, each club’s goal differential and any games in hand are displayed next to their names. As the season goes through its twists and turns, the Cann Table provides a great word picture of the true gaps between clubs that a table position-centric view might hide.
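The basic construction – one row per point total from the leader down, with blanks marking the gaps – is simple enough to sketch in code. The clubs and point totals here are hypothetical:

```python
def cann_table(points_by_club):
    """One line per point total, leader first; empty lines show the gaps."""
    rows = []
    for pts in range(max(points_by_club.values()),
                     min(points_by_club.values()) - 1, -1):
        clubs = sorted(c for c, p in points_by_club.items() if p == pts)
        rows.append(f"{pts:3d}  {', '.join(clubs)}".rstrip())
    return rows

table = cann_table({"Man City": 80, "Man Utd": 79, "Arsenal": 64, "Spurs": 62})
print("\n".join(table))
```

Goal differential and games in hand could be appended to each club's entry in the same loop.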

The graphic below is the EPL’s Cann Table through Match 33. Normally I wouldn’t produce my own chart, as The Football Project’s site often has a very good one.

However, they have not updated theirs since April 2nd, and when we're this close to the end of the season, having the latest data is key. So, I went over to Ian Eliowart's site, which updates automatically, and borrowed his data. He not only has Cann tables for the Premier League, but also for every division of association football in the British Isles.

Take a look at the table below, and tell me that it doesn’t better communicate the dominance of the clubs from Manchester, the large gap between the top six clubs and the rest of the league, the fact that Wolves are clearly bound for relegation, and that it’s really only a battle between four clubs to avoid the remaining two relegation spots. Combine the Cann Table with the publicly available forecasts over SportsClubStats.com or the Euro Club Index, and one has a very powerful set of tools with which to judge their club’s relative strength within a league.

A Statistical Look at the Race for European Qualification in the EPL

There are only eight matches left in the Premier League this season.

The clubs from Manchester turned the race for the EPL championship into a two-horse affair long ago, so outside of the fan bases for those two clubs much of the league has been focused on the race for table positions three through five.

Those are the coveted positions that gain a club entry to Europe, with the third and fourth positions guaranteeing a shot at Champions League glory next season and all of the money that comes with it.

There are five clubs competing for those three positions – Arsenal, Chelsea, Liverpool, Newcastle, and Tottenham. Only three will find a seat at the European competition table when the music stops in May.

Of late, Arsenal has been streaking. Tottenham and Liverpool have been slumping.

Newcastle and Chelsea have been plodding along. Sitting on the outside looking in at a Champions League spot was too much for Roman Abramovich to stand and thus Andres Villas-Boas got the sack a month ago. Liverpool has been slowly fading as of late.

All of these general observations deserve a bit more numerical examination to understand exactly what's been going on during one of the more volatile Premier League seasons in memory. Such an analysis cuts through the weekly "who's getting sacked" rumor mill and the volatility of match-to-match data that fails to see the forest for the trees.

In taking such a numerical view of the league, three measures of team performance are examined below:

Cumulative Points Per Match – Total points accumulated divided by the number of matches played. This metric gives an idea as to the pace a team is on for PPM at the end of the season.

Points Per Match (4 Match Running Average) – The average points accumulated per match over the previous four matches. This gives a running tally of the previous month's worth of matches, and can help highlight or quantify long-term declines or rises.

Table Position (4 Match Running Average) – The average table position per match over the previous four matches. This erases the volatility of weekly table position swaps and takes a longer-term view of how a club's previous month's performance is translating to their ultimate position in the table.

One gets an idea of how the season has played out when each of the metrics is plotted against match number. Graphs are presented individually below, with commentary attached to each and a forecast as to the odds of how each team will finish the season.
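The first two metrics can be sketched from a list of per-match point hauls (the results below are hypothetical); the table-position running average works the same way on a list of weekly positions:

```python
def cumulative_ppm(points):
    """Cumulative points per match after each round."""
    total, out = 0, []
    for i, p in enumerate(points, start=1):
        total += p
        out.append(total / i)
    return out

def rolling_ppm(points, window=4):
    """Average points per match over the previous `window` matches."""
    return [sum(points[i - window:i]) / window
            for i in range(window, len(points) + 1)]

results = [3, 1, 0, 3, 3, 1]      # hypothetical W/D/L point hauls
cum = cumulative_ppm(results)     # pace through each match
roll = rolling_ppm(results)       # previous month's form
```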

Cumulative Points Per Match (PPM)


From the graph above (click to enlarge), we can draw the following conclusions:

    • Three of the five clubs reached their PPM peak by match 10. This isn't surprising, as a few wins to start the season can distort the PPM tally, which settles out as sample size increases. What is interesting is that only one of these clubs – Newcastle – has been able to stop the inevitable decline and settle around a PPM of 1.6 +/- 0.1 points. If trend lines were drawn for Liverpool and Chelsea, they would continue downward even at this late stage of the season.
    • Tottenham clearly had the best start to the season in terms of sustained form. They peaked by week 13, and their peak PPM was second only to Chelsea's. They too have seen a steady decline since then, but their long, sustained build of points should see them through to at least the fifth table position at season's end.
    • Arsenal's oft-chronicled slow start to the season is readily displayed in the graph. Even their mid-season slump can be seen in the PPM drop from match 19 until match 23. However, taken as a whole, it's clear that starting at match 10 the Gunners have been cycling between a PPM of 1.5 and 2. That suggests they've been in the thick of the European competition race since match 10, given the current PPMs of the other four teams. The start of the season was certainly a crisis in confidence for the club, but looking back it might be suggested that the righting of the ship took place much earlier than we all thought.

Points Per Match (4 Match Running Average)

While the graph (click to enlarge) above is a bit… messy, it does provide some additional insights into the streaks we’ve seen throughout the season.

Here are some observations drawn from the graph:

On a monthly basis, Arsenal's early season form wasn't their nadir. Instead, it came during a string of matches right after the mid-point of the season.

Losses to Fulham, Swansea, and Manchester United and a 0-0 draw against Bolton at the Emirates led to a month-long average PPM of 0.25.

Arsenal would begin their now-seven match winning streak in their next Premier League tie (a 7-1 win against Blackburn), although the beginnings of their improved league form were overshadowed by their 4-0 dismantling by AC Milan at the San Siro two weeks after the Bolton draw.

It should now come as no surprise that the Gunners came up one goal short of a historic turnaround two weeks later at the Champions League return leg at the Emirates, as we now understand Arsenal has been in top form for the last seven league matches.

If one looks closely at the graph they can see just how precipitous Tottenham‘s decline has been (yellow line).

They reached their 4-match running average PPM peak of 3 points per match by the 13th match of the season. Since then, they’ve been on a steady decline for the next 17 matches, with only a brief spike around match 20 slowing the decline.

In fact, it’s remarkable how consistent the decline has been with an R-squared value of 0.724. Spurs have been losing a consistent 0.11 points per match from their 4-match running average since there December 3rd win against Bolton.
In an equal steep, but less consistent, decline is Liverpool.

Their latest peak 4-match running average came later than Tottenham's. Starting at match 19, they've been on a similar decline at an average clip of 0.09 points per match, although at a less consistent rate given an R-squared value of 0.60. The problem for Liverpool is that they started at a much lower peak value (2.0) and starting table position than Tottenham, and thus have fallen further in the table than Spurs. This performance has led many in the press to call for Kenny Dalglish's sacking, with perhaps his role of reversing the rot at Liverpool complete.
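Decline rates and R-squared values like those quoted can be reproduced with an ordinary least-squares fit of the running average against match number. The series below is a perfectly linear hypothetical, not Spurs' or Liverpool's actual data:

```python
def ols_fit(xs, ys):
    """Least-squares slope, intercept, and R-squared for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical 4-match running-average PPM over matches 13-30:
# an exactly linear decline of 0.11 points per match.
matches = list(range(13, 31))
ppm = [3.0 - 0.11 * (m - 13) for m in matches]
slope, intercept, r2 = ols_fit(matches, ppm)
```

Real form data would of course scatter around the line, pulling the R-squared below 1 as in the figures quoted above.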

The graph clearly demonstrates the volatility of Newcastle’s season. What it also demonstrates is the painful drop in form at about the one-third mark of the season. While their latest stumbles haven’t helped, that earlier season drop in form is what has cost them dearly in table position. It’s a position from which the club has not been able to recover.

What must be most frustrating to Abramovich has been Chelsea's inconsistency throughout the season. The Blues dropped below the 1.0 point per match threshold – no better than a draw on average – three times during André Villas-Boas' tenure.

The third time was the charm for Abramovich, sacking AVB only two matches after crossing this threshold for the third time. There's been a modest rebound in their play since his sacking, but it's a far cry from the level of performance required to squeeze into the fourth spot to get a Champions League play-in.

Table Position (4 Match Running Average)

The results of the previous two graphs are found in the graph above (click to enlarge), which looks at the 4-match running average of table position by club. One's perspective must be reversed compared to previous graphs – a line being lower on the graph is better for the club because it indicates a lower numerical value, and thus a higher position, in the table. The graph also demonstrates the deceiving nature of the table throughout the season, and why looking at running averages of PPM may be an earlier indicator of trouble or opportunity for a club.

The effect of both Chelsea’s and Newcastle’s hot starts and slow declines on table position are clearly seen in the graph. Reflective of earlier comments regarding Newcastle’s stabilization on a PPM basis, they’ve been a consistent fifth or sixth in the table for the last seven matches.

It appears Liverpool have never really been in serious contention for a Champions League spot, being no better than fifth the entire season on a 4-match running average basis. Early season success was fleeting, and they've been on a steady decline since then at a rate of 0.07 places per match.

Tottenham and Arsenal both demonstrate that it doesn't matter how a club starts the race, but rather how one finishes it. A 38-match season is a long affair, and as much as Tottenham started the season hot, Arsenal seems poised to finish it even hotter. How frustrating might it be for Tottenham and their supporters to see such a glorious start to the season possibly end in fourth place behind a club that didn't make it halfway up the table until a quarter of the way through the season, and didn't crack the top 25% of the table until the season was nearly half complete?

Here's a statistic to put Arsenal's steady climb into third into perspective. Since the Gunners crossed the 7th position barrier on a 4-match running average basis at match 12, they've moved up the table at a clip of 0.13 positions per match. That's a table position gained roughly every 8 matches. Slow, but steady, wins the day.

Outlook for the Rest of the Season

So how will the season turn out? I don’t have such prognostication skills (yet), but there are multiple options for forecasts via the web.

One of my favorites comes from Sports Club Stats. Their website not only provides the odds of a team winning the next match or the Premiership, but also the odds of finishing in any table position given their current point total and the opposition they face the rest of the season.

If one clicks on a team of interest they’ll find even more details, like the odds a club finishes in different table positions given their final point total, and the odds for each table position given the multiple win/draw/loss combinations that can add up to the corresponding final point total. It’s quite a neat site.

Given that the site is updated at the conclusion of each match weekend I have grabbed this week’s data and compiled it in the table below. Each club’s odds of finishing in 3rd through 7th or outside of 7th are shown within the table.

A few conclusions can be drawn from the table:
  • Arsenal appears poised to maintain their streak of Champions League qualifications, making it into the competition for the 15th straight year.  With an 8-point lead over Chelsea and only 8 matches to go, Chelsea would have to earn more than a point per match more than Arsenal the rest of the way.  That is something they've only done three times so far this season (on a 4-match running average basis).


  • Tottenham are close behind Arsenal, with a less than 15% chance of missing out on next year's competition.  There may be a small crisis of confidence if the late-season swoon gives Arsenal the third position and Tottenham must play in to the tournament, but one would suspect it's something the players could get over to secure England a fourth entrant in the group stage of next season's Champions League.


  • Barring further decline by Tottenham, Chelsea will be on the outside looking in when it comes to Champions League qualification.  They only have a 15% chance of making it into the tournament, and a nearly 30% chance of not even qualifying for the Europa League.  Will Abramovich's club take UEFA's second-tier competition seriously next year, or choose instead to focus their efforts on the league to secure a Champions League berth for 2013-14?  This off season will be very odd at Stamford Bridge – perhaps no trophies, no Champions League next season, and a club that anyone would agree is in decline after two short years.


  • Newcastle United will be able to hold their head high, likely finishing 6th or better only a season after finishing 12th.  Things will be looking up at St. James' Park in the off season, having worked their way steadily up since promotion to the Premier League only two seasons ago.  They may even get lucky, with a 31% chance of pipping the likes of Chelsea for 5th position or higher and a spot in the Europa League.


  • Things at Anfield are not good.  The Reds have greater-than-even odds of not even finishing in the Top 7, a step backwards from last season on a table basis and from the performance in 2009-10 that got Rafael Benitez sacked.  In fact, it would be their worst finish in terms of table position since 1993/94.  There is no doubting the turnaround Kenny Dalglish led at Liverpool when they had reached the nadir of what was effectively the Hicks/Gillett regime.  However, one can't help but feel the magic has worn off and the honeymoon is over when a comparison is made between this season's performance and the one in the second half of last season.  There are few excuses to be had.  Liverpool is underperforming, and one must wonder how long King Kenny will keep his job.
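The catch-up arithmetic in the Arsenal bullet above reduces to dividing the points gap by the fixtures remaining; a minimal sketch:

```python
def ppm_gap_needed(points_gap, matches_left):
    """Extra points per match the chasing club must average over the leader
    to merely draw level by season's end; overtaking requires more."""
    return points_gap / matches_left

# An 8-point gap with 8 matches to go: the chaser needs more than 1 extra
# point per match over the leader to overtake.
gap = ppm_gap_needed(8, 8)
```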


No matter what the odds say, the last eight matches of the season will provide for some compelling soccer.  Will Manchester City win their first ever Premier League title, or will United fend off yet another challenge to their long-term supremacy within the league?
Which of the two bitter North London rivals will gain automatic entry to the Champions League?  Will Chelsea be able to slip into Champions League qualification, or might they even fall out of a Europa League qualifying spot?  Might they even pull off the unlikely feat of winning the Champions League, thus fulfilling one of Abramovich's long held goals and denying the fourth table position entry into the competition?  Will Liverpool stumble to their worst finish since the 1993/94 season?
Only the final eight matches will tell.  It's going to be an exciting end to the season!

Reactions to MLS Semifinals, Conference Final Odds, and an Update on Semifinal Model

Conference Semifinal Reactions

MLS’s annual bastardization of soccer playoffs – aka the conference semifinals – is now complete. Sure, I’m a little bitter because my team dug a hole in its first leg that it couldn’t climb out of even with an outstanding performance.

I was at that second leg this past Wednesday, and the energy was electric until the final whistle. It's more that this league can't seem to figure out what it really wants to be – it caters to the American sports fan via a playoff format, but then nods to every other knockout competition by utilizing two-legged semifinals, all while not even implementing the away-goals rule.

MLS would be better off picking one direction or the other and sticking to it.

Nonetheless, the Sounders and three other teams are out of the playoffs now, and we’re down to the final four teams fighting for a spot in MLS Cup 2011 in LA. The format is what it is, so it’s time to see how I did against it. I went 2-for-4 in my conference semifinal picks, with varying reasons for success and failure.

I got the LA and Kansas City wins correct.


In LA, I correctly bet they were too good to go down to the Red Bulls despite the six-match differential in games played. In Kansas City, I correctly bet they would hold serve on match differential and were simply too hot to not win. Clearly, their 4-0 drubbing of Colorado over two matches demonstrated that superior form.

Honestly, the Philadelphia/Houston series was a toss-up from a statistical prediction standpoint. It was the closest of the four using my statistical methods, and any statistical advantage for Philly came only from their coach having less experience than Houston's (they were even on matches played).

Luckily, this year's results got rid of that silly “coach experience” anomaly as a statistically significant predictor (more on the adjustments to the model later). The matchup was really just a flip of a coin statistically, and perhaps I should have gone with the experience of Houston over the second-year improvement and first playoff berth for the Union.

In the Seattle/Real Salt Lake series I picked against my statistical judgement, giving in to a supporter's optimism. In the closing weeks of the regular season I told any Sounders supporter I knew that I would rather the Sounders have faced FC Dallas in the first round than Real Salt Lake.

RSL’s skid at the end of the season was a false one – one predicated upon missing personnel they were getting back by playoff time.

FC Dallas, on the other hand, was clearly a slumping team that continued to slump in the playoffs. The Sounders would have matched up far better against FC Dallas, would have likely been playing to finally get the LA monkey off their back in the Conference Final, and Real Salt Lake would have been tearing up the Eastern Conference Playoffs and be in that conference’s final right now. They’d likely have won the East, and we’d be staring at an RSL vs. Sounders/Galaxy final in several weeks.

For all the griping that would have come from a “Western Conference team winning the East”, it would have been a just end to a season that saw those three teams dominate the Western Conference and largely the entire league. Ironically, one of the few just endings from the MLS playoffs in recent memory.

Rarely do things work out as desired, and Seattle faced RSL in the conference semifinals. As a supporter, I picked against the statistics, the Sounders’ history of troubles in the playoffs (they had to end sometime, right?), and Real Salt Lake’s playoff experience.

I felt the Sounders and Galaxy would both overcome the statistics, and perhaps we’d be able to say the league had gotten to the point that its playoff format didn’t determine champions based upon who had played fewer matches in a season.

Watching the first leg from the couch of my living room, I immediately regretted the pick (side note: an 8-hour exam earlier in the day and three beers throughout the match luckily made me too tired to throw anything at the television, or else I'd be out a couple grand right now buying a new television).

The Sounders picked the worst day of the year to play what was their worst game of the year, resulting in a 3-0 deficit for them.

The return leg was the polar opposite. It was very clear that RSL was intent on parking the bus and earning a berth in the Western Conference Finals based purely upon the three goals they scored in the first leg. The statistics in the table below, which compares the change in different statistics from games one to two for each of the clubs leading after the first leg in the 2011 conference semifinals, bear this out.

Granted, the other three teams were heading home to defend their leads, but none of them was as large as Real Salt Lake's, and none of their first-leg performances had been as dominant as Real Salt Lake's.

RSL said all the right things going in to the second leg in Seattle, recognizing the Sounders were a dangerous team – they had won eight matches during the regular season by scoring three or more goals, six of those wins were by two or more goals, and two of them were 3-0 shutouts. Still, watching the game live, re-watching highlights, and then looking at the statistics above I can’t help but feel RSL went beyond parking the bus. Time wasting got so bad that Nick Rimando was issued a yellow card for just such an infraction.

RSL simply hunkered down and was content to boot the ball forward. The starkest contrast could be drawn with Sporting Kansas City, who went home up 2-0 and came out attacking in the second leg, netting another 2-0 result for themselves. RSL was the only team of the four to turn in a worse performance across the board in the second leg.

Nonetheless, the Sounders fell short of their attempt to come back from a three goal deficit. What will likely haunt them the entire offseason is not the misses or blocks in the second leg – there’s not much they can do about a Real Salt Lake defense that played relatively well against the 26 shots they faced. It will be the Grabavoy goal in the dying minutes of the first leg that ended up giving RSL their three goal lead going back to Seattle.

None of this is to say that RSL doesn't deserve the win. They played outstanding, attacking football in the first leg, and combined with the Sounders' horrible performance they earned their three goal lead. The shame is that they didn't pursue the single goal in Seattle that would have clearly put them through to the final, and instead played cynical, time-wasting, park-the-bus soccer that helps fuel criticism of MLS's two-legged format.

Update to the Conference Semifinal Models

With the conclusion of this year’s conference semifinals, eight new data points were added to the model that is based upon MLS playoff data from 2003 forward. Those new data points have helped to make the model a little more logical, as well as confirm one of the early trends.

On the logic front, losses by Philadelphia and New York, who had some of the shortest-tenured managers in the playoffs, eliminated the odd historical anomaly of less experienced managers faring better in the conference semifinals from the ranks of statistically significant predictors.

Replacing it in the list of significant predictors was the difference in the teams' seeds. A plot of the effect of seed difference is shown in the graph below. Seeds are listed numerically, so top seed LA (1) playing bottom Western Conference seed New York (6) would produce a seed difference of -5 for LA and +5 for New York.

Based upon the graph and its associated equation, each unit difference in seed changes the odds of winning a two-legged playoff by 6.8%.
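The graph's fitted equation isn't reproduced in the text, so the sketch below assumes a simple logistic form consistent with the 6.8% figure: even odds at a seed difference of zero, with each unit of seed advantage multiplying a team's odds of winning by roughly 1.068. The function and its baseline are illustrative assumptions, not the actual fitted model.

```python
ODDS_CHANGE_PER_SEED = 0.068  # the 6.8% per-unit figure quoted above

def win_probability(seed_diff):
    """Win probability implied by multiplying even baseline odds (1.0) by
    (1 + ODDS_CHANGE_PER_SEED) for each unit the team sits above its
    opponent in seeding (negative seed_diff = higher seed)."""
    odds = (1 + ODDS_CHANGE_PER_SEED) ** (-seed_diff)
    return odds / (1 + odds)

# Top seed LA (1) vs bottom seed New York (6): seed_diff = -5 for LA.
p_la = win_probability(-5)
p_ny = win_probability(+5)
```

Note that the two teams' probabilities sum to one by construction, since their seed differences are equal and opposite.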

Despite the LA Galaxy becoming the first team to win a two-legged conference semifinal when facing a team that had played 6+ fewer games than them, the trend of teams playing more games losing their two-legged playoff continued. Two of the teams that lost – Seattle and Colorado – played four and three more games, respectively, than their opponents. The net impact of the 2011 results is expressed via the graph below.

Astute readers who compare the exponent term in the equation to the same term from the 2003-2010 data will see that it is numerically smaller. The net effect is to lower the impact of the difference in matches played: a 6.9% change in odds of winning the series for each unit change in game differential, compared to a 7.5% change excluding the 2011 playoff data. The addition of the extra data points also tightens up the 95th percentile bounds. Data through 2010 indicated a 95th percentile range of 0.34 around the nominal (solid) line between game differences of -5 and +5. The increased sample size and results from the 2011 data have now tightened this range to 0.29. In statistical speak, the accuracy of the model's nominal prediction continues to increase, while the effect of increased matches seems to be a bit lower than originally predicted.

A Brief Prediction of the Conference Finals

Going in to the conference finals, the playoffs switch back to a single-match, winner-take-all format at the higher seed's home pitch. As was shown in my earlier post on the history of MLS single-match playoffs since 2003, the only statistically significant predictor of success is the difference in the teams' goal differentials throughout the season (including playoffs). The table below provides a comparison of the conference finalists' goal differentials and their odds of winning.

I'll be sticking with the numbers. In the case of Kansas City, I think they're simply too hot to lose this match at home. A rough start to the season on the road has been rewarded with a second-half-of-season homestand and outstanding play to go with it. I agree with Grant Wahl when it comes to LA – their season may go down as the single greatest in MLS history if they're able to win the MLS Cup. The match with RSL will be close, but in the end I think LA will prevail. I just think LA is too good to not win at home in the conference finals, and then win again at home two weeks later to hoist MLS Cup 2011.

Soccernomics Was Wrong: Why Transfer Expenditures Matter, and How They Can Predict Table Position

Note: This is a re-post from analysis I did back in December 2010 for The Tomkins Times.  I am posting it here to complete my series of posts on squad transfer costs, and to set up a forthcoming series of posts on the impact of starting XI transfer costs on table position.

“In fact, the amount that almost any club spends on transfer fees bears little relation to where it finishes in the league. We studied the spending of forty English clubs between 1978 and 1997, and found that their outlay on transfers explained only 16 percent of their total variation in league position. By contrast, their spending on salaries explained a massive 92 percent of that variation. In the 1998-2007 period, spending on salaries by clubs in the Premier League and the Championship… still explained 89 percent of the variation in league position. It seems that high wages help a club much more than do spectacular transfers.”

So begins Chapter 3 of the wonderful book Soccernomics, where authors Simon Kuper and Stefan Szymanski use the above analysis to launch into an explanation of:

  • Why the transfer market is inefficient.
  • The unique approach Brian Clough took to building his Nottingham Forest teams through good bargains in the transfer market.
  • How most clubs spend little money helping such prized individuals adapt to their new team and culture.
  • How Olympique Lyon make money buying low and selling high.

Each of these examples of individual success and failure in the transfer market makes for a compelling case. However, suppose that’s what they were – good examples of individual successes and failures. What if the authors were wrong in their initial analysis, and that on average spending more in the transfer market is a key enabler of league success?

I loved Soccernomics, and thought it was full of many thought-provoking analyses. I loved it so much that it has spurred my exploration of soccer statistics and fueled the material on my own blog. But no matter how much I liked the book, the authors' claim at the outset of Chapter 3 never sat right with me. It didn't make sense to me after seeing the performance of Chelsea and Manchester United over the last half decade, but I never had the data to prove it. Luckily, the Transfer Price Index provides such data, and my analysis of it suggests that large expenditures in the transfer market are a prerequisite to building a team that can consistently compete for the Premier League title.

Do Wages or Transfer Expenditures Help Predict Table Position?

One of the reasons that the Soccernomics analysis never sounded exactly correct was the qualifier they gave to their transfer expenditure analysis:

“In short, the more you pay your players in wages, the higher you will finish; but what you pay for them in transfer fees doesn’t seem to make much difference.”

Combined with the opening quote, I suspect the authors looked at what each team spent on transfers in a year, attempted to correlate the expenditures to the next season’s performance, and found little correlation. That would make sense, as the few players a team brings in over a single year may not be able to have that big of an impact on a squad of eleven. That’s even assuming each transfer moves immediately into the match day squad, which isn’t often the case.

That exact thought – who plays on the pitch most of the time: transfers or homegrown players? – was answered via the data assembled for Pay As You Play. The authors assembled data on the average number of homegrown players in each game for each team over each season, and I have plotted that relationship below for each of the eighteen Premier League seasons. For comparison I have also plotted the same data for the Big Four clubs on a second axis on the right side of the graph (click on graph to enlarge).

The data shows that the Premier League averaged only 2.6 homegrown players per match (24% of the players on the pitch) in its inaugural season. Since then, that figure has steadily eroded by about a tenth of a player per game per season, to the point of being under one player per game (8% of the players on the pitch) by the 2009-2010 season. By comparison, the average share of youth players in squad composition has bounced between 15% and 20% over the last ten seasons, meaning that homegrown players are getting very few shots at playing time. In fact, the difference is considered “extremely statistically significant” when the proper statistical tests are performed, which is a rarity in the sports statistics world.

The Big Four have been on similar declines since the beginning of the Premier League, although they seem to have essentially bottomed out since season nine (Manchester United's inevitable decline after unusual homegrown success is the one exception). Transfers must play a key role in a team's success if anywhere from 8.5 to 10 players on either side of a match are not homegrown.

Pay As You Play also provides the other key data set in helping determine whether wages or transfer expenditures help predict league success. Its current transfer purchase price (CTPP©) database provides a way to compare the cost to assemble the squad versus the Soccernomics wage data, and the conclusions are interesting. For this analysis, I will be using the CTPP's Sq£, which denotes the total cost of transfers within the squad, inflated to current values using TPI.

Some might question why a squad metric is used instead of a utilization metric, like £XI (the average cost of the XI over the course of a season, with inflation taken into account). The reason is twofold. The first is that the data must be viewed in the order of events as they actually occur, and not how one might view it in hindsight. A transfer must take place before a player and team can negotiate wages and before they can play a game for the new team. Thus, if a relationship does exist between squad transfer cost and performance, it would be a more important predictor of future success than a later event that is dependent upon the transfer occurring in the first place. The second reason is that because a measurement like £XI is dependent upon a player's utilization, it is not effective at predicting pre-season performance and setting realistic expectations. The £XI may be very good at understanding why a team is under- or over-performing once a reasonable amount of play has transpired, but not necessarily in judging how a team's transfer expenditures will contribute to future success.

There’s also a reason to look at a model based on transfer fees rather than wages – transparency. The world of soccer finance is murky any way you cut it, but it gets murkier once the financial transactions are contained within a single team. In conversations related to this post Graeme Riley explained his philosophy regarding transfers and wages, which is a common one:

“[W]ages show how a one-sided relationship values a player and so is less representative than transfers. Firstly the details are likely to be confidential and therefore less easily identified. Secondly the wages can be varied almost by the day (e.g. play bonus, win bonus, …there even used to be share of attendance bonus!), whereas the transfer price is “relatively” fixed (even allowing for appearance add-ons etc).”

If the quality of the data is variable, the outcome of the model is less trustworthy. We have no idea of the quality of the data used for the Soccernomics model, but in general wages are a murky matter. The CTPP database is clearly constructed, attributed, and transparent, and the quality of the data is superb.*

A little background must be provided before diving into the analysis. In their study, the authors of Soccernomics compared average league finishing position to the average of each club’s wage expenditure relative to the league average wage expenditure. To complete a comparison to the CTPP data, a similar metric was created that looked at the Sq£ data for each club versus that season’s average Sq£ value. This figure is denoted by MSq£ for “multiple of average Sq£”. Thus, the metric is not measuring how much a squad costs, but how much more (or less) it costs versus the average squad that season. This corresponds with the finish position against which variable wages and costs were compared. Finish position is only measuring how well one team performed against their competition, and is not an absolute measure like points.
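A minimal sketch of the MSq£ metric as defined above: each club's Sq£ divided by that season's league-average Sq£. The clubs and figures below are hypothetical, purely to illustrate the calculation; real inputs would come from the CTPP database.

```python
from statistics import mean

def msq(squad_costs):
    """Map club -> Sq£ into club -> MSq£ (multiple of the season-average
    squad cost)."""
    avg = mean(squad_costs.values())
    return {club: cost / avg for club, cost in squad_costs.items()}

# Hypothetical season of four clubs with Sq£ in millions.
season = {"Club A": 300.0, "Club B": 150.0, "Club C": 90.0, "Club D": 60.0}
multiples = msq(season)
```

A club at exactly the league average has MSq£ = 1.0, and by construction the multiples average to 1.0 across the league, which is what makes the metric comparable across seasons with very different absolute spending levels.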

In addition to creating the wage and table position data, the authors of Soccernomics had to transform the data sets using a natural logarithm to satisfy the pre-requisites for regression analysis. I won’t bore the casual reader with any more details on this process, but more statistically inclined readers can see this blog post for more detail. I provide this bit of background only to speak to the power of the CTPP data later in this post.

Finally, the CTPP had to be isolated to the years 1997-2008, given that the Soccernomics data was only plotted over a similar time period. Given that the Soccernomics data contains Championship and Premier League data while the CTPP only contains Premier League data, the CTPP was further trimmed to clubs that had missed two or fewer seasons of Premier League play during that time period. This ensured the effects of budget cuts due to relegation or large transfer outlays due to recent promotion would be minimized while keeping the sample size large enough. Ultimately, that left thirteen clubs for the wage data vs. CTPP analysis – Arsenal, Aston Villa, Blackburn, Charlton, Chelsea, Everton, Liverpool, Manchester United, Middlesbrough, Newcastle, Southampton, Tottenham Hotspur, and West Ham United. A plot of the data is shown below (click on graph to enlarge).

Clearly there is a strong relationship between the current wages of a squad and the current cost in transfer fees paid to assemble it – 94% of the variation is explained by the regression model. This is intuitive, but until the CTPP database we didn't have the data to prove it. Perhaps the authors of Soccernomics weren't demonstrating a relationship between wages and finish position, but rather confounding it with the actual relationship between MSq£ and finish position. Combined with the youth player data, it would appear there is enough evidence to indicate transfer costs are key to assembling a team. Now the relationship between MSq£ and finishing position can be explored.

The Effect of MSq£ on Finishing Position

Given that wages and MSq£ appear highly correlated, a study of MSq£ vs. table position was undertaken. Data from all eighteen seasons of the Premier League was used for the analysis. Interestingly, unlike the Soccernomics data sets, both the table position data and the MSq£ data satisfied the requirements for regression analysis without the need for transformations. Standard statistical tests indicate the data is undoubtedly correlated, and not needing to transform the data yields a much more direct equation for the relationship between the two. A plot of the regression study's analysis is shown below (click on graph to enlarge).

The regression plot demonstrates that nearly 70% of the variability (quite a good value given the sample size) between finish position and squad cost is explained by the relationship:

Average Finish Position = -7.2221*(MSq£) + 18.32

Points that fall below the line show that, on average, a team has outperformed the model and finishes better than their average MSq£ would indicate. Teams above the line fare worse than projected. The implications of the equation are:

  • Teams that are built with a league average Sq£ (MSq£ = 1.0) have typically finished in 11th place.
  • If a club wants a good chance of staying away from relegation, they typically need a Sq£ of at least 20% of the average Sq£ for that season.
  • If a club wants a good chance at a Champions League spot, they typically need to have a Sq£ of at least 1.98 times the average Sq£ for that season.
  • To finish fifth and qualify automatically for the Europa League, a club typically needs a Sq£ of at least 1.85 times the average Sq£ for that season.
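The bullets above follow from evaluating the fitted line, or inverting it for a target finish position. A sketch using the published coefficients:

```python
# Coefficients from the regression quoted above:
# Average Finish Position = -7.2221 * MSq£ + 18.32
SLOPE, INTERCEPT = -7.2221, 18.32

def predicted_finish(msq):
    """Average finish position implied by the regression for a given MSq£."""
    return SLOPE * msq + INTERCEPT

def msq_required(target_position):
    """MSq£ historically associated with a given average finish position,
    found by inverting the regression line."""
    return (target_position - INTERCEPT) / SLOPE

# A league-average squad (MSq£ = 1.0) lands mid-table (about 11th), while
# a 4th-place (Champions League) finish implies roughly twice the average
# squad cost.
```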

Spending money certainly doesn’t mean success, and single seasons may present under- or over-performance versus the historical average. Part of that may have to do with how much of the squad’s cost makes it onto the field of play, but one must undoubtedly spend the money in the first place to have a shot at getting them on the field. The regression analysis above should leave no doubt that not only does it pay to spend, it pays to spend big relative to your competition.

Looking at teams that spent the league average or more over time leads to some interesting observations. The image below focuses on those clubs.

The following observations can be made:

  • Only twelve teams out of forty-four in the history of the Premier League have averaged an MSq£ greater than 1.0.
  • All seven of the teams that have never been relegated from the Premier League – Everton, Aston Villa, Tottenham Hotspur, Liverpool, Arsenal, Chelsea, and Manchester United – have an average MSq£ of 1.0 or better. Five of the seven have an average MSq£ of 1.3 or better.
  • Aston Villa and Arsenal are the biggest overachievers, as represented by each of them having the biggest gap to the lower side of the regression line. Each has performed about six places better than their MSq£ would suggest.
  • Chelsea and Newcastle are the biggest underachievers. Chelsea suffers from a lower average finish due to their performance in the league's first decade and their subsequent explosion in spending in the second.

There is also one common denominator of the top five spenders: DEBT. Much has been made of the Big Four's debt woes via UEFA's own reports and resultant fair play rules. I've done my own analysis of the annual Forbes rankings, using their 2006 through 2010 data to look at revenue-to-debt ratios and pre-tax profit margins for the Big Four (Newcastle have their own debt problems) to understand their ability to manage such debt. Each of them has different challenges before them:

  • While Arsenal has a healthy profit margin that has grown over each of the last four years, they carry the heaviest revenue-to-debt burden due to the recent construction of Emirates Stadium. Good debt indeed, but debt that must be serviced nonetheless.
  • Chelsea, through a forgiveness of debt by Roman Abramovich, has the best revenue-to-debt ratio of the four. However, they have yet to show a profit since 2006 and will be challenged by the fair play rules.
  • Liverpool may be the most challenged of the four. Their revenue-to-debt ratio and profit margins have been heading in the wrong direction since 2006. NESV’s purchase and effective dismissal of debt will undoubtedly help, but the ownership group’s cautious approach and the continued need for a new stadium will weigh heavily on the team’s ability to increase their MSq£.
  • Manchester United is a mixed bag like Arsenal, although likely not in as good a position. The Glazer debt is suffocating, providing them with the lowest revenue-to-debt ratio of the four even though they outstrip the next closest club’s revenue (Arsenal) by nearly 25%. However, they are the most profitable club at a 30% margin (before taxes).

All of this suggests that the Big Four, in attempting to maintain their dominance, have embarked on an unsustainable path. Each has taken different paths towards large debt loads – whether it is in players, stadiums, or overseas marketing. Whatever they have spent their (or others’) money on, it appears that such spending and the associated annual placement in the top four table positions is unsustainable given the debt load they carry today. Perhaps what we have witnessed over the last decade will be viewed years hence as not the natural order of things, but an aberration where funny money ruled the decade and led to the long term fiscal sickness of several clubs.

Indeed, the financial dominance of the Big Four has waned since its peak mid-decade. The plot below shows the MSq£ in the post-Abramovich era for the Big Four plus Tottenham and Manchester City.

By 2006 Tottenham had passed their rivals Arsenal in MSq£, while that year also represented the peak of Chelsea’s MSq£ advantage. Since then, Tottenham has steadied themselves around an MSq£ of 1.7, while Manchester City has increased their squad cost to the second highest MSq£ in the 2010-2011 season. Aston Villa’s sixth place finish last season notwithstanding, these are the six teams that battled over the four Champions League spots. What was a domination by four teams in 2003-2004 (no one was closer to them than Tottenham’s 57% of Liverpool’s MSq£) is now a six-team race, with two of the former Big Four relegated to the 5th and 6th positions. This is further evidence that a decade or so of dominance by four teams is likely at an end, and it means debt-loaded operations that count on continual Champions League income are no longer such a safe bet.

The Usefulness of the MSq£ Regression Equation: A Case Study of Liverpool FC

In Pay as You Play, the authors pay close attention to each team’s rank in £XI and their associated finish, using the metric to understand the variability in pay-for-performance from season to season. With the creation of the MSq£ regression equation there is now an explicit numeric relationship between the relative cost of assembling a squad and its likely performance. Combining the two approaches allows us to understand whether a team or a manager under- or overperformed versus the cost of their squad.

There are two ways to determine if a team has over- or underperformed versus expectations:

  • How they have finished versus their MSq£ rank. If the MSq£ rank is numerically higher than the table finish, they have overperformed. If the MSq£ rank is numerically lower than table finish, they have underperformed. The MSq£ rank will be the same as Pay as You Play’s Sq£ rank.
  • Translating their MSq£ value to a predicted finish, and comparing that predicted finish to the actual table finish. If the predicted finish is numerically greater than the actual table finish (i.e., the team finished higher up the table than the model expected), the team has overperformed. If the predicted finish is numerically less than the actual table finish, the team has underperformed.

The added benefit of using the regression equation is that it shows what teams with similar expenditures have achieved in the past. If several teams spend a similar MSq£, the regression will produce a close cluster of predicted finishes, giving a much clearer perspective on which teams have over- and underperformed than a traditional ranking of expenditures. Applying both metrics also allows a better determination of the team’s performance versus its expenditures. If both the rank and predicted-place metrics break the same way, a more definite declaration that the team has exceeded or failed to meet expectations can be made. If a discrepancy exists between the two methods, a push is declared (also known as a tie to the non-gambling reader).
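The two tests and the "push" rule can be sketched as a small function. This is a minimal illustration of the logic described above; the function name and encoding are my own, not from the post, and the predicted finish would come from the MSq£ regression equation (whose fitted coefficients are not reproduced in this excerpt).

```python
def performance_verdict(actual_finish, msq_rank, predicted_finish):
    """Combine the rank test and the regression test into one judgment.

    actual_finish: final table position (1 = champions)
    msq_rank: rank of the squad's MSq£ (1 = costliest squad)
    predicted_finish: finish predicted by the MSq£ regression equation
    """
    # Method 1: rank test -- an MSq£ rank numerically higher than the
    # actual finish means the team beat its spending rank.
    rank_verdict = (msq_rank > actual_finish) - (msq_rank < actual_finish)
    # Method 2: regression test -- a predicted finish numerically greater
    # than the actual finish means the team beat the model.
    reg_verdict = (predicted_finish > actual_finish) - (predicted_finish < actual_finish)
    if rank_verdict == reg_verdict == 1:
        return "overperformed"
    if rank_verdict == reg_verdict == -1:
        return "underperformed"
    return "push"  # the two methods disagree
```

For example, a team with the fourth-costliest squad, a predicted finish of 6.62, and an actual second-place finish beats both tests and is judged to have overperformed.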

The first table below shows how Liverpool’s Premier League managers have fared against the rank and regression metrics. The “Total” column contains the average MSq£ of each manager, followed by the average number of teams with a squad more costly than theirs. The fourth column of data shows how the manager’s average finish compared to the regression prediction from their average MSq£ – a negative score indicates better-than-predicted placement (over-performance), while a positive score indicates worse-than-predicted placement (under-performance). The fifth column is self-explanatory, while the final column combines the regression and rank performance into an overall judgment on the manager’s performance.

The second table displays the total count of season-by-season manager performance versus both metrics.

As was pointed out in Pay as You Play, Graeme Souness’ record at Liverpool was one of underachievement versus the financial resources expended. He had an MSq£ well into the twos for the one full season he was in the Premier League, while only managing a sixth place finish in the table. His replacement, Roy Evans, had mixed results. He did well versus the regression predictions, but on average only a single team had a higher Sq£ throughout his career at Liverpool. The strain of the squad’s underachievement led him to quit the joint managership with Gerard Houllier during the 1998-1999 season.

What becomes clear is that Gerard Houllier’s years seem to be the only managerial term where the team consistently outperformed expenditures. Houllier’s term also coincides with Liverpool’s Premier League era peak for youth players – see years six (’98-’99) through eight (’00-’01) in the youth player chart earlier in this post. At that point Liverpool were running nearly double the league average with almost four homegrown players per match. Houllier leveraged players like Jamie Carragher, Steven Gerrard, Robbie Fowler, Michael Owen, David Thompson, Dominic Matteo and Steve McManaman to outperform the MSq£ regression model (although some would point out Houllier inherited all of the homegrown talent). The later years of Houllier’s term represented a movement in the wrong direction both in terms of youth players and MSq£ – while still overperforming versus expenditures, the club’s backwards slide in the table did not satisfy ownership or supporters’ expectations. Enter Rafael Benítez.

Rafa Benítez’s record is mixed. Overall, it’s a push, with three seasons of over-performance, two pushes, and one under-performance. The under-performance came in his first season, while the two pushes came in his final three seasons with the club. Benítez didn’t inherit as many quality homegrown players and continued the steady downward trend in this metric, relying mainly on Carragher and Gerrard. This meant more of his team would be built on transfers, making success more challenging given Liverpool’s modest resources versus the competition (especially after a leveraged buyout).

His best over-performance was clearly the 2008-2009 campaign, when Liverpool finished with 86 points. That year’s MSq£ was fourth highest, while the regression equation would have predicted a finish position of 6.62. Sadly, poor performance and low team morale resulted in the predicted seventh place finish in 2010. Rafa, who averaged 7 points a season more than Houllier (and who did far better in Europe), left soon afterward. [The analysis in Pay As You Play clearly shows how much better Benítez’s spending was in comparison with Houllier’s, particularly in terms of how their respective signings increased in value.]

Overall, Liverpool’s years in the Premier League have been a push. They have underperformed versus the MSq£ rank, but outperformed the regression equation. Until the ’06-’07 season they also had the second highest utilization of youth players within the Big Four, nearly double that of Chelsea and Arsenal. These points are key, as history establishes realistic expectations going forward. While Liverpool has ranked high in MSq£, they have consistently been number four within the Big Four.

They also seem to have occupied an interesting position in the Big Four. Chelsea has spent absurd amounts of money to compensate for the manager carousel they’ve experienced. Manchester United has been able to combine both high expenditures and management stability to set the standard for championships in the Premier League. Arsenal has relied on the genius of Arsene Wenger to keep them competitive with a modest MSq£. Liverpool seems to have had the worst of both worlds – a high turnover in managers and a very modest MSq£ compared to the big spenders they were chasing.

Liverpool’s MSq£ has steadily fallen by about 0.1 each season since 2003-2004, and is now the second lowest of the top six in the league (Arsenal is the only team with a lower MSq£). Liverpool has regressed to an MSq£ of 1.3 for the 2010-2011 season, leading to a predicted finish of 8.64. In the near term, Liverpool looks to be an upper mid-table club if they can get the right management and spend modest money. Longer term, they face a rebuilding task that needs a vision, a budget, and a manager to execute it.

The 2010-2011 Season So Far

So what does this all mean for this season?

The chart below summarizes each team’s performance to date versus their rank of MSq£ and the regression equation’s predicted finish. Chelsea’s, Manchester United’s, and Manchester City’s predicted finishes from the regression equation had to be clipped to 1.0, as their MSq£ for 2010-2011 was so high that it led to projected finishes of less than zero. Negative values versus the regression indicate over-performance, while positive values indicate under-performance.
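The clipping and the sign convention described here amount to a one-line adjustment. A minimal sketch (the function name is my own; the predicted finish would come from the MSq£ regression, whose coefficients are not reproduced in this excerpt):

```python
def regression_delta(actual_finish, predicted_finish):
    """Delta versus the regression prediction, with the prediction
    clipped to 1.0 (the model can project finishes below first place
    for very high MSq£ values, as with Chelsea and the two Manchester
    clubs)."""
    clipped = max(1.0, predicted_finish)
    # Negative delta = over-performance, positive = under-performance.
    return actual_finish - clipped
```

A team predicted at -0.4 but sitting first scores a delta of 0.0 rather than an artificial over-performance.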

Clearly, the two biggest overperformers are Bolton and West Bromwich Albion – both of which are placing nearly nine spots higher than the regression would predict and 10 spots higher than their place in the MSq£ rankings. Arsenal, Blackburn, and Blackpool also deserve special mention – each is at least five places higher than both the regression analysis and the MSq£ rankings would indicate.

Chelsea and Manchester City are penalized due to their large spend (ranking 1-2 in MSq£) while dropping points and slipping below their expected table position. Nothing short of a top finish for either will match the expectations set by their expenditures. Spurs and Manchester United are right where they should be. All of this makes for a congested top six in the table, where at least two of the current Champions League participants have a real chance of not finding a seat when the music stops at the end of the season.

At the bottom end of the table, perennial Premier League members Aston Villa are disappointing their management given the cash that has been outlaid on the squad. They are 10 spots below their MSq£ rank and more than four positions below their regression equation prediction. The biggest underperformer of all is West Ham United, whose mid-table MSq£ outlay has resulted in a disappointing run at the bottom, six places lower than the regression equation predicts. Fulham and Wigan are punching five spots below their MSq£ rank, but only two to three spots below what the regression equation predicts.

It’s a long season, and a lot can change between now and May 2011. As Graeme Riley has pointed out, this season has been far less predictable than those past. Perhaps we’re witnessing the beginning of a new age when money matters less, or maybe it’s just one where the disparity in squad cost, and resultant performance, is far less. Either way, it may leave some big spenders disappointed, some frugal clubs pleasantly surprised, and others just happy to not be relegated.


The quote at the outset of this post noted that the Soccernomics wage model accounted for 89% of the variation between wages and finish position, while the MSq£ model accounts for nearly 70% of the variation between MSq£ and finish position. A stronger relationship to wages makes sense. Players’ contracts can be renegotiated or extended to account for improvement or degradation in play since they initially arrived, while the CTPP data used to generate the MSq£ data is a static value that only changes based on overall transfer market conditions and not an individual player’s performance after the transfer. Nonetheless, a transfer must take place before anyone can negotiate wages or play a game for the new team and begin to generate data for “relative contribution” metrics. Paying for transfers is a prerequisite for getting the talent a team hopes contributes to superior finishes on match day. Combine this with the uncertainty in obtaining reliable wage data versus more public transactions in the transfer market, and a compelling case can be made to look at transfers first and conclude they are the price of entry to having a shot at Premier League success. Once a player has been purchased, wages or utilization metrics are better suited to diagnosing actual performance versus expectations.

Understanding who’s spending money on transfers and how much more they are spending than the other teams in the league is critical to understanding their ability to compete for top finishing positions. At any moment in the 2010-2011 season, the average Premier League team is fielding a squad of ten transfers and one homegrown player. The quality of those transfers, as indicated by their current transfer purchase price, and the team’s likely finish position seem to be highly correlated.

To understand a team’s relative expenditures is to begin to understand their potential table position. Doing so helps set realistic expectations for the squad, the team’s management, and its supporters. Ignoring this reality can lead to unrealistic expectations, which in the end create a desire for quick solutions that can cause more organizational and financial turmoil, setting the team further back from its goals for table finish.

*[Since the original publication of this blog entry I have been contacted by Soccernomics author Stefan Szymanski, and this is what he had to say about the wage data used within Soccernomics:

“You question the quality of the wage data but I’m not sure that’s right- this is audited data from the company accounts published annually – not a guess like you see in Forbes. Its one weakness is that it is total payroll data, not just players- but players account for 90% plus of payroll normally. It must be much better quality than transfer fee data which is not audited and represents figures mentioned in the newspapers- the clubs never reveal the actual transaction value, and I’m told there are a lot of inaccuracies. Without getting confirmation directly from the clubs, there is no way to check this.”

Indeed, it appears the wage data used in Soccernomics is of the highest quality. I retract my earlier comment questioning its quality. At the same time, I would stand by the CTPP database being the most accurate of its kind for transfers. Stefan was quite complimentary of the overall post and its predecessor deconstructing his work at my blog, for which I am very grateful. Ultimately, he and I would agree on the wage data being a better predictor given its higher R-squared value for the same reasons I gave at the conclusion of my post. I hope that Paul and I can engage Stefan in future analysis of the CTPP database and continue to shed light on the impact of finances on the result on the pitch.]…

That’s Why They Play the Games


The Rematch is almost upon us, and it’s only appropriate that I turn to the Soccernomics model for national team performance to provide a prediction of the possible outcome. If two teams, i and j, face each other, the Soccernomics model uses the following equation to predict goal differential:
GD(ij) = 0.137 ln[pop(i)/pop(j)] + 0.145 ln[GDP(i)/GDP(j)] + 0.739 ln[exp(i)/exp(j)] + 0.657 (if i is the home team)
In the case of England and the US, neither is the home team, so we can drop the final term in the equation. Now we turn to the population, GDP, and experience terms.
  • Population: England 51,456,400; United States 309,449,000
  • GDP per capita: England $35,334; United States $46,381
  • International experience: England 780 matches; United States 403 matches
Plugging these variables into the above equation yields a 0.203 goal differential advantage for England. England greatly benefits from their nearly 2-to-1 advantage in international experience that counts nearly six times as much as the other variables. For all the smack talk from English fans, this is hardly an advantage in the World Cup.
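The arithmetic can be reproduced directly from the coefficients quoted above (a quick sketch; the function name is my own):

```python
import math

def goal_differential(pop_i, pop_j, gdp_i, gdp_j, exp_i, exp_j, home=False):
    """Soccernomics national-team model: predicted goal differential
    for team i versus team j, with an optional home-team bonus."""
    gd = (0.137 * math.log(pop_i / pop_j)
          + 0.145 * math.log(gdp_i / gdp_j)
          + 0.739 * math.log(exp_i / exp_j))
    return gd + 0.657 if home else gd

# England (team i) vs the United States (team j) on neutral ground:
gd = goal_differential(51_456_400, 309_449_000, 35_334, 46_381, 780, 403)
print(round(gd, 3))  # 0.203 -- England's predicted advantage
```

Note that England’s experience edge (ln of 780/403, weighted 0.739) more than offsets the population and GDP terms, both of which favor the United States.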
This match will likely be far more even than some would think. It’s certainly within the United States’ capability to win it. Both teams are in top form, and everyone is hoping for an outstanding clash worth the 60-year wait.
I have only one thing left to say.