Monday, November 28, 2011

Quantifying Manchester City's Start to the Season

We're now thirteen games into the 2011/12 English Premier League season, which means we're just over one third of the way towards crowning the Premier League champion.  So far the season has looked like it's Manchester City's championship to lose, with the remaining teams left to battle for one of the three remaining Champions League spots.  City's start to the season is certainly hot - 35 points from an available 39 (11-2-0 for a 2.69 PPM) and a +31 GD (2.38 GD per match) - and is tops at the 13 match mark in the league's 20 team/38 match history by besting the 05/06 Chelsea and 06/07 Manchester United totals for points (34 for a 2.62 PPM) and the 08/09 Chelsea total for GD (28 for a GD per match of 2.15).  However, as that 08/09 Chelsea squad can attest, starting hot does not guarantee finishing at the top.  In fact, of the 16 teams that led the league after thirteen matches during the league's 20 team era, only 6 of them have gone on to finish in that position at the end of the season.  Five dropped to second, three dropped to third, one (2002/03 Liverpool) dropped to fifth, and one (1998/99 Aston Villa) dropped to sixth by season's end.

So just how hot has Manchester City's start been when considering the variation in performance among teams that sit at the top of the table after match day 13?  What does it mean for the odds of them maintaining their torrid pace for points and goal differential the rest of the year, and what does it mean for their chances of finishing top of the table come May?

Quantifying City's Hot Start 

Combing through a historical archive of tables by match day, I compiled a comprehensive list of match day 13 tables for the 1995/96 through 2010/11 seasons.  The team at the top of the table after the 13th match day was then isolated for each season.  Luckily, the results for goal differential per match and points per match for these top-of-the-table clubs are approximately normally distributed for each statistic, so a Z-score approach can be taken to quantify just how rare City's start to this season is.

Looking at goal differential per game, the next closest team to City's 2.38 GD per match after 13 matches is the 2008/09 Chelsea average of 2.15.  The average GD per match of clubs at the top of the table after 13 matches is 1.45, while the standard deviation is 0.4362.  This means that Manchester City's start corresponds to a Z-score of 2.17, while Chelsea's 2008/09 start translates to a Z-score of 1.64.  Those Z-scores can then be translated to percentiles, which communicate what percentage of teams in first place after 13 matches would be expected to have a lower goal differential per match given the variation we've seen in the first 16 seasons.  Manchester City's percentile is 98.5, while the 2008/09 Chelsea performance was in the 94.9 percentile.  That is to say that Manchester City's goal differential performance is better than 98.5% of the expected first place performances at the 13 match mark over time, and sits 3.6 percentage points above the 2008/09 Chelsea start to the season.

A similar story can be told when it comes to PPM.  Manchester City has started the season with a 2.69 PPM average, narrowly besting the 2.62 PPM start realized by the 2005/06 Chelsea and 2006/07 Manchester United squads.  Taking into account historical average (2.37 PPM) and standard deviation (0.178 PPM) data, the corresponding Z-scores for City and Chelsea/United are 1.80 and 1.39, respectively.  This puts City's percentile at 96.4 and Chelsea/United's percentile at 91.8.  Manchester City's PPM performance is better than 96% of the expected performances by first place teams at the 13 match mark over time, and sits roughly 4.6 percentage points above the next closest performances on record.
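The Z-score and percentile steps can be sketched in a few lines of Python.  This is a minimal sketch rather than the calculation actually used for the post: the function names are my own, and it assumes the percentile is simply the standard normal cumulative distribution evaluated at the Z-score.  Note that plugging the rounded GD figures quoted above (2.38, 1.45, 0.4362) into the formula gives a Z-score of about 2.13 rather than 2.17, presumably because the quoted mean and per-match values are rounded, so the sketch converts the quoted GD Z-scores directly.

```python
import math


def z_score(value, mean, std_dev):
    """Number of standard deviations that value sits above the mean."""
    return (value - mean) / std_dev


def normal_percentile(z):
    """Percentage of a normal distribution that lies below z."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# PPM: computed from the rounded figures quoted in the text
city_ppm_z = z_score(2.69, mean=2.37, std_dev=0.178)
print(f"City PPM Z-score:    {city_ppm_z:.2f}")
print(f"City PPM percentile: {normal_percentile(city_ppm_z):.1f}")

# GD per match: convert the Z-scores quoted in the text to percentiles
for club, z in [("Man City 2011/12", 2.17), ("Chelsea 2008/09", 1.64)]:
    print(f"{club} GD percentile: {normal_percentile(z):.1f}")
```

Using only the standard library's error function keeps the sketch dependency-free; a statistics package would give the same percentiles.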

Projecting City's Pace through the End of the Season

But will Manchester City be able to maintain this level of performance throughout the season?  History suggests not.

Along with capturing the data at the 13 match mark, similar data was captured for each season's final table.  That final table data was then compared to the match day 13 data to examine what happens to PPM and GD per match pace by the end of the season.  The results are the regression analyses captured in the graphs below (click on either graph to enlarge).  

The solid lines represent the nominal regression equation based upon the black points from the previous sixteen seasons, while the dashed lines represent the bounds of the 50% prediction intervals (PIs) given the variation seen in the regression model.  Those bounds contain the middle 50% of expected outcomes.  Another way to interpret the lower bound is that a club has only a 25% chance of finishing with a PPM or GD per match total lower than that line given the same statistic's value after 13 matches.  Manchester City's 2011/12 pace is represented by the light blue dot on each graph, with its y-value set to the projected season end value based upon the nominal regression line.  Despite the lower R-squared values, both data sets met the statistical tests for linear correlation and passed the statistical tests for regression fits and residuals.

While the slope term of the PPM regression is slightly greater than one (1.0028), the negative intercept term (-0.3086) far outweighs the small gain that the slope term adds to any match day 13 PPM.  This suggests there is a slight fall-off to be expected by season's end when it comes to point accumulation.  Based upon City's current PPM, the nominal prediction is a finish of 2.39 PPM and a 50% PI of 2.22 to 2.58 PPM.  Manchester City has only a 25% chance of finishing with a PPM lower than 2.22, which is certainly championship material as it would put them right in the middle for PPM of clubs who have won a Premier League championship.

Far less sustainable is City's start in relation to GD per match.  The slope term in the GD per match regression is much less than one (0.689) while the intercept term is only slightly positive (0.0598).  This means there is a much bigger fall-off in GD per match than in PPM.  Based upon City's current GD per match, the nominal prediction is a finish of 1.70 GD per match and a 50% PI of 1.44 to 1.99.  Manchester City has only a 25% chance of finishing with a GD per match lower than 1.44 - only four EPL champions have finished with a higher GD per match.
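For readers who want to check the nominal projections, the two regression lines can be evaluated directly from the slope and intercept terms quoted above.  This is only a sketch of the point estimates: the helper name is my own, and the prediction interval bounds are not reproduced because they depend on the residual variation in the underlying data, which isn't listed here.

```python
def project_final(md13_value, slope, intercept):
    """Nominal end-of-season value from the match day 13 value."""
    return slope * md13_value + intercept


# Slope and intercept terms as quoted in the text
ppm_final = project_final(2.69, slope=1.0028, intercept=-0.3086)
gd_final = project_final(2.38, slope=0.689, intercept=0.0598)

print(f"Projected final PPM:          {ppm_final:.2f}")  # 2.39
print(f"Projected final GD per match: {gd_final:.2f}")   # 1.70
```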

Projecting City's Odds of Finishing As Champions

So where does this put City's odds of winning the EPL Championship?  Certainly more sophisticated models could look at strength of remaining schedule, odds of injury, distractions from other competitions, and many other attributes that contribute to a full season of results.  Ironically, a far simpler model that looks at City's PPM can help quantify their odds of finishing at the top of the table.

Using the data, a binary logistic regression (BLR) model was created.  The outcome used in the model was "top of table" vs. "not top of table".  Both the constant and variable (PPM) terms tested statistically significant.  The plot below shows the nominal and 50% PI lines demonstrating the odds of winning the Premier League when the first place team at match day 13 has a range of PPM values.  Again, Manchester City is represented by a light blue dot with its y-value set to the projected odds based upon the nominal regression equation.  Click on the graph to enlarge.

The BLR model's nominal projection is a 94% chance of City winning the EPL Championship.  The 50% PI bounds are 85% and 98%, respectively.  Manchester City's odds of winning the EPL in 2011/12 are very good based upon their start to the season.
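A binary logistic regression models the log-odds of the outcome as a straight line in the predictor, so the probability is p = 1 / (1 + exp(-(b0 + b1 x PPM))).  The fitted constant and PPM terms aren't listed in this post, so the sketch below does not attempt to reproduce the model.  It only shows, with helper names of my own, how the quoted probabilities map to odds and log-odds, which is the scale the model is linear on.

```python
import math


def logistic(log_odds):
    """Convert log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(p):
    """Convert a probability to log-odds (the inverse of logistic)."""
    return math.log(p / (1.0 - p))


# Probabilities quoted in the text for City's title chances
for label, p in [("lower 50% PI", 0.85), ("nominal", 0.94), ("upper 50% PI", 0.98)]:
    odds = p / (1.0 - p)
    print(f"{label}: p = {p:.2f}, odds = {odds:.1f} to 1, log-odds = {logit(p):.2f}")
```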


While Manchester City's performance will almost certainly drop off as the season progresses, there is no doubting how good a start they've had so far.  They're not only off to the best start in the history of the Premier League, they're also off to a start that would be better than 96% to 98% of all the teams that would be in first place after 13 matches.  They're also highly likely to win the league.  That being said, Manchester United is only 5 points behind and Tottenham is only 7 points back with a game in hand.  Things are far from over, and I will check back in at the two-thirds point in the season (26 matches) to see if City has kept up their fast start.

Saturday, November 26, 2011

A Projection of the Points Needed to Qualify for 2012 MLS Playoffs

Another MLS season goes by, another MLS season schedule and playoff format is published...

The league released a new schedule and playoff format for the 2012 season during the run up to this year's MLS Cup.  The 2012 season will see the return of the unbalanced schedule, while the playoff format will see an expansion of the home-and-away aggregate goals format into the conference finals.  First, a few reactions to the new schedule and playoff format before I get into the impacts I see in qualifying for the latter.

The Changes in Format

MLS has gone to the long anticipated unbalanced schedule for 2012, and made the following statement explaining why:

“We have established a fair and compelling format for the 2012 season,” MLS executive vice president Nelson Rodriguez said in a statement. “This regular season will include more games between regional rivals and less total travel than we have seen in recent years. Because of the wide geographic distribution of MLS clubs, this structure should improve the quality of play, while continuing to give every club an equal chance of qualifying for the MLS Cup Playoffs.”
I have long doubted the "travel = exhaustion" excuse in sports.  It always seems intuitive, since jet lag affects most of us when we travel and we're not running for miles during each stop of our journey.  It makes even more sense when MLS clubs are compared to their European counterparts, who all travel within countries the size of California or smaller when they play in their domestic leagues.  However, when it comes to wins and losses the theory just doesn't pan out.  Scorecasting did a great job of demonstrating that the vast majority of home wins can be explained by referee bias, and not fatigue due to team travel.  Perhaps MLS would make the argument that the quality of play of both teams is poor due to so much travel, and that they're more concerned about the quality of the product on the pitch and less concerned about the home pitch advantage.  Fair enough, although I don't think travel is the lowest hanging fruit when it comes to improving match quality.  Rapid league expansion that has forced a dilution in the talent pool available to MLS clubs, combined with continuing desires by players to join more prestigious leagues in Europe, are likely the bigger culprits.  Perhaps the league should take a breather from expansion, and let the league format stabilize for a while if they're so concerned about the quality of play.

This isn't to say that the league doesn't have a justifiable reason for taking this action, but let's just admit what it is about - money and setting up its future for more than 20 teams.  Keeping cross country travel down means more savings for the owners, as does not expanding the regular season to the 36 matches required for a balanced schedule (although not adding matches may be hurting the national team).  This is key in a league still trying to establish itself financially.  Finally, if MLS really does desire to have more than 20 teams in the league and will avoid promotion and relegation like every other US professional sports league, there is simply no way to keep a balanced schedule when they reach that point.

So what is the impact of the regular season format?  The biggest impact may be felt at the "regional rivalry" level that MLS seemed so concerned about.  MLS strangely left Houston in the Eastern Conference, which means they will only face FC Dallas once per year and will be alternating the venue from year to year.  Canadian clubs in the East (Montreal and Toronto) will only face Vancouver once as well.  Yes, this isn't exactly local, but from what I am reading online this presents a bit of a challenge for soccer fans in the north looking to better establish a vibrant, professional Canadian soccer community.  For teams in the West who do remain in the Western Conference, this arrangement presents a bit of a challenge for their local cup competitions.  Given that Western Conference teams will play each other three times a season, one team will inherently have a home pitch advantage.  That's not a very good format for a regional cup that relies on a balance in venues to minimize the effects of home pitch advantage.  The new format will increase the number of local matches, but I don't know if the format will help the rivalries in the Western Conference.

I also suspect the Conference formats that rely upon three matches between each team will end up producing a bigger disparity in points distribution.  This is due to the higher likelihood of better clubs taking advantage of an increased number of games against weaker clubs.  Playing more games repeatedly against the same competition is a recipe for achieving the expected result over time rather than the upsets that can come with fewer matches. This will likely lead to increased disparity in points earned between the better and worse clubs.  One also hopes that the league looks at the prior seasons' results to ensure teams from each conference get a balance of strong and weak opponents when playing in the interconference matches. A few years from now, when the 20th team is admitted to MLS, the disparity in the number of interconference matches played by each conference should disappear.

The changes to the playoff format have also provided a few new wrinkles.  MLS has finally recognized that if they're going to go to the trouble of having a playoff, they need to provide some type of reward to the Supporters' Shield winners.  Starting in 2012, the Supporters' Shield winners will host MLS Cup if they're able to make it that far.

The other major change - the expansion of the two-legged aggregate goal format to the Conference Finals - is not a good one.  I've written plenty here and here about the penalty better teams who are involved in more competitions pay in such a format.  Such a format makes sense for competitions such as Champions League where it may be the first time that season the clubs have met.  It makes no sense as a playoff format that now can only include teams within a conference who have played each other three times already that season.  The continued use and expansion of such a playoff format within MLS suggests to me that the league can't decide if it wants to cater to fans of European soccer or American professional sports.  My contention is that if MLS is going to insist on a playoff format to determine its champion like other US sports leagues, it should use a single elimination format that rewards teams at the top of the table.

The Impact on Playoff Qualification

Regardless of the criticism, the playoff format is what it will be next season.  With the changes that have been made, how are the odds of qualification impacted?

The challenge in analyzing MLS historical performance is the non-constant number of clubs in the league, the maximum number of points available, and the number of teams that qualify for the playoffs.  Luckily, there is a method for translating historical table position data into a format that allows for comparison across seasons.  Data from 2005 through 2011 was compiled for this analysis, with table position translated via the equation below and points earned converted to the percentage of total points available each season.
-LN[table position/(number of teams + 1 - table position)]
The transformed data was then plotted, and regression lines created along with the 50% prediction intervals (PI) associated with the data.  Given the new playoff format and the lack of teams crossing over from one conference to the other for the playoffs, historical data was separated by conference.  Plots for the Eastern and Western Conference data are shown below (click on graphs to enlarge them).
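To make the transformation concrete, here is a minimal Python sketch of it.  The function name and the nine-team conference used in the example are my own illustrative assumptions, not something taken from the data set.  The transform maps the top position to a large positive value, the bottom position to the mirror-image negative value, and the middle of the table to zero, regardless of how many teams are in the table.

```python
import math


def transform_position(position, n_teams):
    """-LN[position / (n_teams + 1 - position)], with 1 as top of the table."""
    return -math.log(position / (n_teams + 1 - position))


# Illustrative nine-team conference (hypothetical example size)
N_TEAMS = 9
for position in range(1, N_TEAMS + 1):
    value = transform_position(position, N_TEAMS)
    print(f"position {position}: {value:+.3f}")
```

In a nine-team conference the fifth position maps to exactly zero, as does the tenth position in a nineteen-team single table, which is what makes positions comparable across tables of different sizes.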

The fit of the regressions is relatively good - the Eastern Conference has an R-squared value of 0.80 and the Western Conference a value of 0.76.  The main difference in those fit values can be attributed to the awful performances by Real Salt Lake and Chivas in 2005 where they earned 21% and 19% of the available points, respectively.  Those data points are represented by the blue dots in the lower right hand end of the graph, just above the upper 50% PI line.  If those two data points were removed, the R-squared value for the Western Conference would improve to 0.81.

An interesting comparison can be made between the conferences by examining the slope terms of the two regression lines (i.e. the number before the "x" variable).  A higher slope term indicates that teams can achieve a particular table position with a lower percentage of points, while a lower slope term indicates that teams must earn a greater percentage of points to earn the same table position.  To make the comparison easier, a plot of both conferences and their regression lines is shown below, along with a third regression line that would be associated with a single table of both Western and Eastern Conferences combined.

The graph demonstrates the difference between the two conferences.  Looking at the right-hand side of the graph, one can see the separation between the upper table positions of the two conferences.  The Western Conference line is lower than the Eastern Conference line at those upper table positions, reflecting the difference in slope terms (10.0 versus 12.2, respectively).  Based purely on the nominal regression line, one could argue that finishing at the top of the Western Conference is more difficult than accomplishing the same in the East.  

Outcomes are never as deterministic as a single regression line suggests.  Taking into account the variation within the regression analysis, the reality is a bit murkier.  The 50% PI lines bound the middle 50% of possible outcomes, which provides a way to quantify the expected variation over time.  A similar interpretation is that 25% of the teams with a specific point total will finish above the upper dotted line, and 25% of the teams will finish below the lower dotted line.  The regression study can also be flipped - points vs. table position - to produce 50% PIs for the point percentages required to finish in specific table positions.  It is this relationship that is explored further.

Fifty percent of the teams finishing in the top position of a conference will earn between 56.9% (58 pts/1.71 PPM) and 62.8% (64 pts/1.88 PPM) of the available points in the East, while Western Conference teams will earn between 58.7% (60 pts/1.76 PPM) and 66.4% (68 pts/2.00 PPM) of the available points.  It's certainly not a statistically significant difference, but it is indicative of the mildly more difficult path to the top of the Western Conference.
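The points and PPM figures in parentheses follow from the percentages with simple arithmetic.  The sketch below assumes a 34 match season with three points for a win (102 available points), which is inferred from the points/PPM pairs quoted above rather than stated outright; the function and constant names are my own.

```python
MATCHES = 34         # assumption: inferred from the quoted points/PPM pairs
POINTS_PER_WIN = 3   # assumption: standard three points for a win


def pct_to_points_and_ppm(pct, matches=MATCHES):
    """Convert a percentage of available points to (points, points per match)."""
    points = round(pct / 100.0 * matches * POINTS_PER_WIN)
    return points, points / matches


# 50% PI bounds for the top position in each conference, as quoted above
for label, pct in [("East lower", 56.9), ("East upper", 62.8),
                   ("West lower", 58.7), ("West upper", 66.4)]:
    points, ppm = pct_to_points_and_ppm(pct)
    print(f"{label}: {pct}% -> {points} pts / {ppm:.2f} PPM")
```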

Looking at the numbers close to a y-axis value of 0 helps identify the points required to simply qualify for the playoffs.  The regression lines begin to overlap by the time they approach this y-axis value.  Coincidentally, the single table regression line also overlaps the two conference lines at this point, suggesting that there is no real change in qualification odds from the league's move from using the overall 10th place finisher to using the 5th place finisher from each conference.  Utilizing the 50% PI data, the middle half of fifth place finishers in the East will earn between 43.0% (44 pts/1.30 PPM) and 48.7% (50 pts/1.47 PPM) of the available points, while those in the West will earn between 42.3% (43 pts/1.26 PPM) and 49.7% (51 pts/1.50 PPM).  The slightly wider variation seen in the Western Conference can be directly attributed to the slightly worse fit discussed earlier.

Interestingly enough, the data suggests it is completely reasonable to expect (via the 50% PI data) that teams with less than 50% of their available points could finish as high as the third playoff seed.  Qualifying third would give the team a bye into the first proper round of the playoffs, avoiding the one game playoff required of the fourth and fifth seeds.  If MLS got rid of qualifying the fifth team and simply qualified the top four for the playoffs, the 50% PI for percentage of available points in the East would be 45.5% (46 pts/1.35 PPM) to 51.2% (52 pts/1.53 PPM), while in the West it would be 45.3% (46 pts/1.35 PPM) to 52.7% (54 pts/1.59 PPM).  This would represent an increase of nearly 3 percentage points of the available points, or nearly 3 points, to qualify for the MLS playoffs.


There is a good bit of historical precedent for teams earning less than 50% of the available points and making the MLS playoffs.  Such an outcome is inherent to a league that qualifies more than 50% of the teams for such a playoff.  Historically, 14 of the 28 teams that have qualified for the playoffs from the East have earned fewer than 50% of the available points, while 10 of the 30 teams that have qualified from the West have earned fewer than 50% of the available points.  In both cases, the majority of these teams qualified in the 4th or 5th table positions.  In the East, only four such teams have qualified as a third seed and only one in the second table position.  In the West, four of the teams qualified as a third seed and two qualified in the second position.  Such generous odds of playoff qualification will continue until MLS admits a lower percentage of playoff participants, either by increasing the number of teams in the league or by reducing the number of playoff spots available.

Monday, November 14, 2011

A review of Simon Kuper's "Soccer Men"

It's no secret that my foray into the realm of soccer statistics blogging can be directly attributed to my initial reading of Soccernomics. I had fallen in love with soccer during the Sounders' inaugural season, had picked my obligatory overseas team of higher quality later that same year (Arsenal), and was bound-and-determined to get some new reading material in early 2010 in preparation for that summer's World Cup. I stumbled upon Kuper's book, which combined a professional and personal passion of mine (statistics) and my new love for the sport. The rest is now history.

Since then I have read his first book about the sport, Soccer Against the Enemy, as well as Ajax, The Dutch, The War. Each of the three books looks at the sport through a different topic - statistics, conflict, and World War II. Kuper's latest book, Soccer Men, doesn't have such an overt theme as a method for examining the game. To be honest, much of the book's content has been published elsewhere. The book serves as a compilation of profiles and interviews Kuper has written over his 15 years of covering the sport. Nonetheless, it serves as a very useful compendium both for those who have followed Kuper for all 15 years and for those who are more recent converts like me. While the theme that binds all of these short chapters together is not as overt as in works past, there is one there. It is the idea of the professionalism of the men who play, manage, and make decisions within the sport.

Almost in a nod to the fact that much of this book is not original content, the book's introduction explains that the concept was not Kuper's idea nor was its name. Forty-three years earlier a British journalist named Arthur Hopcraft had published a book titled The Football Man. It was a book covering a different type of game, player, and manager, but the concept was the same as Kuper's. It was a time, as Kuper reminds the reader, when the phrase "soccer literature" might have been viewed as an oxymoron. As only a writer foundational to modern soccer literature could write, Kuper notes of Hopcraft's writing:
"He took [his subjects] seriously, not as demigods but as ordinary men and craftsman. His overly polished prose is a bit dated, and we no longer need his assurances that soccer is important enough to write about (quite the opposite: We now often need to be told that it isn't.)"
Few authors now writing about soccer have their words taken more seriously and studied so studiously as Kuper. Yet, almost in the manner of Heath Ledger playing the Joker, Kuper asks "WHY... SOOOOO... SERIOUS!?!?" One can almost hear the wry smile cross Kuper's mouth as he was writing this entry in the book.

Then again, it's all in the subtlety of what kind of seriousness in soccer commentary Kuper is criticizing.

As is all too well documented, the last few decades have seen a seriousness forced onto the sport and its players with the rise of its international business.  The Deloitte Money League's top twenty clubs brought in €4.3 billion ($5.9 billion) in revenue during the 2009/10 season.  A microcosm of the rise of the international game is the Premier League, which claimed 7 of the 20 spots on Deloitte's list.  The EPL has seen a more than eightfold increase in the average player transfer fee, and a similar if less-spectacular rise in wages.  The clubs, the players, and the management are all treating the game like a business, and a serious one at that given the large sums of money involved.

Kuper's point at the outset of the book is re-iterated in profile after profile contained within the book's covers.  It is a recognition of the fact that everyone treats the game seriously because their very large paychecks depend on it.  Any love affair with a badge that causes a player to kiss it lasts only as long as the next outsized transfer or contract offer that rolls through their door.  Profiles of Wayne Rooney, Steven Gerrard, Frank Lampard, and others who had lifelong childhood attachments to clubs that they ultimately turned their backs on due to better offers demonstrate the rationality of following the money.  The expectation that love for one's club, which is an employer, should outweigh one's love for one's own material self interest is something only a supporter can foist upon a player.  After all, as Kuper has repeatedly asked in interviews supporting this book, "Do you love your employer so much that you would turn down a much better compensation package from another employer?  Do you love your bank so much that you wouldn't leave it if offered a much better interest rate elsewhere?"  The answer clearly is, "Of course not!"  Yet we take our love for the game and club so seriously that we scream and yell at any player who has the temerity to judge the grass is greener on the other side and look for rewards - trophies, money, playing time - elsewhere.  It's the seriousness of soccer talk, which treats the most minuscule of events as do-or-die, and simultaneously denies the seriousness of the business reality of the sport that Kuper is criticizing (curious readers can read Kuper's expansion on this concept in an interview Sarah Rudd and I conducted with him in October).

Kuper's explanation of player-as-professional is perhaps no better displayed than in his five-part series on England's golden generation - profiles of Jamie Carragher, Ashley Cole, Steven Gerrard, and Wayne Rooney built upon their mediocre-to-awful autobiographies.  Here the reader gets a sense of the cocoon modern soccer players live within from age 12, and how they become ruthless businessmen who believe wholeheartedly in their abilities and the right to get paid for them.  Reading this chapter helps one better understand the somewhat foolish vapidness of the interview with Nicolas Anelka elsewhere in the book, or why Kuper has repeatedly commented that he believes Messi is one of the greatest players to write about and likely the worst to interview.  The pressure to be company men, to never say anything of consequence for fear of punishment, makes current players dull interviews.

It's that reality that makes ex-players the better interview, and perhaps my favorite part of the book was the lengthy recounting of an evening with Johnny Rep and Bernd Holzenbein in June 2004.  Two men who played central roles in the 1974 World Cup final between Germany and Holland, and thus central roles in the Dutch attempt to re-fight World War II, couldn't have been more relaxed and entertaining.  The two former players almost seem bemused by the importance forced on their match by the Dutch, and don't talk much of it except when forced to at a panel on the exact topic.  Otherwise, Holzenbein would rather talk about the Final in 1954 that the Germans also won, as he saw it as more formative to the nation's postwar sporting experience.  Overall, we learn far more about these two men, their attitudes towards the game, and what's truly important to the players who create the sport rather than the fans who consume it.  Kuper concludes the chapter with,
The history of soccer would read very differently if it were written by actual players.  They would never organize a debate about a long-gone World Cup final, or if they did, it would focus on the postmatch banquet to which the wives weren't invited...
Indeed - things far less serious than the latest transfer rumor, or who did or didn't dive to earn a penalty. It would focus on things to which spectators have no relationship, because they never appeared on a television.

There is a good bit of fun in reading what Kuper thought several years ago, and whether or not his predictions panned out.  Such predictions have produced a mixed bag of results.  He predicted Drogba's relationship with Mourinho would lead him to leave Chelsea and follow Mourinho to Inter - clearly, Drogba did not (although Kuper did take that as a sign that Drogba simply followed rational self interest over friendship - a common theme in the book).  He wrote in 2005 about Michael Essien being the harbinger of a future filled with physically big players - a theory that small midfielders for Spanish club and national teams would end up making a false prophecy.  Correctly foreshadowing Ruud Gullit's decade of management failure, Kuper provides a great profile of Gullit playing for Ajax's fifth string squad in the waning days of his playing career.  The book is chock full of period writing that gives us a better understanding of how players and the game looked then, uncolored through the eyes of history.

While the book spends much of its ink on players, there are two other parts of it that cover managers and "other" soccer men.  I am not a huge fan of the managerial section.  It's hard to take a profile of Mourinho seriously now that we've seen that his rabid paranoia and hatred for Barcelona can devolve into eye gouging on the pitch.  Kuper has a series of articles on Glenn Hoddle, Sven-Göran Eriksson, and Fabio Capello to tell the story of the thankless job of being England's manager.  There are two good articles on Arsene Wenger in this section of the book - one from May 2003 and another from April 2010.  Kuper makes the point clear that what was game changing for Arsenal in 2003 was simply average by 2010, and once Wenger's difference makers had been adopted by others in the league there was no way for the Frenchman to compete with clubs that consistently offered more in transfer fees and wages.

The far more fascinating section to me was the final part of the book that compiled articles on the "other" soccer men in the game.  The profile of Jacques Herzog, a Swiss architect responsible for many of the modern soccer stadiums seen today, is a very interesting one.  Herzog, at the risk of being too serious, approaches building stadiums with the same seriousness that cardinals must have brought to building cathedrals in medieval times.  Herzog draws much inspiration from English stadiums.  Kuper writes:
What soccer fans crave in a stadium is communal emotion... "It's somehow an attempt to go back to the roots of soccer," says Herzog, "to take some of those archaic ingredients.  The Shakespearean theatre, probably it was even a model for the soccer stadium in England - this closeness between the actors and the crowd.  If you can achieve this proximity, the people become the architecture."
Herzog's attempt to build such a feeling at the Allianz Arena is captured exquisitely in Kuper's profile.

Kuper concludes the book with a profile of Ignacio Palacios-Huerta, an economics professor who has assembled the most impressive database of penalty kicks taken since 1995.  The profile was published in the middle of the 2010 World Cup.  Palacios-Huerta explains how his database can be used, and laments the number of errors clubs and national teams make that could be simply corrected by a basic study of the numbers.  Ironically, Kuper had seen Palacios-Huerta's information related to the Dutch and Spanish teams for the World Cup Final and had even put the information in the hands of a Dutch coach.  Kuper was literally minutes away from his and Palacios-Huerta's information being the key to a Dutch win on penalties when the 10-man side finally succumbed to the Spanish attacks.  It's too bad that they did, otherwise this part of the book would have provided for an epic book in and of itself.

Overall the book is another outstanding read from one of the foundational authors in modern soccer literature.  Kuper may ask each of us to take the game a little less seriously, and indeed we all should.  Fewer chattering heads, and less bleating online that treats the sport as something other than a game of escape, would make us all a little more tolerable to be around.  However, our bookshelves and literary lives are far more complete due to Kuper ignoring his own advice and taking such a serious literary approach to the beautiful game and the men within it.

Saturday, November 5, 2011

Reactions to MLS Semifinals, Conference Final Odds, and an Update on Semifinal Model

It Still Hurts...

Conference Semifinal Reactions

MLS's annual bastardization of soccer playoffs - aka the conference semifinals - is now complete.  Sure, I'm a little bitter because my team dug a hole in its first leg that it couldn't climb out of even with an outstanding performance.  I was at that second leg this past Wednesday, and the energy was electric until the final whistle.  It's more that this league can't seem to figure out what it really wants to be - it wants to cater to the American sports fan via a playoff format, but then nods to every other knockout format by using two-legged semifinals while not even implementing the away-goals rule.  MLS would be better off picking one direction or the other and sticking to it.

Nonetheless, the Sounders and three other teams are out of the playoffs now, and we're down to the final four teams fighting for a spot in MLS Cup 2011 in LA.  The format is what it is, so it's time to see how I did against it. I went 2-for-4 in my conference semifinal picks, with varying reasons for success and failure.

I got the LA and Kansas City wins correct.  In LA, I correctly bet they were too good to go down due to the six match goal differential they had to the Red Bulls.  In Kansas City, I correctly bet they would hold serve on match differential and were simply too hot to not win.  Clearly, their 4-0 drubbing of Colorado over two matches demonstrated that superior form.

Honestly, the Philadelphia/Houston series was a toss-up from a statistical prediction standpoint.  It was the closest of the four using my statistical methods, and the only statistical advantage for Philly was that their coach had less experience than Houston's (they were even on matches played).  Luckily, this year's results got rid of that silly "coach experience" anomaly as a statistically significant predictor (more on the adjustments to the model later).  The matchup was really just a flip of a coin statistically, and perhaps I should have gone with the experience of Houston over the second-year improvement and first playoff berth for the Union.

In the Seattle/Real Salt Lake series I picked against my statistical judgement, giving in to supporter's optimism.  In the closing weeks of the regular season I told any Sounders supporter I knew that I would rather the Sounders have faced FC Dallas in the first round than Real Salt Lake.  RSL's skid at the end of the season was a false one - one predicated upon missing personnel they were getting back by playoff time.  FC Dallas, on the other hand, was clearly a slumping team that continued to slump in the playoffs.  The Sounders would have matched up far better against FC Dallas, would have likely been playing to finally get the LA monkey off their back in the Conference Final, and Real Salt Lake would have been tearing up the Eastern Conference Playoffs and be in that conference's final right now.  They'd likely have won the East, and we'd be staring at an RSL vs. Sounders/Galaxy final in several weeks.  For all the griping that would have come from a "Western Conference team winning the East", it would have been a just end to a season that saw those three teams dominate the Western Conference and largely the entire league.  Ironically, it would have been one of the few just endings from the MLS playoffs in recent memory.

Rarely do things work out as desired, and Seattle faced RSL in the conference semifinals.  As a supporter, I picked against the statistics, the Sounders' history of troubles in the playoffs (they had to end sometime, right?), and Real Salt Lake's playoff experience.  I felt the Sounders and Galaxy would both overcome the statistics, and perhaps we'd be able to say the league had gotten to the point that its playoff format didn't determine champions based upon who had played fewer matches in a season.  Watching the first leg from the couch of my living room, I immediately regretted the pick (side note: luckily an 8-hour exam earlier in the day and three beers throughout the match made me too tired to throw anything at the television, or else I'd be out a couple grand right now due to buying a new television).  The Sounders picked the worst day of the year to play what was their worst game of the year, resulting in a 3-0 deficit for them.

The return leg was the polar opposite.  It was very clear that RSL was intent on parking the bus and earning a berth in the Western Conference Finals based purely upon the three goals they scored in the first leg.  The statistics in the table below, which compares the change in different statistics from games one to two for each of the clubs leading after the first leg in the 2011 conference semifinals, bear this out.

Granted, while the other three teams were also heading home to defend their leads, none of those leads was as large as Real Salt Lake's, and none of their first leg performances had been as dominant as Real Salt Lake's.  RSL said all the right things going into the second leg in Seattle, recognizing the Sounders were a dangerous team - they had won eight matches during the regular season by scoring three or more goals, six of those wins were by two or more goals, and two of them were 3-0 shutouts.  Still, watching the game live, re-watching highlights, and then looking at the statistics above I can't help but feel RSL went beyond parking the bus.  Time wasting got so bad that Nick Rimando was issued a yellow card for just such an infraction.  RSL simply hunkered down and was content to boot the ball forward.  The starkest contrast could be drawn with Sporting Kansas City, who went home up 2-0 and came out with attacks in the second leg that netted another 2-0 result for them.  RSL was the only team of the four to move on to the second leg and have a worse performance across the board.

Nonetheless, the Sounders fell short of their attempt to come back from a three goal deficit.  What will likely haunt them the entire offseason is not the misses or blocks in the second leg - there's not much they can do about a Real Salt Lake defense that played relatively well against the 26 shots they faced.  It will be the Grabavoy goal in the dying minutes of the first leg that ended up giving RSL their three goal lead going back to Seattle.

None of this is to say that RSL doesn't deserve the win.  They played outstanding, attacking football in the first leg, and combined with the Sounders' horrible performance they earned their three goal lead.  The shame is that they didn't pursue the single goal in Seattle that would clearly have put them through to the final, and instead played cynical, time wasting, park-the-bus soccer that helps fuel criticism of MLS's two-legged format.

Update to the Conference Semifinal Models

With the conclusion of this year's conference semifinals, eight new data points were added to the model that is based upon MLS playoff data from 2003 forward.  Those new data points have helped to make the model a little more logical, as well as confirm one of the early trends.

On the logic front, losses by Philadelphia and New York, who had some of the shortest tenured managers in the playoffs, eliminated the odd historical anomaly of less experienced managers faring better in the conference semifinals from the ranks of statistically significant predictors.  Replacing it in the list of significant predictors was the difference in the teams' seeds.  A plot of the effects of seed difference is shown in the graph below.  Seeds are listed numerically, so top seed LA (1) playing bottom Western Conference seed New York (6) would produce a seed difference of -5 for LA and +5 for New York.

Based upon the graph and its associated equation, each unit difference in seed changes the odds of winning a two-legged playoff by 6.8%.
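To make the sign convention concrete, here is a minimal Python sketch.  It leans on two assumptions that go beyond what the graph states: that the 6.8% figure can be read as a roughly constant shift in win probability per seed around an even 50% baseline (the fitted equation itself is exponential), and that the numerically lower seed is the one favored.  The function names are hypothetical and used only for illustration.

```python
# Rough sketch only: a linear reading of the 6.8%-per-seed figure.
# Assumptions (not taken from the fitted model): 50% baseline at equal seeds,
# a constant shift per seed, and the numerically lower seed being favored.

SHIFT_PER_SEED = 0.068  # quoted change in odds per unit of seed difference


def seed_difference(team_seed: int, opponent_seed: int) -> int:
    """Seed difference as defined in the post: own seed minus opponent's seed."""
    return team_seed - opponent_seed


def approx_series_win_probability(team_seed: int, opponent_seed: int) -> float:
    """Approximate chance of winning the two-legged series under the assumptions above."""
    diff = seed_difference(team_seed, opponent_seed)
    prob = 0.5 - SHIFT_PER_SEED * diff
    # Clamp so the linear shortcut never leaves the valid probability range.
    return min(1.0, max(0.0, prob))


if __name__ == "__main__":
    print(seed_difference(1, 6))  # LA (1) vs. New York (6)
    print(seed_difference(6, 1))  # New York (6) vs. LA (1)
    print(round(approx_series_win_probability(1, 6), 2))
    print(round(approx_series_win_probability(6, 1), 2))
```

Any number this produces is a back-of-the-envelope figure under those assumptions, not a value read off the fitted curve.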

Despite the LA Galaxy becoming the first team to win a two-legged conference semifinal when facing a team that had played 6+ fewer games than them, the trend of teams playing more games losing their two-legged playoff continued.  Two of the teams that lost - Seattle and Colorado - played four and three more games, respectively, than their opponents.  The net impact of the 2011 results is expressed via the graph below.

Astute readers who compare the exponent term in the equation to the same term from the 2003-2010 data will see that it is numerically smaller.  The net effect is to lower the impact of the difference in matches being played: a 6.9% change in odds of winning the series for each unit change in game differential compared to a 7.5% change excluding the 2011 playoff data.  The addition of the extra data points also tightens up the 95th percentile bounds.  Data through 2010 indicated a 95th percentile range of 0.34 around the nominal (solid) line between game differences of -5 and +5.  The increased sample size and results from the 2011 data have now tightened this range to 0.29.  In statistical speak, the accuracy of the model's nominal prediction continues to increase, while the effect of increased matches seems to be a bit lower than originally predicted.
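The size of that revision can be put in rough numbers.  The Python sketch below assumes the quoted per-game figures act as a constant downward shift in win probability from a 50% baseline, which is a simplification of the exponential fit, and simply compares the 2003-2010 estimate with the updated one for the four and three extra games Seattle and Colorado played.  The names are illustrative, not part of the model.

```python
from typing import Tuple

# Back-of-the-envelope comparison of the two per-game estimates.
# Assumption: each extra game played lowers win probability by a constant
# amount from a 50% baseline (the fit in the post is exponential, not linear).

PER_GAME_THROUGH_2010 = 0.075  # effect per unit game differential, 2003-2010 data
PER_GAME_THROUGH_2011 = 0.069  # effect after adding the 2011 playoffs


def approx_win_probability(game_diff: int, per_game: float) -> float:
    """Approximate series win chance for a team that played `game_diff` more games."""
    prob = 0.5 - per_game * game_diff
    # Clamp so the linear shortcut stays a valid probability.
    return min(1.0, max(0.0, prob))


def compare(game_diff: int) -> Tuple[float, float]:
    """Return (estimate through 2010, estimate through 2011) for one game differential."""
    return (
        approx_win_probability(game_diff, PER_GAME_THROUGH_2010),
        approx_win_probability(game_diff, PER_GAME_THROUGH_2011),
    )


if __name__ == "__main__":
    for team, diff in [("Seattle", 4), ("Colorado", 3)]:
        old, new = compare(diff)
        print(f"{team} (+{diff} games): {old:.3f} -> {new:.3f}")
    # Narrowing of the 95th percentile range quoted in the post.
    print(f"95% range narrowed by {0.34 - 0.29:.2f}")
```

Under this simplified reading, the updated figure gives a team with extra games a slightly better chance than the older figure did, which matches the direction of the change described above.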

A Brief Prediction of the Conference Finals

Going into the conference finals, the playoffs switch back to a single match, winner-take-all format at the higher seed's home pitch.  As was shown in my earlier post on the history of MLS single-match playoffs since 2003, the only statistically significant predictor of success is the difference between the two teams' goal differentials throughout the season (including playoffs).  The table below provides a comparison of the conference finalists' goal differentials and their odds of winning.

I'll be sticking with the numbers.  In the case of Kansas City, I think they're simply too hot to lose this match at home.  A rough start to the season on the road has been rewarded with a second half of season homestand and outstanding play to go with it.  I agree with Grant Wahl when it comes to LA - their season may go down as the single greatest in MLS history if they're able to win the MLS Cup.  The match with RSL will be close, but in the end I think they will prevail.  I just think LA is too good to not win at home in the conference finals, and then win again at home two weeks later to hoist MLS Cup 2011.