The 2011 MLS Cup playoffs start tonight, and I thought it would be worth taking a statistical look at what it might take to get through the new Wild Card playoff round.
For those who don't follow MLS regularly, some background is in order. Unlike most other soccer leagues, MLS does not use a top-of-the-table format to determine its champion. In a nod to other US sports, it uses a playoff to crown the champion, yet in a nod to the global soccer community it doesn't use a single-elimination format for the entire playoffs. Since 2003 it has used a hybrid: a two-legged tie in the first round, a single-elimination match at the higher seed in the second round, and a championship match at a neutral site for the final.
This year MLS has added a single-elimination Wild Card round for the bottom four teams in the playoff seeding. The winners of those two matches join the top six seeds in the proper first round of two-legged ties. The playoffs then proceed as in years past: a single-elimination second round, then a neutral-site, winner-take-all championship. If you're confused as to why the American professional version of the sport takes such a convoluted route to crowning a champion, you're not alone. I've already written plenty on how the playoff system needs to change, but it is what it is, so we might as well analyze it.
Since the Wild Card matches are up first, this post focuses on the factors that affect single-elimination games. I'll return later this week with a post on the two-legged tie odds for the teams that come out of the Wild Card round.
As in my previous posts on the MLS playoffs, I used a binary logistic regression model to determine which factors were statistically significant predictors of success in single-match MLS playoff games. Also as before, I only used data from the 2003 season onward (the data used in this analysis came from Climbing The Ladder's excellent player lineup database). This makes it hard to gather enough samples to identify statistically significant factors: each year's playoffs included only two single-elimination matches, compared with four two-match series in the first round. To put things in perspective, because this year features two rounds of single-elimination matches (the Wild Card round and the proper second round), the total number of single-match playoff samples will grow by 25% in 2011 alone (20 samples at year end vs. 16 at the start of the 2011 playoffs). Nonetheless, the 16 matches are what's available today, and one statistically significant factor stood out in my analysis of them.
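For readers who haven't fit one before, here is roughly what a binary logistic regression with one predictor does: it finds coefficients b0 and b1 so that P(win) = 1 / (1 + exp(-(b0 + b1·x))) maximizes the likelihood of the observed wins and losses. This minimal sketch uses Newton-Raphson; it is not the software used for this analysis (the post doesn't name it), and the function names are hypothetical. It assumes a single numeric predictor and data that are not perfectly separable; with perfect separation the maximum-likelihood estimate doesn't exist and the iterations diverge.

```python
import math

def _score_and_info(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the 2x2 information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted win probability
        w = p * (1.0 - p)                            # Bernoulli variance weight
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Hypothetical helper: fit P(win) = 1 / (1 + exp(-(b0 + b1*x))).

    xs: predictor values (e.g. a difference in season goal differential)
    ys: 1 if the team won the match, 0 otherwise
    Returns (b0, b1, se_b0, se_b1). Assumes the data are not perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_info(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix
    _, _, h00, h01, h11 = _score_and_info(b0, b1, xs, ys)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)
```

The standard errors returned here are what a significance test on each coefficient builds on.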
The factors considered in the analysis are listed below, ordered from least to most significant. Only the last factor, the difference between the two teams' season goal differentials, met the criterion for statistical significance (a p-value < 0.05).
- Season Series PPM Difference
- Seed Difference
- Season Series GD
- Difference in Coach Experience
- Games Played Difference
- Final 5 Games Point Difference
- Venue Point % Difference
- Difference in Season Goal Differential
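The post doesn't say which test produced these p-values. Most logistic regression packages report a Wald z-test for each coefficient, so this sketch assumes that: divide the coefficient by its standard error and take the two-sided tail probability of a standard normal. The helper names are hypothetical, and the numbers in the usage line are illustrative, not this analysis's fitted values.

```python
import math

def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coef == 0 under the (assumed) Wald z-test."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def is_significant(coef, std_err, alpha=0.05):
    """Apply the p < alpha criterion used in the post."""
    return wald_p_value(coef, std_err) < alpha

# Made-up coefficient and standard error, for illustration only
print(wald_p_value(0.12, 0.05), is_significant(0.12, 0.05))
```

With only 16 matches the standard errors are large, which makes it hard for any factor to clear the 0.05 threshold.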
The one statistically significant term, difference in season goal differential, is plotted below. The range of differences shown is limited by the largest difference in the data set, which came in the 2007 playoff match between the Houston Dynamo and the Kansas City Wizards.
The effect of the small sample size shows up in the plot as the wide gap between the dashed lines, which mark the bounds of the 95% prediction interval for the odds of winning a match at a given difference in goal differential. If current trends hold, these lines should move closer together as more MLS Cup playoff matches are added to the data set. The key takeaway from the graph is that, for every unit increase in the difference in season goal differential between -10 and +10, a team's odds of winning a single playoff match increase by roughly 3%.
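Per-unit effects behave differently depending on whether you read them on the odds scale or the probability scale. This sketch evaluates a logistic curve across the -10 to +10 range to show the difference. B0 and B1 are hypothetical placeholders, not the fitted values behind the plot.

```python
import math

# Hypothetical coefficients for illustration only; NOT the values fitted in this post.
B0, B1 = 0.0, 0.12

def win_probability(gd_diff, b0=B0, b1=B1):
    """Logistic model: P(win) = 1 / (1 + exp(-(b0 + b1 * gd_diff)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * gd_diff)))

def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Walk the plotted range and compare the probability and odds scales.
for d in range(-10, 11, 5):
    p = win_probability(d)
    print(f"diff {d:+3d}: P(win) = {p:.3f}, odds = {odds(p):.3f}")

# On the odds scale the per-unit effect is constant: the odds ratio exp(b1).
print(f"odds ratio per unit of difference: {math.exp(B1):.4f}")
```

In a logistic model, each one-unit increase multiplies the odds by the same factor, exp(b1). The change in win probability per unit is largest near 50% and shrinks toward the extremes. So any fixed "percent per unit" summary is an approximation that holds only over a limited range like the one plotted.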
Applying this model to the teams in the 2011 Wild Card matches yields the table below.
I wouldn't necessarily take these odds to the bank, given the wide 95% prediction interval on the graph. But with FC Dallas slumping lately, it's not unreasonable to expect the Red Bulls to pull off a win. The same could be said of the Rapids, who are looking to validate last season's championship run from the lower rungs of the playoff seeding with a similar run in 2011.
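Producing odds for a specific Wild Card pairing comes down to plugging the two teams' difference in season goal differential into the fitted curve. This sketch uses hypothetical coefficients and a hypothetical `max_abs_diff` bound. It clamps the difference to the range seen in the historical data rather than extrapolating beyond it. That is one conservative choice; raising an error instead would be another.

```python
import math

def matchup_probabilities(team_gd, opponent_gd, b0, b1, max_abs_diff):
    """Return (P(team wins), P(opponent wins)) for one single-elimination match.

    team_gd, opponent_gd: each side's regular-season goal differential.
    b0, b1: fitted logistic coefficients (hypothetical here).
    max_abs_diff: largest |difference| in the historical data (hypothetical);
    differences beyond it are clamped rather than extrapolated.
    """
    diff = team_gd - opponent_gd
    diff = max(-max_abs_diff, min(max_abs_diff, diff))
    p_team = 1.0 / (1.0 + math.exp(-(b0 + b1 * diff)))
    # A single-elimination match always produces a winner, so the two sum to 1.
    return p_team, 1.0 - p_team

# Made-up inputs: a +6 goal differential side against a -2 side.
p, q = matchup_probabilities(team_gd=6, opponent_gd=-2, b0=0.0, b1=0.12, max_abs_diff=10)
print(f"team: {p:.3f}, opponent: {q:.3f}")
```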
What the analysis does indicate is the inherent fairness of a single-match playoff, if a playoff format must be used to determine the MLS champion. My previous analysis of the two-legged tie format showed that successful clubs (those who compete deep into the season on many fronts and those with longer-serving coaches) tend to be penalized more in a two-legged format. That format also effectively erases the advantage a higher seed earns through its hard work over the season, since the lower seed also gets a home match. The single-match analysis, by contrast, suggests that teams that do well throughout the season tend to do well in a single-match playoff. If anything, MLS should restrict the two-legged tie to the Wild Card round and reward the higher seeds with single-game home matches in the proper first round, against opponents who have played two extra matches to win their Wild Card ties. This would better balance many fans' desire to see consistent success over an entire regular season rewarded, as it is in other professional soccer leagues, with others' desire for a US-style playoff.