I've been writing for nearly a year about how MLS teams that play more competitive matches outside of league play pay a penalty under MLS's playoff format. The penalty is unique to the conference semifinals, which use a two-legged aggregate goal format; the conference finals, which were single elimination through last year, show no such penalty for teams that play more extra-MLS matches. Several readers of those posts asked for a more refined metric, one that actually quantifies the number of minutes played by each playoff team or by the individual players within it. While more games can certainly be understood as more taxing on teams come playoff time, clubs differ in how they approach such extra demands. Better player rotation in non-MLS competitions can minimize the wear and tear on the preferred starting XI used in the playoffs. The key to such an analysis is finding a comprehensive database of MLS playing time statistics.
Just such a database exists as a publicly available resource. Climbing the Ladder's MLS Player Lineup Database records every match's scoreline, the competition in which it occurred, the players who participated, the time of any substitutions, and the goal scorers and those who assisted on the goals. It's a gold mine for MLS statheads, but it does require some post-processing to make full use of the data, which is stored as text strings within its Excel files.
Elijah at the Climbing the Ladder blog was gracious enough to provide me with a pre-release of the 2011 database so that all seasons from 2003 forward could be included in my analysis. I've partnered with Sarah Rudd at OnFooty.com to put the data into a more database-query friendly format as well as clean up a few misspelled names. Sarah's also written about the concept of playing time management and squad rotation, so it was only natural that she and I partner to explore the concept further via the CTL database.
Our post-processing allows us to look at player minutes over an entire season with all extra competitions included (US Open Cup, SuperLiga, CONCACAF Champions Cup/League, etc.). This post uses the data on minutes played prior to the conference semifinal round to see whether any additional predictors of playoff success can be identified. Subsequent posts will look at similar effects on later playoff rounds.
Methodology and Model
Like previous posts, a binary logistic regression (BLR) model was used to estimate the likelihood of a team winning in the conference semifinal round of the playoffs. The factors included in the study were:
- Season Series PPM Differential
- Seed Difference
- Season Series Goal Differential
- Difference in Coaching Experience
- Total Number of Games Played Differential
- Final 5 Games Point Differential
- Difference in Season Goal Differential
- Median Minutes Played Differential
- Interquartile Range (IQR) Minutes Played Differential
The minutes played statistics show a mild linear relationship with the number of games played. This doesn't necessarily mean the model is collinear and therefore of little value. In fact, the effect may be quite the opposite. It's quite conceivable that a greater spread of minutes (higher IQR) may lead to higher odds of playoff success, while it's already understood that a larger number of games is detrimental to those odds. What matters is testing the fitted model for multicollinearity. Such tests showed that multicollinearity did not exist. The relationship between the minutes played statistics and the number of games played is an item of interest that will be explored further.
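For readers who want to run a similar check on their own data, here is a minimal sketch of one common diagnostic, the variance inflation factor (VIF). It is an assumption on my part that a VIF-style check is a reasonable stand-in; the post does not name the specific test that was used. The sketch is also simplified to a single pair of predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation. A full VIF would regress each predictor on all of the others. The function names and the sample numbers are made up for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_vif(xs, ys):
    """VIF for a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up differentials for five hypothetical matchups (not real MLS data).
games_diff = [-4, -2, 0, 2, 4]
iqr_minutes_diff = [100, -50, 0, 50, -100]

r = pearson_r(games_diff, iqr_minutes_diff)
vif = pairwise_vif(games_diff, iqr_minutes_diff)
print(f"r = {r:.2f}, VIF = {vif:.4f}")
```

A VIF close to 1 means the two predictors carry largely separate information. Rules of thumb vary, but values above roughly 5 to 10 are often treated as a warning sign.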
There are also two ways to look at player data that would produce very different estimates for the median and IQR values. One method uses data from all players who played at least one minute at a club over all competitions within a season, measuring the overall effect of the matches on the distribution of playing time. Another method uses only the data for players who appear in the semifinal round of the playoffs - a reduced data set that looks at the impact on only those players who featured in the playoff round. As the goal is a predictive model for future playoffs, the first method was used, since it does not rely on a projection of starting lineups to make a prediction. It also accounts for things like mid-season trades, injuries, and player churn that may greatly affect the time a player spends on the pitch for a specific club prior to the start of the playoffs. This is key to understanding how much playing time a player may have had to work with the rest of his teammates.
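To make the first method concrete, here is a rough sketch of how the two minutes statistics and their differentials could be computed for one matchup. The function names, the quartile convention (Python's "inclusive" method), and the minutes totals are all my own assumptions for illustration; they are not taken from the CTL database or from the processing Sarah and I actually ran.

```python
from statistics import median, quantiles


def minutes_summary(season_minutes):
    """Return (median, IQR) of season minutes across all competitions.

    Only players with at least one minute for the club are counted,
    mirroring the first method described above.
    """
    played = [m for m in season_minutes if m > 0]
    q1, _, q3 = quantiles(played, n=4, method="inclusive")
    return median(played), q3 - q1


def minutes_differentials(club_a_minutes, club_b_minutes):
    """Return (median differential, IQR differential) as club A minus club B."""
    median_a, iqr_a = minutes_summary(club_a_minutes)
    median_b, iqr_b = minutes_summary(club_b_minutes)
    return median_a - median_b, iqr_a - iqr_b


# Made-up season totals per player (not real data).
club_a = [0, 90, 450, 900, 1800, 2700]  # the 0-minute player is dropped
club_b = [180, 360, 540, 720, 900]

median_diff, iqr_diff = minutes_differentials(club_a, club_b)
print(median_diff, iqr_diff)
```

Different quartile conventions give slightly different IQR values on small squads, so whichever convention is chosen should be applied to both clubs in a matchup.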
In re-running the BLR model for all of the factors listed above, only the following three were found to be statistically significant (p-value < 0.05):
- Games Played Difference
- Median Minutes Played Differential
- IQR Minutes Played Differential
Now that the relative impact of the different factors on the change in odds of winning a conference semifinal is understood, what is the impact on the actual odds of winning the conference semifinal? For such an analysis the complete BLR model, which simultaneously evaluates the impact of all three variables, must be evaluated.
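As a reminder of how the "change in odds" and the "actual odds" relate in any logistic regression: a coefficient b multiplies the odds by exp(b) for each one-unit increase in its factor, and odds convert to a probability via odds / (1 + odds). The short sketch below shows those conversions. The coefficient value used in the example is hypothetical and is not one of the fitted values from this analysis.

```python
from math import exp


def odds_ratio(coefficient, change=1.0):
    """Multiplicative change in odds for a given change in one factor."""
    return exp(coefficient * change)


def probability_to_odds(p):
    """Convert a win probability (0 <= p < 1) to odds in favor."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds in favor back to a win probability."""
    return odds / (1.0 + odds)


# Hypothetical coefficient for the games played differential (illustration only).
hypothetical_games_coeff = -0.15

# Start from an even matchup, then give one team three extra games played.
start_odds = probability_to_odds(0.5)
new_odds = start_odds * odds_ratio(hypothetical_games_coeff, change=3)
print(f"win probability after +3 games: {odds_to_probability(new_odds):.3f}")
```

The same odds ratio moves the win probability by different amounts depending on the starting odds, which is why the full model has to be evaluated to get actual odds.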
The Impact of Club Median and IQR Minutes Differential
The BLR model is four-dimensional, which makes visualizing its behavior a bit difficult. A simpler way to view the data is to hold all other variables constant at a value of zero and then plot the changing odds as a function of the variable of interest. Such a plot is made below for the median and IQR minutes differential factors.
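The "hold everything else at zero" approach can be sketched in a few lines. The coefficients below are placeholders that I made up so the example runs; they are NOT the fitted values from the model in this post, and the zero intercept (an even matchup gives a 50% chance) is likewise an assumption. Only the shape of the calculation is meant to carry over.

```python
from math import exp

# Hypothetical coefficients for illustration only, not the fitted model.
COEFFS = {
    "intercept": 0.0,
    "games_diff": -0.15,
    "median_diff": 0.002,
    "iqr_diff": 0.002,
}


def win_probability(games_diff=0.0, median_diff=0.0, iqr_diff=0.0, coeffs=COEFFS):
    """Logistic model: probability of winning the conference semifinal."""
    z = (
        coeffs["intercept"]
        + coeffs["games_diff"] * games_diff
        + coeffs["median_diff"] * median_diff
        + coeffs["iqr_diff"] * iqr_diff
    )
    return 1.0 / (1.0 + exp(-z))


def sweep(variable, values, coeffs=COEFFS):
    """Vary one factor while the other factors stay at their default of zero."""
    return [(v, win_probability(**{variable: v}, coeffs=coeffs)) for v in values]


# Sweep the median minutes differential from -1000 to +1000 in steps of 500.
for value, prob in sweep("median_diff", range(-1000, 1001, 500)):
    print(f"median diff {value:+5d}: win probability {prob:.3f}")
```

Swapping "median_diff" for "iqr_diff" or "games_diff" in the call to sweep produces the corresponding one-variable curve.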
Whatever the reason, the model indicates there is a sweet spot for squad rotation. Players need to be rotated through to encourage a distribution of work and raise the IQR, but a core group of players getting a greater number of minutes bodes well for team success in the playoffs.
A Comprehensive Model For the Conference Semifinals
While visualization is easier in two dimensions, the reality is that the BLR model for conference semifinal odds is actually expressed in four dimensions. Given that the coefficients for the median and IQR minutes played factors are relatively similar, a single three-dimensional graph of the model's behavior against the game differential factor and the median minutes played factor can express how the odds change with the combination of factors. Such a graph is shown below.
As the game differential factor approaches zero, the effect of the player minute differential on the odds of winning a conference semifinal becomes more pronounced. In fact, at a game differential of zero the odds of winning the conference semifinal collapse to the two-dimensional graphs shown earlier in this post.
The result is that between 0 and either +15 or -15, the odds of winning an MLS conference semifinal matchup take on what may be best described as a "twisted S curve" shape. Again, the level of compounding of the twist in the S-curve is dictated by how far each factor's value is away from zero.
The BLR model for MLS conference semifinal odds has been updated to account for factors that take into account squad rotation, long term player injury, and squad churn. It now measures both overall team fatigue (number of matches) and squad management (player minutes). Such a comprehensive model allows for an evaluation of how well teams have done from 2003 onward given the fatigue and rotation with which they started their playoff run. Additionally, the single elimination BLR (2011 wild card + 2003-2011 conference finals) should be re-examined in light of the player minutes data now available. That topic will be covered in a future post that continues to focus on the factors that contribute to MLS playoff success.