Thursday, January 12, 2012

The Impact of Player Minutes on MLS Conference Semifinal Success

I've been writing for nearly a year about how MLS teams that play a greater number of competitive matches outside of league play end up paying a penalty under MLS's playoff format. The penalty is unique to the conference semifinals, which use a two-legged aggregate-goal format; the conference finals, which were single elimination through last year, show no such penalty for teams that play more non-MLS matches. Several readers of those posts asked that a more refined metric be studied, one that actually quantifies the number of minutes played by each playoff team or by the individual players within it. While more games can certainly be understood as more taxing on teams come playoff time, clubs can approach those extra demands in different ways. Better player rotation in non-MLS competitions can minimize the wear and tear on the preferred starting XI used in the playoffs. The key to such an analysis is finding a comprehensive database of MLS playing-time statistics.

Just such a database is publicly available. Climbing the Ladder's MLS Player Lineup Database records every match's scoreline, the competition in which it occurred, the players who participated, the time of any substitutions, and the goal scorers and players who assisted on each goal. It's a gold mine for MLS statheads, but it requires some post-processing to make full use of the data, which is stored as text strings within its Excel files.
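The post doesn't describe the layout of those text strings, so the sketch below assumes a hypothetical format in which each lineup entry is either a starter's name or "Starter (Substitute minute)". It only illustrates the basic arithmetic of turning substitution times into minutes played: a starter subbed off in minute m is credited with m minutes and his replacement with the match length minus m. The function name and format are illustrative, and stoppage time and chained substitutions are ignored.

```python
import re

# Hypothetical lineup format (NOT the documented CTL format):
#   "Starter"                  -> played the full match
#   "Starter (Substitute 67)"  -> Starter replaced by Substitute in minute 67
LINEUP_ENTRY = re.compile(
    r"(?P<starter>[^()]+?)\s*(?:\((?P<sub>[^()]+?)\s+(?P<minute>\d+)\))?"
)

def minutes_from_lineup(entries, match_length=90):
    """Return {player: minutes} for one club in one match.

    Ignores stoppage time and chained substitutions (a substitute who is
    later subbed off himself), which a real parser would need to handle.
    """
    minutes = {}
    for entry in entries:
        match = LINEUP_ENTRY.fullmatch(entry.strip())
        if match is None:
            raise ValueError(f"unrecognized lineup entry: {entry!r}")
        starter = match.group("starter").strip()
        if match.group("sub") is None:
            # No substitution recorded: the starter played the whole match.
            minutes[starter] = match_length
        else:
            off = int(match.group("minute"))
            minutes[starter] = off
            minutes[match.group("sub").strip()] = match_length - off
    return minutes

print(minutes_from_lineup(["Smith", "Jones (Brown 67)"]))
# -> {'Smith': 90, 'Jones': 67, 'Brown': 23}
```

Per-match dictionaries like this could then be summed per player and club across a season.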

Elijah at the Climbing the Ladder blog was gracious enough to provide me with a pre-release of the 2011 database so that all seasons from 2003 forward could be included in my analysis. I've partnered with Sarah Rudd at OnFooty.com to put the data into a more query-friendly format and to clean up a few misspelled names. Sarah has also written about playing time management and squad rotation, so it was only natural that we team up to explore the concept further via the CTL database.

Our post-processing lets us look at player minutes over an entire season with all extra competitions included (US Open Cup, SuperLiga, CONCACAF Champions Cup/League, etc.). This post uses the data on minutes played prior to the conference semifinal round to see whether any additional predictors of playoff success can be identified. Subsequent posts will look at similar effects on later playoff rounds.

Methodology and Model

As in previous posts, a binary logistic regression (BLR) model was used to estimate the likelihood of a team winning in the conference semifinal round of the playoffs. The factors included in the study were:

  • Season Series PPM Differential
  • Seed Difference
  • Season Series Goal Differential
  • Difference in Coaching Experience
  • Total Number of Games Played Differential
  • Final 5 Games Point Differential
  • Difference in Season Goal Differential
  • Median Minutes Played Differential
  • Interquartile Range (IQR) Minutes Played Differential
The minutes played, difference in coaching experience, and total number of games played data come from the CTL database. All other data come from publicly available sources like the MLS website. The median and IQR of each club's minutes played are used in lieu of the mean and standard deviation because the distribution of player minutes over a season is not normal.
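For reference, these summary statistics can be computed with Python's standard `statistics` module. This is a minimal sketch, assuming each club's input is a list of season minutes with one entry per player; the function names are illustrative, and the quartile convention ("inclusive") is an assumption, since different conventions give slightly different IQRs. Differentials are taken as club minus opponent.

```python
import statistics

def minutes_summary(player_minutes):
    """Median and interquartile range (IQR) of a club's per-player season minutes."""
    minutes = list(player_minutes)
    median = statistics.median(minutes)
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    # "inclusive" is one common convention; the post does not say which was used.
    q1, _, q3 = statistics.quantiles(minutes, n=4, method="inclusive")
    return median, q3 - q1

def minutes_differentials(club_minutes, opponent_minutes):
    """(median differential, IQR differential) for one playoff matchup."""
    med_a, iqr_a = minutes_summary(club_minutes)
    med_b, iqr_b = minutes_summary(opponent_minutes)
    return med_a - med_b, iqr_a - iqr_b

print(minutes_summary([100, 200, 300, 400, 500]))  # -> (300, 200.0)
```

The median and IQR are rank-based, so a handful of ever-present starters or a long tail of barely used reserves does not drag them around the way it would the mean and standard deviation.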

There's always the risk of introducing multicollinearity when adding new variables to a regression model. This is especially the case when looking at the number of matches (or the differential in matches between two teams) and the number of minutes players accrue through the season (or the differential in minutes between two teams). It's intuitive to think the two are correlated.

It turns out the median and IQR minutes played differentials are correlated with the difference in games played (statistically significant, with p-values much less than 0.05), although the relationship is relatively weak (see the R-squared values in the graphs below).
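Assuming those graphs show simple least-squares lines, their R-squared values equal the squared Pearson correlation between the two differentials. A stdlib-only sketch, assuming one (games played differential, minutes differential) pair per playoff matchup; the function name is illustrative and the post's underlying pairs aren't reproduced here.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y ~ a + b*x.

    With a single predictor, R^2 equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0 (perfectly linear)
```

A correlation can be statistically significant yet weak: with enough matchups a small slope is detectable even when R-squared is low.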


This doesn't necessarily mean the model suffers from collinearity and is therefore of little value. In fact, the effect may be quite the opposite. It's quite conceivable that a greater spread of minutes (higher IQR) leads to higher odds of playoff success, while it's already understood that a larger number of games hurts those odds. What matters is testing the fitted model for multicollinearity, and such tests showed that multicollinearity was not present. The mild linear relationship between the minutes played statistics and the number of games played is of interest and will be explored further.
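The post doesn't say which multicollinearity test was run. One common check is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). A minimal NumPy sketch, assuming the predictors are arranged one column per factor (games, median, and IQR differentials) with one row per matchup; the function name is illustrative. Values near 1 indicate little collinearity, and rules of thumb often flag values above 5 or 10.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (rows = matchups, columns = predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on an intercept plus all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        if ss_tot == 0.0:
            raise ValueError(f"column {j} is constant")
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs
```

Two predictors can be significantly correlated with each other and still have VIFs close to 1 if the shared variance is small, which is consistent with the weak relationship described above.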

There are also two ways to select the player data, and they produce very different estimates of the median and IQR. The first uses data from every player who played at least one minute for a club in any competition during the season, measuring the overall effect of the schedule on the distribution of playing time. The second uses only the players who appeared in the semifinal round of the playoffs, a subset of the first that isolates the impact on those who featured in the playoff round. Because the goal is a model that can predict future playoffs, the first method was used: it does not rely on projecting starting lineups to make a prediction. It also accounts for mid-season trades, injuries, and player churn, which may greatly affect the time a player spends on the pitch for a specific club before the playoffs begin. This is key to understanding how much playing time a player has had to work with the rest of his teammates.
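The first method can be sketched as a simple aggregation. This assumes a hypothetical flattened table of (match date, club, player, competition, minutes) rows, not the CTL files' own layout; the function name and example rows are invented for illustration. Minutes from every competition count, only matches before a cutoff (the start of the semifinals) are included, and minutes a traded player logged for another club are kept separate.

```python
from collections import defaultdict
from datetime import date

def club_season_minutes(appearances, club, cutoff):
    """Total minutes per player for `club`, across every competition, before `cutoff`.

    `appearances` is an iterable of (match_date, club, player, competition, minutes)
    tuples, a hypothetical layout. Minutes a player logged for a different club
    (for example, before or after a trade) are not counted here.
    """
    totals = defaultdict(int)
    for match_date, app_club, player, _competition, minutes in appearances:
        if app_club == club and match_date < cutoff:
            totals[player] += minutes
    # Method 1 in the text: keep everyone with at least one minute for the club.
    return {player: total for player, total in totals.items() if total >= 1}

# Hypothetical example rows (invented for illustration, not CTL data).
example_rows = [
    (date(2011, 3, 19), "Club A", "Smith", "MLS", 90),
    (date(2011, 6, 1), "Club A", "Smith", "US Open Cup", 45),
    (date(2011, 6, 1), "Club A", "Jones", "US Open Cup", 0),   # unused substitute
    (date(2011, 7, 4), "Club B", "Smith", "MLS", 90),          # after a trade
    (date(2011, 11, 1), "Club A", "Brown", "MLS", 90),         # after the cutoff
]
print(club_season_minutes(example_rows, "Club A", cutoff=date(2011, 10, 29)))
# -> {'Smith': 135}
```

The resulting per-player totals are exactly the input a median/IQR summary needs.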

In re-running the BLR model for all of the factors listed above, only the following three were found to be statistically significant (p-value < 0.05):
  • Games Played Difference
  • Median Minutes Played Differential
  • IQR Minutes Played Differential
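The post doesn't say what software fit the model. As an illustration only, a binary logistic regression can be fit by Newton-Raphson (iteratively reweighted least squares), with each coefficient's significance judged by a Wald test, z = β / SE. The function name and data layout below are hypothetical; in practice a statistics package would normally be used.

```python
import math
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Binary logistic regression by Newton-Raphson (IRLS), with Wald p-values.

    X: (n,) or (n, k) predictors (an intercept column is added here); y: 0/1 outcomes.
    Returns (coefficients, standard errors, two-sided p-values), intercept first.
    Assumes the data are not perfectly separable; otherwise the MLE does not exist.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        # Fisher information X^T W X with weights W = p(1 - p).
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_values = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, se, p_values
```

A factor would then be retained when its p-value is below 0.05, e.g. `p_values[1:] < 0.05` (skipping the intercept).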
The results of the BLR are displayed in the table below. The median and IQR minute differential factors have been divided by 100 to put them on a scale similar to the game difference factor and allow a better comparison of odds ratios.
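Written out, a BLR model with these three factors takes the standard logistic form below. The notation is ours, not the post's: p is the probability of winning the semifinal, ΔG the games played differential, and ΔM and ΔI the median and IQR minutes differentials; whether the original fit included an intercept β₀ is an assumption. Dividing a minutes factor by 100 leaves the model's predictions unchanged and only rescales its coefficient, so the odds ratio reported per 100 minutes is the per-minute odds ratio raised to the 100th power (β_M^(1) denotes the coefficient when ΔM is entered in raw minutes).

```latex
\begin{aligned}
\ln\frac{p}{1-p} &= \beta_0 + \beta_G\,\Delta G + \beta_M\,\frac{\Delta M}{100} + \beta_I\,\frac{\Delta I}{100} \\
\beta_M &= 100\,\beta_M^{(1)}, \qquad
\mathrm{OR}_M = e^{\beta_M} = \left(e^{\beta_M^{(1)}}\right)^{100}
\end{aligned}
```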


The results of the BLR model show that the game difference factor continues to have the biggest impact on conference semifinal odds. The odds ratio gives the factor by which the odds are multiplied for a one-unit increase in a predictor (one game of difference, or 100 minutes of playing time differential). For the minutes played factors, each one-unit increase (100 minutes) in the median and IQR differentials leads to a 31% and 27% increase in the odds of winning the conference semifinal, respectively. Conversely, each one-unit increase in game difference leads to a 39% decrease in odds, or equivalently a 64% increase in odds for each game fewer played.
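The odds-ratio arithmetic in this paragraph can be checked directly. In this short sketch (function names are illustrative), the percent change in odds is (OR - 1) × 100, moving in the opposite direction uses 1/OR, and several units compound multiplicatively. The inputs are the rounded odds ratios quoted above.

```python
import math

def percent_change_in_odds(odds_ratio, units=1.0):
    """Percent change in odds for a change of `units` in a predictor.

    Odds ratios compound multiplicatively, so k units multiply the odds by OR**k.
    A negative `units` gives the effect of moving in the other direction.
    """
    return (odds_ratio ** units - 1.0) * 100.0

def coefficient_from_odds_ratio(odds_ratio):
    """Logistic-regression coefficient (change in log-odds per unit) implied by an odds ratio."""
    return math.log(odds_ratio)

print(round(percent_change_in_odds(1.31)))            # median, per 100 min -> 31
print(round(percent_change_in_odds(1.27)))            # IQR, per 100 min -> 27
print(round(percent_change_in_odds(0.61)))            # one more game -> -39
print(round(percent_change_in_odds(0.61, units=-1)))  # one fewer game -> 64
```

For example, all else equal, playing three more games than the opponent multiplies the odds by 0.61³ ≈ 0.23, roughly a 77% reduction.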

Now that the relative impact of each factor on the change in odds of winning a conference semifinal is understood, what is the impact on the actual odds of winning? Answering that requires evaluating the complete BLR model, which accounts for all three variables simultaneously.

The Impact of Club Median and IQR Minutes Differential

The BLR model is four-dimensional (three predictors plus the predicted outcome), which makes visualizing its behavior a bit difficult. A simpler approach is to hold all other variables constant at zero and plot the changing odds as a function of the variable of interest. Such a plot is shown below for the median and IQR minutes differential factors.
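Holding the other factors at zero amounts to evaluating the logistic function along one axis. The post's fitted coefficients aren't reproduced here, so this sketch uses hypothetical values: the natural logs of the rounded odds ratios quoted above, plus an assumed intercept of zero (which puts an evenly matched pairing at 50%). It computes win probability, which is assumed to be what the graph labels as odds; the function names are illustrative.

```python
import math

# Hypothetical coefficients: natural logs of the rounded odds ratios quoted in the
# text (0.61 per game, 1.31 and 1.27 per 100 minutes). The intercept is an assumed
# 0.0, not a fitted value from the post.
COEFFICIENTS = {
    "intercept": 0.0,
    "games": math.log(0.61),
    "median_100": math.log(1.31),  # median minutes differential / 100
    "iqr_100": math.log(1.27),     # IQR minutes differential / 100
}

def win_probability(games=0.0, median_100=0.0, iqr_100=0.0, coef=COEFFICIENTS):
    """Logistic model: P(win) = 1 / (1 + exp(-z)), z = linear predictor."""
    z = (coef["intercept"] + coef["games"] * games
         + coef["median_100"] * median_100 + coef["iqr_100"] * iqr_100)
    return 1.0 / (1.0 + math.exp(-z))

def sweep(variable, values):
    """Vary one factor while holding the other two at zero."""
    return [(v, win_probability(**{variable: v})) for v in values]

for v, p in sweep("median_100", [-6, -3, 0, 3, 6]):
    print(f"median diff {v:+d} x 100 min -> P(win) = {p:.2f}")
```

With a zero intercept the curve is symmetric about 50%: a given deficit lowers the win probability by exactly as much as the matching surplus raises it.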


The graph puts numbers to the trends described above for the BLR model's odds ratios. The trend of increasing odds with increasing IQR makes sense: a higher IQR indicates a bigger spread in the distribution of playing minutes across individual players throughout the season, which could indicate effective player rotation. The trend of increasing odds with increasing median minutes differential may be less intuitive, since a higher median might correlate with more games played and suggest that players are being overused. However, this overlooks what a median minutes deficit implies. Such a deficit indicates fewer minutes spread across a greater number of players, which may be due to trades during the season, extensive lineup experimentation in search of the combination that works best, or injuries to regular starters well into the season. In each case, it points to a team with more players each logging less time, which gives the team less time to gel.

Whatever the reason, the model indicates there is a sweet spot for squad rotation.  Players need to be rotated through to encourage a distribution of work and raise the IQR, but a core group of players getting a greater number of minutes bodes well for team success in the playoffs.

A Comprehensive Model For the Conference Semifinals


While visualization is easier in two dimensions, the BLR model for conference semifinal odds is actually four-dimensional. Because the coefficients for the median and IQR minutes played factors are similar, a single three-dimensional graph of the model's behavior against the game differential and median minutes played factors can show how odds change with the combination of factors. Such a graph is shown below.


The lesser effect of the played minutes differential can be observed at the extreme ends of the game differential axis. It appears as a small droop in odds as one comes forward in the graph at a game differential of -15, and as a small rise in odds as the minutes differential goes positive at a game differential of +15. Readers can imagine the droop and rise compounding when both player minutes factors are added into the equation.

As the game differential approaches zero, the effect of the player minutes differential on the odds of winning a conference semifinal becomes more pronounced. In fact, at a game differential of zero, the surface reduces to the two-dimensional graphs shown earlier in this post.
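A grid version of the calculation shows why the minutes effect fades at the ends of the game differential axis: once the game term pushes the logistic curve toward 0 or 1, the curve is nearly flat there and the minutes term barely moves it. The coefficients are hypothetical (natural logs of the rounded odds ratios in the text, an assumed zero intercept, IQR differential held at zero), and the function names are illustrative.

```python
import math

# Hypothetical coefficients: natural logs of the rounded odds ratios in the text,
# with an assumed intercept of 0.0. The IQR differential is held at zero.
B_GAMES = math.log(0.61)
B_MEDIAN_100 = math.log(1.31)

def win_probability(games, median_100):
    """Logistic model with games differential and median minutes differential / 100."""
    z = B_GAMES * games + B_MEDIAN_100 * median_100
    return 1.0 / (1.0 + math.exp(-z))

def probability_surface(game_diffs, median_diffs):
    """Rows: game differential; columns: median minutes differential (per 100 min)."""
    return [[win_probability(g, m) for m in median_diffs] for g in game_diffs]

game_diffs = list(range(-15, 16, 5))   # -15, -10, ..., 15
median_diffs = [-5, 0, 5]
surface = probability_surface(game_diffs, median_diffs)
for g, row in zip(game_diffs, surface):
    # The spread across a row is how much the minutes factor moves the outcome.
    cells = "  ".join(f"{p:.3f}" for p in row)
    print(f"games {g:+3d}: {cells}  spread {row[-1] - row[0]:.3f}")
```

The row at a game differential of zero is the same one-variable curve as the two-dimensional plot, while the rows at ±15 show only a sliver of movement.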

The result is that between 0 and either +15 or -15, the odds of winning an MLS conference semifinal matchup take on what may best be described as a "twisted S-curve" shape. Again, how much the S-curve twists depends on how far each factor's value is from zero.

Conclusions

The BLR model for MLS conference semifinal odds has been updated with factors that account for squad rotation, long-term player injuries, and squad churn. It now measures both overall team fatigue (number of matches) and squad management (player minutes). Such a comprehensive model allows an evaluation of how well teams have done from 2003 onward given the fatigue and rotation with which they started their playoff runs. Additionally, the single elimination BLR (2011 wild card + 2003-2011 conference finals) should be re-examined in light of the player minutes data now available. That topic will be covered in a future post that continues to focus on the factors that contribute to MLS playoff success.
