Thursday, January 26, 2012

Sloan Sports Analytics Conference: Putting a Human Element to Soccer Analytics

The following is a review of the two previous soccer-specific panels at the Sloan Sports Analytics Conference (SSAC) as part of my coverage of the 2012 conference for Howler magazine.

The genius of the Oscar-nominated Moneyball isn’t in the numbers or the Oakland A's resultant success.  That part of the story is well understood.  The genius is in the story the numbers help tell, one filled with the personal emotion and organizational turmoil that a change to sports analytics can bring.  It is a story of the fear of personal and organizational failure, of the isolation felt by both the analytics practitioners and the scouts they’re trying to replace, and of the trust that must be built between the team’s management and the analyst.  It’s the deep human and organizational insight that makes this story different than a story simply about the numbers.  There is a moment about twenty-five minutes into the movie where all of these human themes come together.

Billy Beane is sitting in his dining room late into the night, struggling to determine whether or not he is going to bet his professional reputation and the Oakland A’s entire season on the theories of a classically trained economist with zero professional baseball experience.  Knowing that he and Peter Brand will be challenged at every turn due to the unconventional approach they would take, Beane needs to understand if Brand will be a change-agent even when it is not convenient.  Beane picks up the phone and calls Brand so late into the evening that neither of them even knows what time it is.

Once pleasantries have been dispensed, Beane decides to immediately test his potential assistant’s mettle.
“Would you have drafted me in the first round?”
Caught off guard, Brand responds with a platitude.

“You were a good player...”
The response does not impress Beane.  He’s not just looking to be patronized, having washed out of the big leagues after a promising high school career.  He wants to see if the potential assistant GM, who confirms he’s looked up Beane’s stats after their initial run-in at the Cleveland Indians’ headquarters, has the ability to quickly analyze a situation and provide a cogent answer.

“Cut the crap, man...  Would you have drafted me in the first round?!?”
Brand now understands this isn’t a courtesy call, and responds to Beane’s probing question.

“I’d have taken you in the ninth round, no signing bonus.  I can imagine you would have passed and taken the scholarship [that you turned down from Stanford].”
At that moment the personal connection is made.  Beane can count on Brand not only to be honest with him, but also to deliver analysis that is spot on without sacrificing the human element.  He offers Brand the job, and the rest is baseball history.

Moneyball may be a movie about how analytics changed the baseball world, but the themes within the movie are replicated in the soccer analytics world.  Sifting through piles of data to find the numbers that matter to an individual club.  Requiring every facet of a club to buy into the overall team vision shaped by what the numbers tell the management team.  Balancing the human intuition of a manager/coach with the computationally intensive models from the analytics department.  The inevitable success at poorer clubs leading to bigger opportunities at richer clubs.  Similar themes are present in soccer, with unique twists due to the differences in the two sports' financial and organizational structures.

Soccer analytics doesn’t have its equivalent of Moneyball.  No disrespect is meant to the very good book Soccernomics, but it has little of the human insight that Moneyball has.  In an interview I conducted with Simon Kuper late last year, he pointed out that we have yet to see concrete examples of where soccer analytics has made a difference, yet we know clubs are using data to make decisions.  Thus we have no obvious success story about which to tell a compelling human story.  Without a Moneyball for soccer, where can soccer analytics enthusiasts turn for the personal stories and latest theories from the clubs practicing soccer analytics?  One outstanding resource is the annual Sloan Sports Analytics Conference hosted each March by MIT’s Sloan School of Management.

On March 2nd and 3rd, 2012, nearly 2,000 people will descend on the Hynes Convention Center for the 2012 edition of the conference (tickets are still available). It will be the sixth iteration of a conference that started with 175 participants and 11 panels, and has grown over five years to have more than 1,500 attendees and 20 panels.  Closely following the explosion in attendance has been the role soccer analytics has played within it.  The Sloan Conference hosted its first soccer-related panel in 2010 as part of a joint presentation with American football panelists on emerging analytics. In 2011 the sport moved to its own dedicated panel, and the 2012 conference will have yet another soccer-specific panel. The move from side show to a stand-alone panel that is now the number six most viewed SSAC video is no coincidence.

What was once the exclusive purview of the likes of the Milan Lab and a certain French manager in the Premier League exploded into the soccer literary world in late 2009 with the publication of Simon Kuper’s and Stefan Szymanski’s Soccernomics.  With clubs looking for unique insights due to competition for limited resources, the field of soccer analytics has become a core competency of teams competing for league championships.  The panels within SSAC provide unique insights on the challenges and opportunities presented by soccer analytics from the top practitioners in the game, with many of the themes seen in Moneyball also being present in their discussions.

Billy Beane is sitting in an Oakland A’s office with a whiteboard and computer screen full of data in front of him.  Peter Brand, the recently hired assistant GM who produced all the data, is educating Beane on the analytics framework he is creating for the A’s.  After overwhelming Beane with reams of data and even the computer code that generated it, Brand summarizes the theory.

“It’s about getting things down to one number.  Using stats the way we read them, we’ll find value players that nobody else can see.”
The challenge of developing analytical solutions to soccer’s tactical options is well understood.  Unlike games like baseball or football that have well-defined positions and a clear sequential nature to player contributions, soccer is a more fluid game whose continual action makes it tough to categorize individual contributions.  The positional data is now readily available to teams, a point made repeatedly by the participants in the 2011 panel.  In fact, Blake Wooster summarized that the common problem now facing clubs is too much data and not enough resources or time to analyze it. The top teams who most need data analyzed are playing every three to four days when domestic and international cup competitions are included. Turning around analysis on the previous match, providing insights on the opposition in the next match, and continually refining the basic models underpinning all of the analysis are tasks that take more than a few days to complete.  One would think a natural solution would be to turn to the budding public soccer analytics community, harnessing the power of distributed resources in a way that baseball and basketball have already done.

There are two major barriers to such fan involvement in soccer.  First, unlike North American professional sports where the leagues own the data and make it relatively available, the teams own the raw analytics data in European soccer.  Gavin Fleig pointed out in the 2011 panel that there is no single league source to which all teams have access, nor is there a desire to share such data with the common fan.  Any league-wide data that is available has been monetized, providing yet another barrier to contributing to the sport’s numerical understanding by the recreational hobbyist of limited financial means.  Second, given the fact that teams pay for much of the raw information themselves, they are keen to keep it to themselves for fear of “giving away trade secrets”.  The paradox is that the restriction that comes from such lack of sharing makes timely analysis nearly impossible.  All of this means that utilizing the distributed resources of fans and supporters who are statistically inclined has found limited acceptance amongst clubs and leagues.

Even in the face of this resistance there is a growing number of analytically inclined fans getting access to data.  Statistics groups like Opta will publish their data to users for a fee, and writers can find purchasing power in numbers.  One such user is the EPL Index, which makes the data available to subscribers and site authors in exchange for defraying the cost of the data purchase.  For a fee as small as £39.95 a year users can get access to a full suite of Opta stats via EPL Index.  Writers interested in contributing material to the site can get access to the data for free.  This model has led to 40 writers contributing 60 articles per month, with more than 150,000 site visits each month.  Aggregators like the EPL Index demonstrate a growing appetite by soccer fans to describe the game they love numerically, even if the clubs are unwilling to engage them en masse.  Soccer clubs and leagues would be wise to leverage such interests, especially given the sport's growth potential in the statistically inclined United States.

An unintended benefit of such fan involvement might be the wider agreement on the analytical measurements that matter within the game.  In both the 2010 and 2011 conference panels the club representatives commented on the difficulty in player valuation across multiple leagues and varying data sets.  Unlike in US sports, players are not traded in club soccer, but instead have their rights of employment transferred from one club to another via a monetary transaction.  Clubs in the English Premier League are spending hundreds of millions of pounds (net) every season on transfers, and constantly desire better valuation methods to ensure a better return on investment.  Compounding the financial problem is a cultural one.  Mike Forde, a 2010 panelist, observed that in the Premier League alone more than 50% of the players are not from the UK and hail from a total of 62 countries.  Language barriers are the least of these clubs’ concerns when it comes to culture.  Shots-on-goal and passing data are not only defined differently from league to league, but the method of recording them, and thus the quality of the data, is questionable as well.

Even when data is available, translating a player’s performance from a mid-level league (the Dutch Eredivisie, for example) to a top-tier league (the English Premier League) can be difficult.  Soccer is a game of eleven players facing another team of eleven players.  An individual player’s teammates and the opposition greatly affect his or her performance.  Finding robust metrics that can be compared effectively across leagues requires tens to hundreds of thousands of computation-hours and constant international interaction, something beyond the capability of clubs alone.  Conferences like SSAC can provide a small bit of assistance in moving the soccer analytics community to a more complete and standardized understanding of the game.

Like any other multinational marketplace, standardization would greatly benefit the soccer analytics world.  Where languages, emotions, and historical divisions can fail, numbers may help. English may be the lingua franca of our time, but numbers are the true universal language.  Standardization around an analytics framework and the factors that matter to the game would help make transfer fees a bit more rational.  It would provide teams with a more reliable way to evaluate players, and avoid the high washout rate seen today with transfers.  At its heart, standardization requires public demonstrations of success and failure via analytics.  The easiest way to do this is greater fan involvement in the analytics community, which will lead a few bold clubs to use the theories and demonstrate success with them.

So if fan involvement isn’t currently a priority for clubs, what will drive a greater use of analytics within the game?  It will be the business realities facing it.

Billy Beane is flown to Boston after the successful 2002 season in Oakland to meet with new Red Sox owner John Henry.  Henry’s background in analytically-driven investment decision making makes him an early adopter of the revolution Beane is introducing to the game of baseball.  While the Red Sox were not nearly as financially constrained as the A’s, they were looking for unique insights to finally break the Curse of the Bambino that had led to a then-84-year championship drought.  Henry wants Beane to be his GM, and is willing to pay any sum of money to get him.

“[A’s owner] Steve [Schott] is offering you a new contract, so why did you return my call?”
Beane’s answer says it all.

“Because it’s the Red Sox.  Because I believe science may offer an answer to the Curse of the Bambino.  Because I hear you’ve hired Bill James.”
While managing the A’s payroll and talent has been fun, the prestige of taking a big club to its first championship in nearly a century may be more attractive.  Beane then observes that Henry is going to anger the rest of the league by hiring an outsider like James.  Henry then responds with commentary on the luxury of being a richer club and owner than most.
"One of the great things about money is that it buys a lot of things.  One of which is the luxury to disregard what baseball likes and doesn't like"
Much like the Oakland A's in baseball, smaller soccer clubs have always had to find creative ways to match the superior financial resources and inherent attractiveness of bigger clubs.  Ian Graham of Decision Technology recounts how fellow 2011 panelist Gavin Fleig’s tenure with Sam Allardyce at Bolton led to such creative team management and resultant overachievement.  Graham’s analysis indicated that by most conventional metrics, Bolton was 17th in terms of team quality yet 7th overall in the table.  This was because Fleig and Allardyce found players who were trained to take unusual advantage of corners and free kicks, normally low-percentage goal scoring opportunities for most teams.  Who else helped that Bolton effort? 2010 SSAC panelist Mike Forde.

While Fleig and Forde have backgrounds at Bolton and modern day success stories exist at clubs like AZ Alkmaar in the Dutch Eredivisie, the reality is that top talent on the pitch and in the analysis room eventually moves on to bigger clubs.  Unlike Beane, the SSAC panelists couldn’t say no to the allure of taking their analytics insights to bigger clubs and being given the backing of deeper-pocketed owners.  Overall, the Sloan Conference’s panel participants from the Premier League have been from two clubs not known for paying minimal fees for hidden talent - Chelsea and Manchester City.  It is a bit ironic that representatives from such clubs are included in the panel, given that their clubs have set records for annual losses while amassing the two costliest squads from a transfer fee perspective.  It's not to say that investing heavily for rapid success is incongruous with an analytical approach, but it does raise the question as to how much analytics are being used for player transfer selection, valuation, and wages.

Large clubs are going to pay more for the players they desire. They seek better players that are typically on multiple big clubs’ radars, thus raising the transfer fee the selling club can demand.  Steven Houston from Chelsea FC admitted as much in the 2011 panel when he pointed out that analytics are for identifying role players and early performers at his club, not necessarily how much they would pay for internationally renowned players.  Managers like Arsene Wenger and the Arsenal board of directors may take principled stands to run balanced books by not paying expensive transfer fees and keeping wages under control, but it won’t win them a championship any time soon.  The reality is that one must pay dearly to compete at the top levels of the soccer world.

There are financial reasons for even the biggest clubs to adopt analytics.  At the 2010 panel the impact of the salary cap on the need for analytics in the NFL was explored.  Such a cap means that teams must search for ways to get more from the same expenditures as everyone else.  Salary caps within each of the national leagues in Europe are impractical given the fractious nature of the multiple national leagues and the varying levels of income available to the individual clubs. Nonetheless, the risks associated with skyrocketing club debt within UEFA have led to the passage of the first set of comprehensive financial rules within the federation.  An analogy between US salary caps and the UEFA Financial Fair Play rules was made at the 2011 conference by Ian Graham, suggesting that they may make analytics more valuable in terms of rationalizing player wages and transfer fees.  Such rules may help, although authors like the Swiss Ramble have suggested that clubs like Chelsea and Manchester City may find enough loopholes in the rules to minimize the amount of change required of their current business models.  More rigorous enforcement of the Fair Play rules will place greater emphasis on analytics, although the extent to which they’re enforced remains to be seen.

In the United States, the financial situation within MLS is a bit different.  The league has been obsessed with containing player wage growth as part of an overall conservative business model since its inception.  This is based heavily upon the NASL’s financial implosion in the early 1980s.  This has led the league to adopt a relatively low cap with few exceptions.  The result has been steady growth in player compensation from year to year punctuated by large jumps in compensation whenever the collective bargaining agreement is renegotiated.  This would suggest MLS is ripe for a statistical approach to maximize the value of limited resources.  Unfortunately, the financial capabilities of most MLS clubs limit the ability to make such investments.

So if financial incentives may remain more prominent at the lower levels of a league rather than the upper echelons, what other incentives are there for the top performing clubs to use analytics?  One reason may be to instill a cultural permanence to compensate for a constantly shifting collection of players and managers.

Early in the 2002 season the A’s are at the bottom of the AL West and the Moneyball experiment is not working.  Beane has bought the players, but manager Art Howe won’t play some of them no matter what the numbers suggest. The numbers seem to go against every fiber of his intuition about the game. Without a few critical pieces on the field, the team is floundering.  Beane confronts Howe in his office.

“[Art], it doesn’t matter what moves I make if you don’t play the team the way they're designed to be played!” 
“[Billy], this is about you doing your job and me doing mine.”
 “I didn’t assemble this team for you, Art!”
Similar potential pitfalls for soccer analytics were outlined in both panels - the concept of not instilling a consistent culture by the club’s ownership and management team.  Football manager turnover has always been high, and can present a challenge for a club trying to instill a stable model of continual youth development supplemented by savvy purchases of players in the transfer market.  Chelsea alone has had eight managers in the nearly nine years since Roman Abramovich bought them.  A club without cultural stability in the face of such management change can face continual churn of players that always fit the last manager.  These players, who don’t fit the current manager’s system, represent wasted transfer fees and wages and make success extremely difficult.  Thus, analytics can be viewed as a way to mitigate player and manager investment risk.

The 2011 panel emphasized the concept of data as a method for framing a club’s culture.  Gavin Fleig from Manchester City emphasized the importance of recruiting managers, and not just players, that buy into the culture and style of play that the data is attempting to create for the club.  Otherwise, a Moneyball-style clash between the manager and the boardroom is inevitable.  At the same time, Blake Wooster of ProZone offered the balanced opinion that the analyst must recognize their supplemental role to the manager’s intuition.  Intuition can lead to faster insights into reasons for player success or failure - data is just there to confirm or refute them on a numerical basis.  Data takes a long time to generate and analyze, and it is often a 70/30 split in favor of data capture over analysis according to 2011 panelist Bruno Aziza.  Realistic expectations for data's use are key to ensuring the correct balance between human and data-driven decision making.  Bias too much towards manager intuition, and the likely outcome is player and tactics churn.  Too much bias towards data can lead to slow decision making and player or manager resentment.  In Moneyball even Beane and Brand had to recognize the value inexpensive locker room soda represented to the players, and in doing so their statistical approach to baseball was softened with a bit of empathy. A club’s owners should desire leadership that strikes such a balance, an approach that is slavishly driven by neither a manager’s personality nor an analyst’s data.

Like Brand’s assessment of Beane-as-player in Moneyball, one of the biggest benefits of analytics for club management is the power to say “no.”  Owners and managers don’t reach their positions in life without having outsized personalities and opinions.  Sometimes those traits can get the better of the manager’s or owner’s judgement, leading them to consider an investment that can’t possibly be supported by the numbers.  The ability to readily call upon and trust the numbers to refute an ill-advised transfer request can save a club tens of millions of pounds.  Who now doubts that Chelsea would love to take back the £50 million they paid Liverpool for Torres, while Liverpool would simultaneously love to take back the £35M of the Torres fee they immediately invested in Andy Carroll?  Perhaps there is no amount of data that would have convinced their management teams to act differently, but perhaps it would have at least caused them to pause and think before making such overpriced investments.

Mike Forde spelled out how Chelsea use this "power of no" at the 2010 conference.  He pointed out that it is very easy for clubs to pay for a marquee player for which they are not ready.  Forde and Chelsea’s approach is to put themselves in the shoes of the top five or six players already at Chelsea, and approach the player evaluation the way those players would.  The questions asked of the player, his agent, and the former club reinforce that the team dynamic is key, and that buyer’s remorse in soccer is a bigger risk than many like to admit.  Constantly thinking of the way a potential transfer may mess up team chemistry is one way to avoid making a costly mistake.

It is a club’s culture that matters most in attracting new talent.  Whether it’s using analytics to build a team that outperforms its financial means or simply to maintain a club’s historical position at the top of the table, the personal connection between players, manager, and the club is what sustains high performing teams. The analytics groups at some of the biggest clubs in Europe are no bigger than six to eight analysts.  In MLS, it may be a team of one for most clubs.  Analytics plays a supplementary role rather than a defining one in soccer today.  Its presence is growing, being driven by an increasingly global competition for managerial and player talent.  Ironically, it’s analytics that are trying to bring forward a clearer definition of the human actions that contribute to team success.  It’s those human stories that we’re most interested in.  Imagine the power of understanding what really makes Messi one of the greatest ever, why the Spanish national team is so successful on the pitch, or why Robin Van Persie is having such a good run of form lately.  It wouldn’t subtract from the beauty of the game.  It would only add to it, and would help us understand how such human or collective greatness is achieved.  It’s that human element that makes the numbers so compelling.

As I write this article the insights from the last conference are already a year old, and analytics trends and theories change much faster than that.  It will be interesting to see what themes and personal stories are brought to the 2012 SSAC soccer panel.  Beyond the panels themselves, the conference offers participants an opportunity to get away from their analytical models and connect with other likeminded sports analysts and lay statisticians.  The human element extends beyond the conference itself, and into the connections that are made and grown well after the conference ends.  If the previous two conferences with soccer analytics panels are any indication, the 2012 edition will be full of things we won’t find in any soccer statistics book.

Sunday, January 22, 2012

Changes Coming...

I am in the process of making several changes to the blog.  The first one everyone will be able to notice in the next day or so is a blog-specific domain name.  I have registered it via Google Apps.  They have stated it may take up to 48 hours for the domain to show up, but after that point all references to the current URL will redirect to the new one.  All of the links on the existing blog posts will still work just fine, so the change should be transparent to everyone.

I am unsure of the impact on anyone who is utilizing an RSS feed of the blog.  The documentation I can find does not state whether or not your current RSS feed will still work in the future.  The safest bet is to just sign up for a new one once the new URL has taken effect.  It should populate with all of my prior posts as they carry over to the new domain.

In a few weeks I'll also be making some stylistic changes to the blog background and layout.  It shouldn't be anything major, but it has been two years since I launched the blog and did any updates to its imagery. Hopefully everyone finds them agreeable.

Friday, January 20, 2012

An Alternative History of Premier League Champions

A post a long time coming has finally gone live over at the Transfer Price Index that re-orders all 19 seasons' worth of Premier League tables based upon transfer fees.  The model used to generate the data for the post is the same m£XIR model I used extensively in the second half of last year for posts analyzing Aston Villa, David Moyes, and Arsene Wenger.

Statheads should find the full alternative table by season very entertaining.  They'll also find the number of m£XIR championships won by Manchester United, even when taking transfer fees into account, to be a bit too much to believe.  Readers should keep in mind that the majority of these came during the era of Beckham, Giggs, Scholes, and the Nevilles and that therefore they had an abnormally low overall transfer expenditure for their success.  While United's total number of championships is reduced, they still remain the overall leader.

Liverpool and Arsenal fans will be pleased to see how well their clubs do when transfer spending is taken into account when evaluating success.  The rarer Nottingham Forest fan will also find something to like in this post.  Along with each season's rankings I've also provided a compilation of the all-time best single-season performances regardless of the season in which they occurred.

I'll follow up next month with another post in the m£XIR series where I look at each manager's overall PPM differential to the m£XIR model.  This will allow for the evaluation of each manager's complete tenure versus others.

Thursday, January 12, 2012

The Impact of Player Minutes on MLS Conference Semifinal Success

I've been writing for nearly a year about how MLS teams that play in a greater number of competitive matches outside of league play end up paying a penalty when it comes to MLS's playoff format.  It's a penalty that is unique to the conference semifinals that use a two-legged aggregate goal format, whereas the conference finals that were single elimination through last year do not see such a penalty for teams who play a greater number of extra-MLS matches.  Several readers of those posts had asked that a more refined metric be studied - one that actually quantified the number of minutes played by each playoff team or the individual players within it.  While more games can certainly be understood as more taxing on teams come playoff time, there is a difference in how clubs can approach such extra demands.  Improved player rotation in non-MLS competitions can minimize the wear-and-tear on the preferred starting XI that are utilized in the playoffs.  The key to such an analysis is finding a comprehensive database of MLS playing time statistics.

Just such a database exists via a publicly available resource.  Climbing the Ladder's MLS Player Lineup Database has maintained a record of every match's scoreline, in which competition it occurred, the players who participated in each match, time of any substitutions, as well as goal scorers and those who assisted in the goals.  It's a gold mine for MLS statheads, but it does require some post processing to make full use of the data that is contained in text strings within its Excel files.

Elijah at the Climbing the Ladder blog was gracious enough to provide me with a pre-release of the 2011 database so that all seasons from 2003 forward could be included in my analysis.  I've partnered with Sarah Rudd to put the data into a more database-query friendly format as well as clean up a few misspelled names.  Sarah's also written about the concept of playing time management and squad rotation, so it was only natural that she and I partner to explore the concept further via the CTL database.

Our post processing allows us to look at player minutes over an entire season with all extra competitions included (US Open Cup, SuperLiga, CONCACAF Champions Cup/League, etc.).  This post will utilize the data on minutes played prior to the conference semifinal round to see if any additional predictors of playoff success can be realized.  Subsequent posts will look at similar effects on later playoff rounds.

Methodology and Model

Like previous posts, a binary logistic regression (BLR) model was used to estimate the likelihood of a team winning in the conference semifinal round of the playoffs.  The factors included in the study were:

  • Season Series PPM Differential
  • Seed Difference
  • Season Series Goal Differential
  • Difference in Coaching Experience
  • Total Number of Games Played Differential
  • Final 5 Games Point Differential
  • Difference in Season Goal Differential
  • Median Minutes Played Differential
  • Interquartile Range (IQR) Minutes Played Differential
The minutes played, difference in coaching experience, and total number of games played data comes from the CTL database.  All other data comes from publicly available sources like the MLS website.  The median and IQR values for minutes played from each club are used in lieu of the mean and standard deviation as the data associated with player minutes over a season is not normally distributed.
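For readers unfamiliar with binary logistic regression, the sketch below shows the general shape of such a model: a weighted sum of factor differentials is passed through the logistic function to give a probability of advancing.  It is only an illustration under stated assumptions.  The training rows, the choice of two factors, and the plain gradient ascent fit are all hypothetical and are not the data or software used for this post.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, outcomes, learning_rate=0.1, iterations=2000):
    """Fit an intercept and coefficients by gradient ascent on the log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)  # weights[0] is the intercept
    for _ in range(iterations):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(rows, outcomes):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            error = y - p
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w + learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

def win_probability(weights, x):
    """Estimated probability that the team with differentials x advances."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical rows, invented for illustration only:
# (games played differential, IQR minutes played differential in hundreds)
training_rows = [(-4, 2.0), (-2, 1.0), (-3, -0.5), (0, 0.5), (2, 0.5),
                 (1, -1.0), (3, -1.5), (5, -2.0), (-1, -1.0), (4, 1.0)]
advanced = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = team won the semifinal series

weights = fit_logistic(training_rows, advanced)
p_rested = win_probability(weights, (-3, 1.5))  # fewer games, wider minutes spread
p_tired = win_probability(weights, (3, -1.5))   # more games, narrower minutes spread
print(f"rested side: {p_rested:.2f}, tired side: {p_tired:.2f}")
```

A statistics package would also report a p-value for each coefficient, which is what the significance screening in a study like this relies on; the sketch leaves that step out.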

There's always the risk of introducing multicollinearity when adding new variables to a regression model.  This is especially the case when looking at the number of matches (or the differential in matches between two teams) and the number of minutes players accrue through the season (or the differential in minutes between two teams).  It's intuitive to think there is a correlation between the two.

It turns out the median and IQR minutes played differentials are correlated with the number of games played (statistically significant with p-values much less than 0.05), although the relationship between the two is relatively weak (see R-squared values in graphs below - click graph to enlarge).

This doesn't necessarily mean the model suffers from collinearity and is therefore of little value.  In fact, the effect may be quite the opposite.  It's quite conceivable that the greater spread of minutes (higher IQR) may lead to higher odds of playoff success, while it's already understood that a larger number of games is detrimental to those odds.  What matters is testing the outcome of the model for multicollinearity.  Such tests showed no multicollinearity.  The mild linear relationship between the minutes played statistics and the number of games played is an item of interest that will be explored further.
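The specific multicollinearity test is not named here.  One common check is the variance inflation factor (VIF), and the sketch below shows how it relates to the R-squared between two predictors.  With only two predictors the VIF reduces to 1 / (1 - R-squared); with more predictors each one would have to be regressed on all of the others.  The function names and the R-squared value of 0.2 are hypothetical and are not taken from the graphs.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif_from_r_squared(r2):
    """Variance inflation factor for a predictor with the given R-squared."""
    return 1.0 / (1.0 - r2)


# A weak relationship (hypothetical R-squared of 0.2) barely inflates variance.
weak_vif = vif_from_r_squared(0.2)

# Two uncorrelated toy predictors give R-squared 0 and a VIF of exactly 1.
no_inflation = vif_from_r_squared(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))
print(weak_vif, no_inflation)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign, so a weak but statistically significant correlation between predictors can coexist with a clean multicollinearity check.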

There are also two ways to look at player data that would produce very different estimates for the median and IQR values.  One method utilizes data from all players who played at least one minute at a club over all competitions within a season to measure the overall effect of the matches on player time distribution.  Another method utilizes only the data for players who appear in the semifinal round of the playoffs - a reduced data set from the first method that looks at the impact to only those players who featured in the playoff round.  As the desire is to have a predictive model for future years of playoffs, the first method was used as it does not rely on a projection of starting lineups to make a prediction.  It also accounts for things like mid-season trades, injuries, and player churn that may greatly affect the time a player spends on the pitch for a specific club prior to the start of the playoffs.  This is key to understanding how much playing time a player may have had to work with the rest of his teammates.

In re-running the BLR model for all of the factors listed above, only the following three were found to be statistically significant (p-value < 0.05):
  • Games Played Difference
  • Median Minutes Played Differential
  • IQR Minutes Played Differential
The results of the BLR are displayed in the table below.  The median and IQR minute differential factors have been divided by 100 to put them on a scale similar to the game difference factor and provide for a better comparison of odds ratios.

The results of the BLR model demonstrate that the game difference factor continues to have the biggest impact on conference semifinal odds.  The odds ratio communicates the percent change in odds when moving one unit of measure up or down (game difference or every 100 minutes of playing time for a player).  The odds ratio of each minutes played factor communicates that each increase in median and IQR differential units (in this case, every 100 minutes) leads to a 31% and 27% increase in the odds of winning the conference semifinal, respectively.  Conversely, a unit increase in game difference leads to a 39% decrease in odds, or a 64% increase in odds for each fewer game played.
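The arithmetic behind those percentages can be sketched as follows.  The odds ratios of 1.31, 1.27, and 0.61 are back-calculated from the percentages quoted above rather than read from the results table, so treat them as rounded assumptions; the dictionary keys and function names are hypothetical.

```python
# Odds ratios implied by the quoted percentages (rounded, assumed values).
odds_ratios = {
    "median_minutes_per_100": 1.31,
    "iqr_minutes_per_100": 1.27,
    "games_played": 0.61,
}


def percent_change(odds_ratio):
    """Percent change in odds for a one-unit increase in the factor."""
    return (odds_ratio - 1.0) * 100.0


def percent_change_per_unit_decrease(odds_ratio):
    """Percent change in odds for a one-unit decrease in the factor."""
    # Moving one unit down multiplies the odds by the reciprocal odds ratio.
    return (1.0 / odds_ratio - 1.0) * 100.0


for name, ratio in odds_ratios.items():
    print(name, round(percent_change(ratio)))

# One fewer game played: 1 / 0.61 is about 1.64, i.e. roughly a 64% increase.
print(round(percent_change_per_unit_decrease(odds_ratios["games_played"])))
```

This is why a 39% decrease per extra game and a 64% increase per fewer game describe the same coefficient: the two changes are reciprocals of each other, not mirror images.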

Now that the relative impact of different factors on the change in odds of winning a conference semifinal is understood, what is the impact on the actual odds of winning the conference semifinal?  For such an analysis the complete BLR model, which simultaneously evaluates the impact of all three variables, must be evaluated.

The Impact of Club Median and IQR Minutes Differential

The BLR model is four-dimensional, which makes visualizing its behavior a bit difficult.  A simpler way to view the data is to hold all other variables constant at a value of zero and then plot the changing odds as a function of the variable of interest.  Such a plot is made below for the median and IQR minutes differential factors.
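A minimal sketch of that one-variable slice is shown here, assuming a standard logistic form.  The median coefficient is taken as the natural log of an odds ratio of 1.31 (implied by the 31% figure quoted earlier), and the intercept of zero is a purely hypothetical placeholder since the fitted intercept is not stated in the text.

```python
from math import exp, log

COEF_MEDIAN = log(1.31)  # per 100 minutes; implied by the quoted 31% figure
INTERCEPT = 0.0          # hypothetical placeholder, not the fitted value


def win_probability(x, coef=COEF_MEDIAN, intercept=INTERCEPT):
    """Logistic probability with every other factor held at zero."""
    z = intercept + coef * x
    return 1.0 / (1.0 + exp(-z))


def odds(p):
    """Convert a probability into odds in favor."""
    return p / (1.0 - p)


# Probability of winning across a range of median differentials (x100 minutes).
curve = [(x, win_probability(x)) for x in range(-10, 11, 2)]
for x, p in curve:
    print(x, round(p, 3))
```

Each one-unit step along the horizontal axis multiplies the odds by the same factor, while the probability itself follows an S-shaped curve that flattens toward 0 and 1.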

The graph puts numbers to the trends described above for the BLR model's odds ratios.  The trend of increasing odds with increasing IQR makes sense.  Increasing IQR indicates a bigger spread in the distribution of playing minutes to individual players throughout the season, which could be indicative of effective player rotation.  The trend of increasing odds with increasing median minute differential may not be as intuitive given that it might correlate with the number of games played and thus make us think players are being overused.  However, this neglects the reasons behind the opposite effect of a median minutes differential deficit.  Such a deficit would indicate fewer minutes spread across a greater number of players, which may be due to trades during the season, a large amount of lineup experimentation trying to find a lineup that works best, or a number of injuries to regular starters a good bit of the way into the season.  In any case, it is indicative of a team that has a greater number of players playing a lower amount of time, which affords less time for the team to gel.

Whatever the reason, the model indicates there is a sweet spot for squad rotation.  Players need to be rotated through to encourage a distribution of work and raise the IQR, but a core group of players getting a greater number of minutes bodes well for team success in the playoffs.

A Comprehensive Model For the Conference Semifinals

While visualization is easier in two dimensions, the reality is that the BLR model for conference semifinal odds is actually expressed via four dimensions.  Given that the coefficients for the median and IQR minutes played factors are relatively similar, a single three dimensional graph of the model's behavior against the game differential factor and median minutes played factor can express how odds change with the combination of factors.  Such a graph is shown below.

The lesser effect of difference in played minutes can be observed at the extreme ends of the game differential axis.  The graphical representation appears as a small droop as one comes forward in the graph at a game differential of -15, while a small rise in odds can be seen as the difference goes positive at a game differential of +15.  Readers can imagine a compounding effect of droop/rise when both player minute factors are added into the equation.

As the game differential factor approaches zero, the effect of player minute differential on the odds of winning a conference semifinal becomes more pronounced.  In fact, at a game differential of zero the odds of winning the conference semifinal collapse to the two dimensional graphs shown earlier in this post.
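The reason the minutes effect shrinks at the extremes and grows near zero can be sketched with a two-factor logistic model.  The coefficients are the natural logs of odds ratios implied by the percentages quoted earlier (0.61 per game, 1.31 per 100 median minutes), the intercept of zero is a hypothetical placeholder, and the IQR factor is held at zero to keep the example small.

```python
from math import exp, log

B_GAMES = log(0.61)   # per game of differential; implied by the 39% figure
B_MEDIAN = log(1.31)  # per 100 median minutes; implied by the 31% figure
INTERCEPT = 0.0       # hypothetical placeholder, not the fitted value


def win_probability(game_diff, median_diff_100):
    """Logistic probability with the IQR differential held at zero."""
    z = INTERCEPT + B_GAMES * game_diff + B_MEDIAN * median_diff_100
    return 1.0 / (1.0 + exp(-z))


def minutes_swing(game_diff, span=5):
    """Probability gained moving from -span to +span median differential."""
    return win_probability(game_diff, span) - win_probability(game_diff, -span)


# Compare how much the minutes factor matters at three game differentials.
swings = {g: minutes_swing(g) for g in (-15, 0, 15)}
for g, swing in swings.items():
    print(g, round(swing, 4))
```

At a game differential of plus or minus 15 the probability is already pinned near 0 or 1, so the same shift in minutes moves it only slightly, whereas near zero the logistic curve is at its steepest and the shift has its largest effect.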

The result is that between 0 and either +15 or -15, the odds of winning an MLS conference semifinal matchup take on what may be best described as a "twisted S curve" shape.  Again, the level of compounding of the twist in the S-curve is dictated by how far each factor's value is away from zero.


The BLR model for MLS conference semifinal odds has been updated with factors that account for squad rotation, long term player injury, and squad churn.  It now measures both overall team fatigue (number of matches) and squad management (player minutes).  Such a comprehensive model allows for an evaluation of how well teams have done from 2003 onward given the fatigue and rotation with which they started their playoff run.  Additionally, the single elimination BLR (2011 wild card + 2003-2011 conference finals) should be re-examined in light of the player minutes data now available.  That topic will be covered in a future post that continues to focus on the factors that contribute to MLS playoff success.

Tuesday, January 3, 2012

Guest Post at The Ball Is Round: My First Game

You Never Forget Your First Time

I recently added The Ball Is Round blog to my RSS feeds to get some non-statistical balance to my soccer blog library.  They have a neat little subsite titled My First Game where fans the world over can submit recollections of their first live attendance at a soccer match.

The guys at the site were nice enough to post my recounting of my first Seattle Sounders FC match in October 2009.  It seems so long ago with how much soccer I've watched and written about since then, but in reality it was only slightly more than two years ago.  Writing about such an event helped force me to take stock of what I've witnessed over the last three years of watching the Sounders on TV and in person (my first match came after a full season of watching them on TV).

What started as an obsession my wife and parents thought would die out after a few months has pretty much become a lifestyle choice.  I now read dozens of blog posts a week, write a few myself every month, purchase software to aid in statistical analysis of the game, dedicate hundreds of dollars towards Sounders tickets every year, and am attempting to add "published soccer writer" to my resume.  Something this dominant in my life is approached with great seriousness.

While all of these changes in my life are things I appreciate every day, it is always good to be reminded why I dedicate so much of my time and resources to analyzing and writing about the sport.  It's because I absolutely love standing for 90+ minutes in a stadium full of supporters observing a game that is simultaneously both simple and beautifully complicated.  It's a passion I have been able to share with my wife and two daughters, one that bonds random people on the street into a family of 36,000 at each Sounders match.  For Seattleites, it is a rare experience to behold given our awful 40-year relationship with professional sports.

Simon Kuper repeatedly reminds us to take the game a lot less seriously because that's what it really is - a game that doesn't really matter that much in the grand scheme of life.  Sites like The Ball Is Round and their My First Game subsite help provide a way for doing just that.  Any words worth writing should be approached with at least a mild degree of seriousness, but providing a forum for recounting the child-like appreciation we had at our first match is a way to utilize such seriousness to enjoy the game for what it is.

I highly encourage you to check out The Ball Is Round site, and would love to hear recommendations as to other sites that provide similar outlets for writing.