Thursday, January 26, 2012

Sloan Sports Analytics Conference: Putting a Human Element to Soccer Analytics

The following is a review of the two previous soccer-specific panels at the Sloan Sports Analytics Conference (SSAC) as part of my coverage of the 2012 conference for Howler magazine.

The genius of the Oscar-nominated Moneyball isn’t in the numbers or the Oakland A's resultant success.  That part of the story is well understood.  The genius is in the story the numbers help tell, one filled with the personal emotion and organizational turmoil that a shift to sports analytics can bring.  It is a story of the fear of personal and organizational failure, of the isolation felt by both the analytics practitioners and the scouts they’re trying to replace, and of the trust that must be built between the team’s management and the analyst.  It’s the deep human and organizational insight that makes this story different than a story simply about the numbers.  There is a moment about twenty-five minutes into the movie where all of these human themes come together.

Billy Beane is sitting in his dining room late into the night, struggling to determine whether or not he is going to bet his professional reputation and the Oakland A’s entire season on the theories of a classically trained economist with zero professional baseball experience.  Knowing that he and Peter Brand will be challenged at every turn due to the unconventional approach they would take, Beane needs to understand if Brand will be a change-agent even when it is not convenient.  Beane picks up the phone and calls Brand so late into the evening that neither of them even knows what time it is.

Once pleasantries have been dispensed, Beane decides to immediately test his potential assistant’s mettle.
“Would you have drafted me in the first round?”
Caught off guard, Brand responds with a platitude.

“You were a good player...”
The response does not impress Beane.  Having washed out of the big leagues after a promising high school career, he’s not looking to be patronized.  He wants to see if the potential assistant GM, who confirms he’s looked up Beane’s stats after their initial run-in at the Cleveland Indians’ headquarters, has the ability to quickly analyze a situation and provide a cogent answer.

“Cut the crap, man...  Would you have drafted me in the first round?!?”
Brand now understands this isn’t a courtesy call, and responds to Beane’s probing question.

“I’d have taken you in the ninth round, no signing bonus.  I can imagine you would have passed and taken the scholarship [that you turned down from Stanford].”
At that moment the personal connection is made.  Beane can count not only on Brand to be honest with him, but also on his analysis being spot on without sacrificing the human element.  He offers Brand the job, and the rest is baseball history.

Moneyball may be a movie about how analytics changed the baseball world, but the themes within the movie are replicated in the soccer analytics world.  Sifting through piles of data to find the numbers that matter to an individual club.  Requiring every facet of a club to buy into the overall team vision shaped by what the numbers tell the management team.  Balancing the human intuition of a manager/coach with the computationally intensive models from the analytics department.  The inevitable success at poorer clubs leading to bigger opportunities at richer clubs.  All of these are present in soccer, with unique twists due to the differences in the two sports' financial and organizational structures.

Soccer analytics doesn’t have its equivalent of Moneyball.  No disrespect is meant to the very good book Soccernomics, but it has little of the human insight that Moneyball has.  In an interview I conducted with Simon Kuper late last year, he pointed out that we have yet to see concrete examples of where soccer analytics has made a difference, yet we know clubs are using data to make decisions.  Thus we have no obvious success story about which to tell a compelling human story.  Without a Moneyball for soccer, where can soccer analytics enthusiasts turn for the personal stories and latest theories from the clubs practicing soccer analytics?  One outstanding resource is the annual Sloan Sports Analytics Conference hosted each March by MIT’s Sloan School of Management.

On March 2nd and 3rd, 2012, nearly 2,000 people will descend on the Hynes Convention Center for the 2012 edition of the conference (tickets are still available).  It will be the sixth iteration of a conference that started with 175 participants and 11 panels, and has grown over five years to have more than 1,500 attendees and 20 panels.  The role of soccer analytics within the conference has grown alongside that explosion in attendance.  The Sloan Conference hosted its first soccer-related panel in 2010 as part of a joint presentation with American football panelists on emerging analytics.  In 2011 the sport moved to its own dedicated panel, and the 2012 conference will have yet another soccer-specific panel.  The move from side show to a stand-alone panel that is now the sixth most viewed SSAC video is no coincidence.

What was once the exclusive purview of the likes of the Milan Lab and a certain French manager in the Premier League exploded into the soccer literary world in late 2009 with the publication of Simon Kuper’s and Stefan Szymanski’s Soccernomics.  With clubs looking for unique insights due to competition for limited resources, the field of soccer analytics has become a core competency of teams competing for league championships.  The panels within SSAC provide unique insights on the challenges and opportunities presented by soccer analytics from the top practitioners in the game, with many of the themes seen in Moneyball also being present in their discussions.

Billy Beane is sitting in an Oakland A’s office with a whiteboard and computer screen full of data in front of him.  Peter Brand, the recently hired assistant GM who produced all the data, is educating Beane on the analytics framework he is creating for the A’s.  After overwhelming Beane with reams of data and even the computer code that generated it, Brand summarizes the theory.

“It’s about getting things down to one number.  Using stats the way we read them, we’ll find value players that nobody else can see.”
The challenge of developing analytical solutions to soccer’s tactical options is well understood.  Unlike games like baseball or football that have well-defined positions and a clear sequential nature to player contributions, soccer is a more fluid game whose continual action makes it tough to categorize individual contributions.  The positional data is now readily available to teams, a point made repeatedly by the participants in the 2011 panel.  In fact, Blake Wooster summarized that the common problem now facing clubs is too much data and not enough resources or time to analyze it.  The top teams who most need data analyzed are playing every three to four days when domestic and international cup competitions are included.  Turning around analysis on the previous match, providing insights on the opposition in the next match, and continually refining the basic models underpinning all of the analysis are tasks that take more than a few days to complete.  One would think a natural solution would be to turn to the budding public soccer analytics community, harnessing the power of distributed resources in a way that baseball and basketball have already done.

There are two major barriers to such fan involvement in soccer.  First, unlike North American professional sports where the leagues own the data and make it relatively available, the teams own the raw analytics data in European soccer.  Gavin Fleig pointed out in the 2011 panel that there is no single league source to which all teams have access, nor is there a desire to share such data with the common fan.  Any league-wide data that is available has been monetized, providing yet another barrier to the recreational hobbyist of limited financial means who wants to contribute to the sport’s numerical understanding.  Second, given that teams pay for much of the raw information themselves, they are keen to keep it to themselves for fear of “giving away trade secrets”.  The paradox is that the restriction that comes from such lack of sharing makes timely analysis nearly impossible.  All of this means that utilizing the distributed resources of fans and supporters who are statistically inclined has found limited acceptance amongst clubs and leagues.

Even in the face of this resistance there is a growing number of analytically inclined fans getting access to data.  Statistics groups like Opta will publish their data to users for a fee, and writers can find purchasing power in numbers.  One such user is the EPL Index, which makes the data available to subscribers and site authors in exchange for defraying the cost of the data purchase.  For a fee as small as £39.95 a year users can get access to a full suite of Opta stats via EPL Index.  Writers interested in contributing material to the site can get access to the data for free.  This model has led to 40 writers contributing 60 articles per month, with more than 150,000 site visits each month.  Aggregators like the EPL Index demonstrate a growing appetite among soccer fans to describe the game they love numerically, even if the clubs are unwilling to engage them en masse.  Soccer clubs and leagues would be wise to leverage such interest, especially given the sport's growth potential in the statistically inclined United States.

An unintended benefit of such fan involvement might be wider agreement on the analytical measurements that matter within the game.  In both the 2010 and 2011 conference panels the club representatives commented on the difficulty of player valuation across multiple leagues and varying data sets.  Unlike in US sports, players are not traded in club soccer, but instead have their rights of employment transferred from one club to another via a monetary transaction.  Clubs in the English Premier League are spending hundreds of millions of pounds (net) every season on transfers, and constantly desire better valuation methods to ensure a better return on investment.  Compounding the financial problem is a cultural one.  In the Premier League alone Mike Forde, 2010 panelist, observed that more than 50% of the players are not from the UK and hail from a total of 62 countries.  Language barriers are the least of these clubs’ concerns when it comes to culture.  Shots on goal and passing data are not only defined differently from league to league, but the method of recording them, and thus the quality of the data, is also questionable.

Even when data is available, translating a player’s performance from a mid-level league (the Dutch Eredivisie, for example) to a top-tier league (the English Premier League) can be difficult.  Soccer is a game of eleven players facing another team of eleven players.  An individual player’s teammates and the opposition greatly affect his or her performance.  Finding robust metrics that can be compared effectively across leagues requires tens to hundreds of thousands of computation-hours and constant international interaction, something beyond the capability of clubs alone.  Conferences like SSAC can provide a small bit of assistance in moving the soccer analytics community to a more complete and standardized understanding of the game.

As in any other multinational marketplace, standardization would greatly benefit the soccer analytics world.  Where languages, emotions, and historical divisions can fail, numbers may help.  English may be the lingua franca of our time, but numbers are the true universal language.  Standardization around an analytics framework and the factors that matter to the game would help make transfer fees a bit more rational.  It would provide teams with a more reliable way to evaluate players, and help them avoid the high washout rate seen today with transfers.  At its heart, standardization requires public demonstrations of success and failure via analytics.  The easiest way to do this is greater fan involvement in the analytics community, which will lead a few bold clubs to use the theories and demonstrate success with them.

So if fan involvement isn’t currently a priority for clubs, what will drive a greater use of analytics within the game?  It will be the business realities facing it.

Billy Beane is flown to Boston after the successful 2002 season in Oakland to meet with new Red Sox owner John Henry.  Henry’s background in analytically driven investment decision making makes him an early adopter of the revolution Beane is introducing to the game of baseball.  While the Red Sox were not nearly as financially constrained as the A’s, they were looking for unique insights to finally break the Curse of the Bambino that had led to a then-84-year championship drought.  Henry wants Beane to be his GM, and is willing to pay any sum of money to get him.

“[A’s owner] Steve [Schott] is offering you a new contract, so why did you return my call?”
Beane’s answer says it all.

“Because it’s the Red Sox.  Because I believe science may offer an answer to the Curse of the Bambino.  Because I hear you’ve hired Bill James.”
While managing the A’s payroll and talent has been fun, the prestige of taking a big club to its first championship in nearly a century may be more attractive.  Beane then observes that Henry is going to anger the rest of the league by hiring an outsider like James.  Henry then responds with commentary on the luxury of being a richer club and owner than most.
"One of the great things about money is that it buys a lot of things.  One of which is the luxury to disregard what baseball likes and doesn't like"
Much like the Oakland A's in baseball, smaller soccer clubs have always had to find creative ways to match the superior financial resources and inherent attractiveness of bigger clubs.  Ian Graham of Decision Technology recounts how fellow 2011 panelist Gavin Fleig’s tenure with Sam Allardyce at Bolton led to such creative team management and resultant overachievement.  Graham’s analysis indicated that by most conventional metrics, Bolton was 17th in terms of team quality yet 7th overall in the table.  This was because Fleig and Allardyce found players who were trained to take unusual advantage of corners and free kicks, normally low-percentage goal-scoring opportunities for most teams.  Who else helped that Bolton effort?  2010 SSAC panelist Mike Forde.

While Fleig and Forde have backgrounds at Bolton and modern-day success stories exist at clubs like AZ Alkmaar in the Dutch Eredivisie, the reality is that top talent on the pitch and in the analysis room eventually moves on to bigger clubs.  Unlike Beane, the SSAC panelists couldn’t say no to the allure of taking their analytics insights to bigger clubs and being given the backing of deeper-pocketed owners.  Overall, the Sloan Conference’s panel participants from the Premier League have been from two clubs not known for paying minimal fees for hidden talent - Chelsea and Manchester City.  It is a bit ironic that representatives from such clubs are included in the panel, given that their clubs have set records for annual losses while amassing the two costliest squads from a transfer fee perspective.  That's not to say that investing heavily for rapid success is incongruous with an analytical approach, but it does raise the question of how much analytics are being used for player transfer selection, valuation, and wages.

Large clubs are going to pay more for the players they desire.  They seek better players who are typically on multiple big clubs’ radars, thus raising the transfer fee the selling club can demand.  Steven Houston from Chelsea FC admitted as much in the 2011 panel when he pointed out that analytics are for identifying role players and early performers at his club, not necessarily for deciding how much to pay for internationally renowned players.  Managers like Arsene Wenger and the Arsenal board of directors may take principled stands to run balanced books by not paying expensive transfer fees and keeping wages under control, but it won’t win them a championship any time soon.  The reality is that one must pay dearly to compete at the top levels of the soccer world.

There are financial reasons for even the biggest clubs to adopt analytics.  At the 2010 panel the impact of the salary cap on the need for analytics in the NFL was explored.  Such a cap means that teams must search for ways to get more from the same expenditures as everyone else.  Salary caps within each of the national leagues in Europe are impractical given the fractious nature of the multiple national leagues and the varying levels of income available to the individual clubs.  Nonetheless, the risks associated with skyrocketing club debt within UEFA have led to the passage of the first set of comprehensive financial rules within the federation.  An analogy between US salary caps and the UEFA Financial Fair Play rules was made at the 2011 conference by Ian Graham, suggesting that the rules may make analytics more valuable in terms of rationalizing player wages and transfer fees.  Such rules may help, although authors like the Swiss Ramble have suggested that clubs like Chelsea and Manchester City may find enough loopholes in the rules to minimize the amount of change required of their current business models.  More rigorous enforcement of the Fair Play rules would place greater emphasis on analytics, although the extent to which they will be enforced remains to be seen.

In the United States, the financial situation within MLS is a bit different.  The league has been obsessed with containing player wage growth as part of an overall conservative business model since its inception.  This caution is based heavily upon the NASL’s financial implosion in the early 1980s, and it has led the league to adopt a relatively low cap with few exceptions.  The result has been steady growth in player compensation from year to year punctuated by large jumps in compensation whenever the collective bargaining agreement is renegotiated.  This would suggest MLS is ripe for a statistical approach to maximize the value of limited resources.  Unfortunately, the financial capabilities of most MLS clubs limit their ability to make such investments.

So if financial incentives may remain more prominent at the lower levels of a league rather than the upper echelons, what other incentives are there for the top performing clubs to use analytics?  One reason may be to instill a cultural permanence to compensate for a constantly shifting collection of players and managers.

Early in the 2002 season the A’s are at the bottom of the AL West and the Moneyball experiment is not working.  Beane has bought the players, but manager Art Howe won’t play some of them no matter what the numbers suggest. The numbers seem to go against every fiber of his intuition about the game. Without a few critical pieces on the field, the team is floundering.  Beane confronts Howe in his office.

“[Art], it doesn’t matter what moves I make if you don’t play the team the way they're designed to be played!”
“[Billy], this is about you doing your job and me doing mine.”
“I didn’t assemble this team for you, Art!”
Similar potential pitfalls for soccer analytics were outlined in both panels, chief among them the failure of a club’s ownership and management team to instill a consistent culture.  Football manager turnover has always been high, and can present a challenge for a club trying to instill a stable model of continual youth development supplemented by savvy purchases of players in the transfer market.  Chelsea alone has had eight managers in the nearly nine years since Roman Abramovich bought them.  A club without cultural stability in the face of such management change can face continual churn of players that always fit the last manager.  These players, who don’t fit the current manager’s system, represent wasted transfer fees and wages and make success extremely difficult.  Thus, analytics can be viewed as a way to mitigate player and manager investment risk.

The 2011 panel emphasized the concept of data as a method for framing a club’s culture.  Gavin Fleig from Manchester City emphasized the importance of recruiting managers, and not just players, who buy into the culture and style of play that the data is attempting to create for the club.  Otherwise, a Moneyball-style clash between the manager and the boardroom is inevitable.  At the same time, Blake Wooster of ProZone offered the balanced opinion that the analyst must recognize their supplemental role to the manager’s intuition.  Intuition can lead to faster insights into reasons for player success or failure - data is just there to confirm or refute them on a numerical basis.  Data takes a long time to generate and analyze, and it is often a 70/30 split in favor of data capture over analysis according to 2011 panelist Bruno Aziza.  Realistic expectations for data's use are key to ensuring the correct balance between human and data-driven decision making.  Bias too much towards manager intuition, and the likely outcome is player and tactics churn.  Too much bias towards data can lead to slow decision making and player or manager resentment.  In Moneyball even Beane and Brand had to recognize the value inexpensive locker room soda represented to the players, and in doing so their statistical approach to baseball was softened with a bit of empathy.  A club’s owners should desire leadership that strikes such a balance, an approach that is slavishly driven by neither a manager’s personality nor an analyst’s data.

As with Brand’s assessment of Beane-as-player in Moneyball, one of the biggest benefits of analytics for club management is the power to say “no.”  Owners and managers don’t reach their positions in life without having outsized personalities and opinions.  Sometimes those traits can get the better of the manager’s or owner’s judgement, leading them to consider an investment that can’t possibly be supported by the numbers.  The ability to readily call upon and trust the numbers to refute an ill-advised transfer request can save a club tens of millions of pounds.  Who now doubts that Chelsea would love to take back the £50 million they paid Liverpool for Torres, while Liverpool would simultaneously love to take back the £35M of the Torres fee they immediately invested in Andy Carroll?  Perhaps there is no amount of data that would have convinced their management teams to act differently, but perhaps it would have at least caused them to pause and think before making such overpriced investments.

Mike Forde spelled out how Chelsea use this "power of no" at the 2010 conference.  He pointed out that it is very easy for clubs to pay for a marquee player for which they are not ready.  Forde and Chelsea’s approach is to put themselves in the shoes of the top five or six players already at Chelsea, and approach the player evaluation the way those players would.  The questions asked of the player, his agent, and the former club reinforce that the team dynamic is key, and that buyer’s remorse in soccer is a bigger risk than many like to admit.  Constantly thinking of the way a potential transfer may disrupt team chemistry is one way to avoid making a costly mistake.

It is a club’s culture that matters most in attracting new talent.  Whether it’s using analytics to build a team that outperforms its financial means or simply to maintain a club’s historical position at the top of the table, the personal connection between players, manager, and the club is what sustains high performing teams.  The analytics groups at some of the biggest clubs in Europe are no bigger than six to eight analysts.  In MLS, it may be a team of one for most clubs.  Analytics plays a supplementary role rather than a defining one in soccer today.  Its presence is growing, driven by an increasingly global competition for managerial and player talent.  Ironically, it’s analytics that are trying to bring forward a clearer definition of the human actions that contribute to team success.  It’s those human stories that we’re most interested in.  Imagine the power of understanding what really makes Messi one of the greatest ever, why the Spanish national team is so successful on the pitch, or why Robin Van Persie is having such a good run of form lately.  It wouldn’t subtract from the beauty of the game.  It would only add to it, and would help us understand how such individual or collective greatness is achieved.  It’s that human element that makes the numbers so compelling.

As I write this article the insights from the last conference are already a year old, and analytics trends and theories change much faster than that.  It will be interesting to see what themes and personal stories are brought to the 2012 SSAC soccer panel.  Beyond the panels themselves, the conference offers participants an opportunity to get away from their analytical models and connect with other like-minded sports analysts and lay statisticians.  The human element extends beyond the conference itself, and into the connections that are made and that grow well after the conference ends.  If the previous two conferences with soccer analytics panels are any indication, the 2012 edition will be full of things we won’t find in any soccer statistics book.