Statistics are just numbers

I spent the weekend in Portland at a friend’s wedding, and one of the side benefits was catching up with two friends and their newborn son.

The husband and I are always chattering about soccer whenever we get together (especially Arsenal as we are both supporters), and the topic of my blog came up.

He’s a regular reader, and while he enjoys the blog he did feel compelled to mention the change in tone just prior to my recent trip to Europe.

He singled out this post for comment, inquiring why I moved away from statistical commentary and spoke with a different voice in asking for readers’ recommended books. I think such inquiry provides a perfect opportunity for me to clarify my purpose for this blog and comment on one of those recommended books.

First, this blog is an outgrowth of my belief that statistics can illuminate unseen relationships, but they cannot be dealt with independently of the wider context of the subject area they are being used within.

As my Portland friend commented earlier in that visit, we Americans are obsessed with statistics compared to other nations. Our obsession often leads us to look for patterns or significance where there are none. A good statistician is one who, in the normal course of studying a subject, stumbles upon a question that may be best addressed with statistical analysis – that is, the use of statistics should be a rare occurrence in most fields. Soccer should be no different.

The challenge comes in building a blog around soccer statistics when my stated goal is to not be in search of patterns that don’t actually exist. To me, this blog is a journey in understanding the many facets of the world’s most popular sport. Statistics will greatly aid in that journey – to make sense of some larger truths in the sport.

However, statistics often only describe the most likely outcome of an average event and not the actual outcome of a specific one, which is why we play each and every match to determine the actual outcome. In the same manner, I see statistics only being part of my journey through the world of soccer.

Statistics related to the latest happenings in the soccer world will make up the bulk of my posts, but I also don’t want to lose touch with the human element of the game. Soccer shapes our human experience, and to a greater degree our human experience shapes the game of soccer.

I want to understand the humanity that produces the numbers I study. With that explanation, I now dive into one of those recommended books from my previous post.

At the urging of a regular reader/re-tweeter I picked up Brilliant Orange: The Neurotic Genius of Dutch Soccer by David Winner. It was a late addition to my reading material for my trip, and I was a bit sad I couldn’t get it on the Kindle as it would have kept my bag weight down.

Nonetheless, it came highly recommended and had the side benefit of helping describe the culture from which many of my co-workers come. After completing this book, I only wish I had read it earlier. It would have made my nearly three years of working with the Dutch much easier.

The book, more than anything, is a fascinating study of how such a small nation that outperforms expectations copes with the inevitable defeat it faces in major tournaments.

This over-performance has been quantified in the Soccernomics model, where the Dutch score half a goal more per match than the model predicts; they sit 9th out of 49 European teams in this measure of over-performance (see Figure 14.4 in Soccernomics).

The book starts at the beginning of post-war Dutch soccer to explain how coping with over-achievement became a Dutch soccer challenge.

To younger fans like me, Dutch soccer can be found everywhere today. Even Brilliant Orange acknowledges this, singling out Arsene Wenger’s Arsenal squads as one of the professional embodiments of Dutch “Total Football” outside of the Eredivisie.

This wasn’t the case in the early 1970s. At that time, the Dutch were perfecting their version of soccer, and they unleashed it upon the world in Germany in 1974. At that World Cup the Netherlands began its run of over-achievement. The heartbreaking loss in 1974 and the expected loss of 1978 set up the Dutch soccer story and were dealt with in the Dutch psyche as something to be accepted, largely tolerated, and in some ways celebrated.

The book weaves together compelling stories of Dutch cultural impact on their style of soccer. Chapter 14 explains how Dutch land constraints and use lead to a different visualization of the soccer pitch, while Chapter 25 explains how their strategies of multiple uses of the same space on the pitch are also reflected in the unique layout and use of space at Schiphol International Airport.

In Chapter 18, the author explains how Dutch collaborative democracy is a handicap for their soccer team, and how the atheistic Dutch are still shamed about outstanding achievement based upon their Calvinist cultural mores.

Such belief structures make accepting failure to win a championship all that much easier, and Chapter 6 explains how such democratic tendencies doomed the great national team of the 70’s.

Chapter 15 details the Dutch struggle with their role in the Holocaust, and how the adoption of the Jews by Ajax as a way to cope with that past has bred modern anti-Semitic chants from rival clubs.

Chapter 13 explains how anti-German feelings, which were largely absent in the Netherlands until the late-70’s, rose and fell in the 80’s and 90’s via the heated battles between the two nations’ soccer clubs.

One of the final chapters plays right into one of the themes from Soccernomics. In the chapter entitled “5 out of 6: Frank, Patrick, Frank, Jaap, Patrick, Paul… and Gyuri” (read the book to understand the chapter numbering system), author David Winner takes us behind the scenes of a debate within Dutch soccer: to win, or not to win, by penalty kicks. It seems that in their pursuit of playing their beautiful game, Dutch teams of the 80’s and 90’s felt winning by penalty kicks was beneath them.

If they couldn’t win in 90 or 120 minutes playing their game, they felt it was better to leave winning up to the chance of penalty kicks rather than a system for taking them.

While other teams were analyzing goalie behavior before matches and devising systems for taking penalty kicks, the Dutch weren’t even practicing penalty kicks let alone doing any preparation like the other teams.

They were convinced of the Soccernomics conclusion that penalty kicks didn’t change the likely outcome of the match based upon certain predictors before Soccernomics was ever published. The problem is that they came to the wrong conclusion.

We know that the conclusion in Chapter 6 of Soccernomics is that penalty kicks don’t have a statistically significant impact on the outcome of a soccer match vs. the predicted outcome from the Soccernomics model that looks at home pitch advantage, GDP, and population size of the two countries playing each other.

The simplification behind that statement is that the statistical test says penalty kicks have no impact on the average outcome – it doesn’t say much about specific outcomes.

In the case of the Dutch team, we already know they punch way above their weight when it comes to international competitions.

In their case, each game won on penalty kicks would have been another notch in their belt of over-achievement. The fact that they have likely conceded a number of matches due to lack of practice or respect for winning via penalty kicks means their over-achievement is likely higher than measured in Soccernomics.

I’d be interested to see if the Dutch team itself would have shown a statistically significant shift in wins or losses based upon matches that went to penalty kicks. Luckily, David Winner outlines the ongoing battle by many in the Dutch soccer program to emphasize penalty kick practice and strategies.

In summary, I’d highly recommend this book to anyone. It’s well written, very conversational, and strikes an outstanding balance between soccer and cultural material. To read this book is to begin to understand both Holland and its soccer team.…

Reactions to MLS Semifinals, Conference Final Odds, and an Update on Semifinal Model

Conference Semifinal Reactions

MLS’s annual bastardization of soccer playoffs – aka the conference semifinals – is now complete. Sure, I’m a little bitter because my team dug a hole in its first leg that it couldn’t climb out of even with an outstanding performance.

I was at that second leg this past Wednesday, and the energy was electric until the final whistle. It’s more that this league can’t seem to figure out what it really wants to be – it wants to cater to the American sports fan via a playoff format, but then it nods to every other knockout format by using two-legged semifinals while not even implementing the away-goals rule.

MLS would be better off picking one direction or the other and sticking to it.

Nonetheless, the Sounders and three other teams are out of the playoffs now, and we’re down to the final four teams fighting for a spot in MLS Cup 2011 in LA. The format is what it is, so it’s time to see how I did against it. I went 2-for-4 in my conference semifinal picks, with varying reasons for success and failure.

I got the LA and Kansas City wins correct.


In LA, I correctly bet they were too good to go down due to the six match goal differential they had to the Red Bulls. In Kansas City, I correctly bet they would hold serve on match differential and were simply too hot to not win. Clearly, their 4-0 drubbing of Colorado over two matches demonstrated that superior form.

Honestly, the Philadelphia/Houston series was a toss-up from a statistical prediction standpoint. It was the closest of the four using my statistical methods, and the only statistical advantage for Philly came from their coach having less experience than Houston’s (they were even on matches played).

Luckily, this year’s results got rid of that silly “coach experience” anomaly as a statistically significant predictor (more on the adjustments to the model later). The matchup was really just a flip of a coin statistically, and perhaps I should have gone with the experience of Houston over the second-year improvement and first playoff berth for the Union.

In the Seattle/Real Salt Lake series I picked against my statistical judgement, giving in to supporter’s optimism. In the closing weeks of the regular season I told any Sounders supporter I knew that I would rather the Sounders have faced FC Dallas in the first round than Real Salt Lake.

RSL’s skid at the end of the season was a false one – one predicated upon missing personnel they were getting back by playoff time.

FC Dallas, on the other hand, was clearly a slumping team that continued to slump in the playoffs. The Sounders would have matched up far better against FC Dallas, would have likely been playing to finally get the LA monkey off their back in the Conference Final, and Real Salt Lake would have been tearing up the Eastern Conference Playoffs and be in that conference’s final right now. They’d likely have won the East, and we’d be staring at an RSL vs. Sounders/Galaxy final in several weeks.

For all the griping that would have come from a “Western Conference team winning the East”, it would have been a just end to a season that saw those three teams dominate the Western Conference and largely the entire league. Ironically, one of the few just endings from the MLS playoffs in recent memory.

Rarely do things work out as desired, and Seattle faced RSL in the conference semifinals. As a supporter, I picked against the statistics, the Sounders’ history of troubles in the playoffs (they had to end sometime, right?), and Real Salt Lake’s playoff experience.

I felt the Sounders and Galaxy would both overcome the statistics, and perhaps we’d be able to say the league had gotten to the point that its playoff format didn’t determine champions based upon who had played fewer matches in a season.

Watching the first leg from the couch of my living room, I immediately regretted the pick (side note: luckily, an 8-hour exam earlier in the day and three beers throughout the match made me too tired to throw anything at the television, or else I’d be out a couple grand right now buying a new one).

The Sounders picked the worst day of the year to play what was their worst game of the year, resulting in a 3-0 deficit for them.

The return leg was the polar opposite. It was very clear that RSL was intent on parking the bus and earning a berth in the Western Conference Finals based purely upon the three goals they scored in the first leg. The statistics in the table below, which compares the change in different statistics from games one to two for each of the clubs leading after the first leg in the 2011 conference semifinals, bear this out.

Granted, the other three teams were also heading home to defend their leads, but none of those leads was as large as Real Salt Lake’s, and none of their first-leg performances had been as dominant as Real Salt Lake’s.

RSL said all the right things going into the second leg in Seattle, recognizing the Sounders were a dangerous team – they had won eight matches during the regular season by scoring three or more goals, six of those wins were by two or more goals, and two of them were 3-0 shutouts. Still, watching the game live, re-watching highlights, and then looking at the statistics above I can’t help but feel RSL went beyond parking the bus. Time wasting got so bad that Nick Rimando was issued a yellow card for just such an infraction.

RSL simply hunkered down and was content to boot the ball forward. The starkest contrast could be drawn with Sporting Kansas City, who went home up 2-0 and came out with attacks in the second leg that netted another 2-0 result for them. RSL was the only team of the four to move on to the second leg and have a worse performance across the board.

Nonetheless, the Sounders fell short of their attempt to come back from a three goal deficit. What will likely haunt them the entire offseason is not the misses or blocks in the second leg – there’s not much they can do about a Real Salt Lake defense that played relatively well against the 26 shots they faced. It will be the Grabavoy goal in the dying minutes of the first leg that ended up giving RSL their three goal lead going back to Seattle.

None of this is to say that RSL doesn’t deserve the win. They played outstanding, attacking football in the first leg, and combined with the Sounders’ horrible performance they earned their three-goal lead. The shame is that they didn’t pursue the single goal in Seattle that would clearly have put them through to the final, and instead played cynical, time-wasting, park-the-bus soccer that helps fuel criticism of MLS’s two-legged format.

Update to the Conference Semifinal Models

With the conclusion of this year’s conference semifinals, eight new data points were added to the model that is based upon MLS playoff data from 2003 forward. Those new data points have helped to make the model a little more logical, as well as confirm one of the early trends.

On the logic front, losses by Philadelphia and New York, who had some of the shortest-tenured managers in the playoffs, eliminated the odd historical anomaly of less experienced managers faring better in the conference semifinals from the ranks of statistically significant predictors.

Replacing it in the list of significant predictors was the difference in the teams’ seeds. A plot of the effects of seed difference is shown in the graph below. Seeds are listed numerically, so top seed LA (1) playing bottom Western Conference seed New York (6) would produce a seed difference of -5 for LA and +5 for New York.

Based upon the graph and its associated equation, each unit difference in seed changes the odds of winning a two-legged playoff by 6.8%.
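For readers who want to play with the seed arithmetic, here is a minimal Python sketch. The graph's equation isn't reproduced in the text, so the odds part rests on two assumptions of mine: that the 6.8% is a multiplicative change in odds per unit of seed difference, and that a negative difference (the better seed) is the favorable direction. The function names are hypothetical, not part of the actual model.

```python
def seed_difference(own_seed: int, opponent_seed: int) -> int:
    """Seed difference from one team's point of view (own seed minus opponent's)."""
    return own_seed - opponent_seed


def odds_multiplier(seed_diff: int, pct_per_unit: float = 0.068) -> float:
    """Compounded change in a team's odds for a given seed difference.

    ASSUMPTION (not stated in the post): each unit of seed difference scales
    the odds by (1 - pct_per_unit), so a positive difference (worse seed)
    shrinks the odds and a negative difference (better seed) grows them.
    """
    return (1.0 - pct_per_unit) ** seed_diff


# The LA (1) vs. New York (6) example from the text.
la_diff = seed_difference(1, 6)   # -5 for LA
ny_diff = seed_difference(6, 1)   # +5 for New York
print(la_diff, odds_multiplier(la_diff))
print(ny_diff, odds_multiplier(ny_diff))
```

Under those assumptions the two multipliers are reciprocals of each other, which is just another way of saying that one team's advantage is the other's handicap.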

Despite the LA Galaxy becoming the first team to win a two-legged conference semifinal when facing a team that had played 6+ fewer games than them, the trend of teams playing more games losing their two-legged playoff continued. Two of the teams that lost – Seattle and Colorado – played four and three more games, respectively, than their opponents. The net impact of the 2011 results is expressed via the graph below.

Astute readers who compare the exponent term in the equation to the same term from the 2003-2010 data will see that it is numerically smaller. The net effect is to lower the impact of the difference in matches being played: a 6.9% change in odds of winning the series for each unit change in game differential compared to a 7.5% change excluding the 2011 playoff data. The addition of the extra data points also tightens up the 95th percentile bounds. Data through 2010 indicated a 95th percentile range of 0.34 around the nominal (solid) line between game differences of -5 and +5. The increased sample size and results from the 2011 data have now tightened this range to 0.29. In statistical speak, the accuracy of the model’s nominal prediction continues to increase, while the effect of increased matches seems to be a bit lower than originally predicted.
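To make the link between the exponent term and the percentage figures concrete, here is a rough Python sketch. It assumes the fitted curve has the form odds ∝ exp(b · d), where d is the game differential, and that each extra game played lowers the odds by the quoted percentage. Both are my reading of the text rather than the equation itself, and the function name is hypothetical.

```python
import math


def exponent_from_pct_change(pct_change: float) -> float:
    """Exponent coefficient b such that exp(b) == 1 - pct_change.

    ASSUMPTION: the model is odds ~ exp(b * game_differential) and the quoted
    percentage is the drop in odds per additional game played.
    """
    return math.log(1.0 - pct_change)


b_through_2010 = exponent_from_pct_change(0.075)  # 2003-2010 data
b_through_2011 = exponent_from_pct_change(0.069)  # with the 2011 playoffs added

print(b_through_2010, b_through_2011)

# Relative odds after five extra games under each fit.
print(math.exp(5 * b_through_2010), math.exp(5 * b_through_2011))
```

With these assumptions the 2011 coefficient is smaller in magnitude than the 2003-2010 one, which matches the observation that the effect of extra matches has softened slightly.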

A Brief Prediction of the Conference Finals

Going into the conference finals, the playoffs switch back to a single-match, winner-take-all format at the higher seed’s home pitch. As was shown in my earlier post on the history of MLS single-match playoffs since 2003, the only statistically significant predictor of success is the difference between the two teams’ goal differentials throughout the season (including playoffs). The table below provides a comparison of the conference finalists’ goal differentials and their odds of winning.

I’ll be sticking with the numbers. In the case of Kansas City, I think they’re simply too hot to lose this match at home. A rough start to the season on the road has been rewarded with a second-half homestand and outstanding play to go with it. I agree with Grant Wahl when it comes to LA – their season may go down as the single greatest in MLS history if they’re able to win the MLS Cup. The match with RSL will be close, but in the end I think they will prevail. I just think LA is too good to not win at home in the conference finals, and then win again at home two weeks later to hoist MLS Cup 2011.…