Saturday, April 30, 2011

Friday Night Links

I must start off this week's links with the video below, which has been circulating for some time now but came to my attention via Paul Tomkins.  What awesome timing given the acting we saw in this week's rendition of El Clásico.



Now on to the readable content

New Books and Blogs

I have picked up The Fix: Soccer and Organized Crime by Declan Hill in preparation for business travel to Europe this coming week.  I figured two 10-hour plane rides in six days will afford a lot of time to read about a topic I've recently been diving into.  The book's been on my lengthy Amazon wish list for some time, and I figured now was as good a time as any to take the plunge.  It's 408 pages long, so I suspect I won't complete it on the trip with the other responsibilities I will have.  It's highly regarded in the soccer community, so I am curious as to what thoughts you readers may have.  Just don't spoil any surprises for me.

Links of the Week

As I said, I am on my way to Europe this weekend.  I will miss my Sounders as I will be somewhere over the Atlantic Ocean when they take on Toronto FC at Qwest Field.  I am hoping to find a pub in The Netherlands that is broadcasting the Manchester United/Arsenal match.  I will be watching more for the experience, as any honest Gunner will admit we're underdogs in that match given our recent form.  Oh, to be so close to London yet so far away from being able to attend the match!

Thursday, April 28, 2011

Quantifying the Bias of Arsenal’s Referees vs. The Rest of the League’s Clubs

This post can also be found at Untold Arsenal.

Author’s Note: Special thanks to DogFace for his co-authorship on this post.  His voluminous data set, unending patience, insight and contribution, and constant editorial feedback throughout the creation of this article were invaluable. He’s a wonderful blogging partner with whom any Gooner or statistician would be lucky to work.

In my first post in this series I used DogFace’s match data to explain how Phil Dowd is the least desirable referee for Arsenal as he not only shows the most biased officiating in terms of fouls, yellow cards, and red cards, but he also shows the largest effect on Arsenal’s likelihood of winning a match. That analysis focused on the effects of all of Arsenal’s referees, but did not quantify how those referees officiated other teams’ matches. This post contains such an analysis, and the results are very interesting.

To aid in such an analysis, a binary logistic regression (BLR) model was created for each team’s likelihood of winning a match based upon a number of factors. Each BLR model includes terms that capture the effects of venue (home/away) and differentials of shots, shots-on-goal, corners, fouls, and fantasy points for yellow and red cards. Not every term was significant for each team – terms that had a p-value greater than 0.10 were eliminated from the team’s BLR and their coefficient for that term is set to zero. I know that may upset some stats geeks who would prefer p-values of <= 0.05, but the reduced sample size requires a looser p-value threshold or very few teams would have any significant terms. The resultant BLRs allow a construction of each team’s odds of winning each match, and a study was constructed as to how each official impacts the odds of winning each match based upon their officiating versus the expected average officiating from the club’s total number of matches.
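For readers who prefer to see the elimination rule as code, here is a minimal sketch. The coefficients and p-values are made up purely for illustration (they are not taken from DogFace's data), and the function name is my own; it only shows the idea of zeroing out any term whose p-value exceeds 0.10.

```python
# Hypothetical fitted terms for one team: name -> (coefficient, p-value).
# These numbers are invented for illustration, not results from the data set.
fitted_terms = {
    "venue": (0.85, 0.01),
    "shots": (-0.06, 0.04),
    "shots_on_goal": (0.30, 0.001),
    "corners": (0.02, 0.45),
    "fouls": (-0.03, 0.08),
    "fantasy_points": (-0.01, 0.30),
}

P_VALUE_CUTOFF = 0.10  # the threshold described in the post


def keep_significant(terms, cutoff=P_VALUE_CUTOFF):
    """Set the coefficient to zero for every term whose p-value exceeds the cutoff."""
    return {
        name: (coef if p_value <= cutoff else 0.0)
        for name, (coef, p_value) in terms.items()
    }


team_model = keep_significant(fitted_terms)
print(team_model)
```

With these invented numbers the corner and fantasy point terms drop out, while the foul term (p = 0.08) survives only because of the looser 0.10 cutoff.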

Background on the Data Used in the Analysis

Before diving in to any analysis, a few statements on the types of data used are in order.

The intuitive way to model bias in a referee would be from the fouls per booking and bookings per match figures. However, DogFace has often found these to be counterintuitive: if a bias exists, it can be expressed within the game in different ways, which often depend on the styles of play of the teams involved.

The noise in these figures is further increased by the fact that the official statistics only reflect what the referee deemed to be a foul rather than the reality of a foul (according to the laws) and/or the consistency of the referee’s interpretation thereof. “Sins of omission”, where a referee should have called a foul but for some reason or another did not, are tough to quantify in an analysis that uses such statistics.

To combat this effect, Untold Arsenal utilises a professional referee (Walter Broeckx) to analyse matches and it is clear that what is recorded in the official statistics is often way off the mark. DogFace’s own data sets confirm this effect. The rise in popularity of statistical analysis in football has led him to notice a trend for parity in the fouls per booking figures that suggests that the referee is aware of his statistics in game – it could be said that this ‘trial by media’ has created the bias we see in these figures and that a referee himself has a conflict of interest in every call he makes or indeed does not make.

The ideal situation would be an independent body to record/analyse referee performance and provide an open source database for us to comb through. This data would include the most important data of all, i.e. the standard ‘human’ error. Alas, we do not have such research based on an independent body, and instead you will have to make do with the work of humble bloggers like Walter Broeckx, DogFace and me.

Quantifying How Arsenal’s Referees Officiate the Rest of the League

Just like the last post, each team’s average foul and fantasy point total was calculated. These variables represent the match attributes directly controlled by the referee. Each team’s actual odds of winning each match were compared to the odds that would be realized if the referee had given the team’s average foul or fantasy point differentials (different values were used based upon the team’s averages for home and away matches). This allows a calculation of an odds differential for each match.
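The odds differential described above can be sketched roughly as follows. Everything here is an assumption for illustration: the coefficient values, the sample match, the function names, and the sign convention (actual minus average-officiating baseline, so a negative number reads as a penalty). It also assumes the standard logistic form for the BLR.

```python
import math


def win_probability(coeffs, match):
    """Standard logistic model: p = 1 / (1 + exp(-z)), with z the weighted sum of terms."""
    z = coeffs["constant"] + sum(coeffs[name] * value for name, value in match.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds_differential(coeffs, match, team_avg):
    """Actual win probability minus the probability the team would have had if the
    referee-controlled terms (fouls, fantasy points) sat at the team's venue average."""
    baseline = dict(match)
    baseline["fouls"] = team_avg["fouls"]
    baseline["fantasy_points"] = team_avg["fantasy_points"]
    return win_probability(coeffs, match) - win_probability(coeffs, baseline)


# Hypothetical team model and a single hypothetical home match.
coeffs = {"constant": -0.5, "venue": 0.8, "shots": -0.05, "shots_on_goal": 0.3,
          "corners": 0.0, "fouls": -0.04, "fantasy_points": -0.02}
match = {"venue": 1, "shots": 4, "shots_on_goal": 3, "corners": 2,
         "fouls": 5, "fantasy_points": 3}
team_avg = {"fouls": 1.0, "fantasy_points": 0.0}  # hypothetical home averages

diff = odds_differential(coeffs, match, team_avg)
print(f"odds differential: {diff:+.3%}")
```

In this made-up match the team is called for more fouls and cards than its home average, so the differential comes out negative.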

Matches were then grouped by referee to allow an analysis of each referee’s overall bias vs. the average odds. In this case, the same referees as the first post were examined – Atkinson, Bennett, Dean, Dowd, Foy, Halsey, Webb, and Wiley – as they had officiated the greatest number of Arsenal matches and in general a large number of matches overall. Most importantly, they have officiated a range of home and away matches so the impact of home/away bias is minimized. Unfortunately, the limitations of the number of matches in a season and the availability of each of the referees prevents an analysis where each referee officiates an equal number of home and away matches, but the statistics used in the analysis take the effect of venue into account to minimize its effect on match outcome. The data used for this analysis was also isolated to the 2006/07 through 2009/10 seasons to ensure an adequate number of samples were available from each referee.
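As a rough illustration of the grouping step, the sketch below averages per-match odds differentials by referee, split into Arsenal and non-Arsenal matches. The records are invented and the helper name is hypothetical; it computes simple group means only and is not a substitute for any model fitting.

```python
from collections import defaultdict

# Hypothetical (referee, team, odds differential) records, one per match.
records = [
    ("Dowd", "Arsenal", -0.020),
    ("Dowd", "Arsenal", -0.010),
    ("Dowd", "Everton", 0.004),
    ("Foy", "Arsenal", 0.002),
    ("Foy", "Fulham", -0.002),
]


def mean_by_referee(rows, club="Arsenal"):
    """Average odds differential for each (referee, is_club) group."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, match count]
    for referee, team, diff in rows:
        key = (referee, team == club)
        totals[key][0] += diff
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


means = mean_by_referee(records)
for (referee, is_arsenal), value in sorted(means.items()):
    label = "Arsenal" if is_arsenal else "Rest of league"
    print(f"{referee:5s} {label:15s} {value:+.3%}")
```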

It should be noted that for the first part of the analysis, referee data from all four seasons was grouped together. This allows us to look at gross bias throughout the seasons, and keeps a pretty high sample size for the analysis. Later in the post key referees' data is broken out across seasons to show the effects of referee by season. Such a study by season helps us see any bias that may be dependent on club’s objectives (Premier League title, European competition qualification, avoiding relegation) and current quality of play (above, at, or below form). Inevitably, such seasonal analysis suffers from limited sample size and is saved for those referees whose overall bias is the most noteworthy.

One further adjustment to the data set had to be made. When creating BLRs for each team, several of them showed no impact to their odds of winning a match due to foul or fantasy point differentials. That is to say that neither of the two coefficients for those terms in a team’s BLR was statistically significant – each coefficient’s p-value was greater than 0.10. This means that we can’t evaluate the impact of officiating on those teams’ likelihoods of winning a match. Thus, teams that did not have a significant BLR term from the two associated with officiating bias – fouls or fantasy points – were eliminated from the analysis as they’d provide a zero impact and generate misleading results.

A general linear model (GLM) using referee and team officiated (Arsenal vs. not Arsenal) was created from the reduced data set to observe the resultant interactions. The results of the analysis are represented in the graph below which shows each referee’s average odds differential for Arsenal and the rest of the teams. The two plots in the graph show the same data in two different manners. The key plot is the one in the lower left (click to enlarge).


Before diving in to the plot, readers with a keen eye will note that the red line, which represents Arsenal’s average odds differential by referee, is a bit different from a similar line plotted in my previous post. This is due to the two different ways the data is analyzed. In the previous case, I was looking at the referee’s performance as a function of just Arsenal’s matches by year. That led to the grouping and averaging to be a bit different than this analysis, which simply looked at Arsenal versus the rest of the league regardless of season within the four analyzed. This latest analysis essentially ignores the effects of time.

Looking at the lower left hand corner of the plot, it’s clear that Dowd shows the largest gap between how he officiates the average match and how he calls an Arsenal match - about a 1.5% penalty for Arsenal. However, what’s different about this analysis is that it also clearly shows that Howard Webb is pretty biased too – also nearly a 1.5% penalty for Arsenal. Rounding out the top three are Wiley, Halsey, and Bennett each with a nearly 1% penalty for Arsenal. Atkinson and Foy seem to show the smallest gaps.

Another way to look at in-match bias is to look for odd patterns in fouls-per-booking. A plot of four factors – Referee, Arsenal/Not Arsenal, Season, and Home/Away – and their impact on fouls-per-booking is shown below. Values on the left or right side of the plot represent the fouls per booking for the plots across that row. Values above and below the graph represent different levels of each factor expressed as text in one of the boxes in each column. The legends to the right of the graph explain what each color/shape combination represents in each row. Thus, if one wants to understand how each referee’s average fouls per booking for Arsenal compares to their fouls per booking for the rest of the league, one can look to the square in the 2nd row/1st column or the 1st row/2nd column. Each of those two plots shows the same data, but each is plotted in a different manner.  Click on the image below to enlarge it.


Looking at the plot of data in the 2nd row/1st column, one sees there is only one referee that has a higher fouls per booking for Arsenal than the rest of the league – Steve Bennett. Yet again, Howard Webb and Phil Dowd lead the pack with their differential of fouls per booking for Arsenal versus the greater numbers of fouls-per-booking they allow for the rest of the league. Surprisingly, the favorable Chris Foy also has a large gap between how he calls Arsenal matches vs. the rest of the league.

What’s especially disturbing is the fact that Dowd ends up with such a bias against Arsenal in fouls-per-booking given his disproportionate number of home matches he’s officiated for Arsenal (7 home matches to 2 away matches). Take a look at the plot in the lower left, which shows home (red) and away (black) fouls-per-booking by referee. Overall, Phil Dowd shows one of the largest gaps in favor of a higher fouls-per-booking average home versus away – nearly two fouls per booking. Yet such an advantage is not showing up in Arsenal’s numbers when Dowd is officiating their home matches. In fact, when looking for statistically significant factors in the GLM of fouls-per-booking vs. referee, club, season, and venue, the interaction of referee and venue is the only statistically significant term! Some referees are being consistent between home and away matches, while others are showing large gaps.

Comparison with Pre-Match Expectations via the Asian Handicap Swing

Does this comport with pre-match expectations? One loose measure of such expectations is the Asian Handicap assigned to a match. Luckily, DogFace has also been recording this statistic for each match. DogFace has also introduced the concept of the Asian Handicap Swing (AH swing) to readers of Untold Arsenal. The AH Swing is computed via the following equation:

AH Swing = Actual Goal Differential – Asian Handicap

The AH Swing represents the performance or deviation against the handicap in actual goal difference. The betting line data (or Asian Handicap) is an average calculated from around 30-50 bookmakers across Europe and Asia. We assume the average bookie handicap does not use corruption/bias as a significant factor in the markets – not only for the reasons stated below but also because referee driven match fixing would only affect the markets significantly in terms of an ‘upset’ against the odds.
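The AH Swing equation is simple enough to write as a one-line function. The function name and the sample scores are my own, and the sketch assumes the handicap is expressed from the team's point of view, with a positive value meaning the team is favored by that many goals.

```python
def ah_swing(goals_for, goals_against, asian_handicap):
    """AH Swing = actual goal differential - Asian Handicap."""
    return (goals_for - goals_against) - asian_handicap


# Hypothetical matches: a half-goal favorite wins 2-1, a one-goal favorite loses 0-1.
print(ah_swing(2, 1, 0.5))
print(ah_swing(0, 1, 1.0))
```

A positive swing means the team beat the handicap, while a negative swing means it underperformed the line.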

If we were to take the stance that the betting line reflects referee corruption/bias then any deviation from that line in terms of an over/under performance would be understated i.e. we could say that any ‘noise’ in the original handicap from the bookies would actually understate the bias we are attempting to model. Luckily previous research does not indicate this case, but rather indicates that on the whole bookmakers may be creating a market that is essentially based on efficiency and competition. At the least, they are essentially hedging a gestalt metaphysical abstraction based on media disinformation and the credulous belief that “it all evens out at the end of the day”. However it is worth considering that, as perception of bias in referee performance becomes more ‘main stream’ and paradigms shift, we will see this effect reflected in the markets of the future.

This post will instead focus on the reality of the results as it is a far more revealing stance to take. The Asian Handicap line is one that reflects, more or less, the illusion of an uncorrupted market. The swing from that line allows for an examination that a corrupted, or biased, market exists.

Just like odds differential, the AH swing has been calculated for each and every match in the database. A plot of four factors – Referee, Arsenal/Not Arsenal, Season, and Home/Away is shown below. Values on the left or right side of the plot represent the Asian Swing for the plots across that row; values above and below the graph represent different levels of each factor expressed as text in one of the boxes in each column. The legends to the right of the graph explain what each colour/shape combination represents in each row. Thus, if one wants to understand how each referee’s average AH swing for Arsenal compares to their AH swing for the rest of the league, one can look to the square in the 2nd row/1st column or the 1st row/2nd column. Each of those two plots shows the same data, but plotted in a different manner.  Click on the image to enlarge.


Looking at the plot in the 2nd row/1st column, it is shown that Mike Dean, Howard Webb, and Phil Dowd have the lowest average values for Arsenal’s AH Swing. This means they are consistently officiating Arsenal’s matches tighter than the other referees. Unlike the analysis of fouls-per-booking, the interaction between referee and venue is not statistically significant. Referees consistently provide a significant AH swing advantage to home teams compared to away teams.

It has now become clear on multiple fronts that Dean, Dowd, and Webb appear to be the most biased against Arsenal. To get an even better understanding we must look at how each of the three officials impacts each team and compare those matches to how bookies might expect them to turn out.

Quantifying How Arsenal’s Referees Officiate the Rest of the League

One way to visualize whether or not Arsenal is the most penalized when it comes to the officiating of Dean, Dowd, and Webb is to look at how each team’s average odds differential compares to their average Asian handicap swing when these three officials are present. The three graphs below represent just such a comparison for each referee. The x-axis represents the average Asian handicap swing. The y-axis represents the average odds differential.

The key to the graph is the lower left hand quadrant. Teams that end up there, especially those that end up further away from both of the lines along an imaginary diagonal line extending from the origin of the graph, are likely experiencing a higher amount of bias in officiating than the other teams the referee has officiated. The three graphs below show such plots for each team against each of the three referees. A summary of conclusions is found after the third graph.  Click on any of the graphs to enlarge.




Here are some conclusions that can be drawn from the three graphs:
  • There is clearly a band around the ± 2% region of the odds differential where most teams cluster.
  • It’s also clear that the top teams – Arsenal, Chelsea, Liverpool, and Manchester United – tend to do better on the AH swing than other teams. Indeed, teams that would generally have a more positive Asian Handicap tend to be to the positive on the swing.
  • While Arsenal do seem to be experiencing some bias at the hands of the three referees, they don’t seem to be the worst off. Liverpool clearly pays a bigger penalty under these three referees.
  • Of Arsenal’s main competition for league trophies the last half decade, Manchester United and Chelsea get a much better shake from the referees. Each has a vastly superior AH swing, while each of the two teams finishes much better in terms of odds differential than Arsenal for two out of the three referees. These advantages translate to an average of a 0.73 goal benefit in AH swing and a 1.6% benefit in odds of winning a match.
  • Chelsea’s average odds differential is 1.6% better than Arsenal’s, while their average AH swing advantage is 0.78 goals.
  • Manchester United’s average odds differential is 1.5% better than Arsenal’s, while their average AH swing advantage is 0.68 goals.
  • Mike Dean is the only referee of the three to put both Chelsea and Manchester United to the positive. He’s also the referee demonstrating the highest bias against Arsenal in AH swing, both nominally and when compared to Chelsea and Manchester United (a 1.52 swing deficit to both).
  • Perhaps Ryan Babel’s tweet with Howard Webb wearing a Manchester United jersey wasn’t too far off. Manchester United is his most favored team by a mile in terms of both swing and odds differential.
  • Of any referee who has officiated the Big Six, Mike Dean gives the most favorable treatment to Chelsea. Their average odds of winning a match are improved by nearly 1% when he officiates one of their matches, and they experience their best AH swing under him.
  • Of Arsenal’s North London rivals, Spurs are treated about even by Dean and worse by Dowd and Webb.
There are some interesting interactions that go on between referees and managers as well. One could be referred to as the Dean-Redknapp effect. When Dean officiates matches involving Portsmouth and Tottenham, both teams do better in terms of AH swing with Redknapp at the helm than when he is not, bordering on statistically significant effects (odds differential was a wash). Interestingly enough, the interaction of manager and team is not significant, so Redknapp’s record under Dean is not unduly boosted or penalized by his record from either of the clubs. It stands on its own. Such a detailed analysis would have to be the focus of a subsequent post, and deserves a wider treatment of referees, to ensure that it’s not simply an effect of Redknapp’s superior coaching. Nonetheless, it provides intrigue when studying the effects of officiating.

A Comparison of The Big Six By Season

It has been shown that over the 2006/07 through 2009/10 seasons Dean, Dowd, and Webb have given Arsenal a bit of the short end of the stick when it comes to fouls-per-booking, AH swing, and odds differential. One last bit of examination remains – what happens if we open up the examination to the entire six years of data in DogFace’s database, isolate for the Big Six for these three referees, and examine how a few of the trends may be changing over time?

The graph below provides a plot of such data. The solid lines represent the teams’ average AH swings under the three referees, the value of which can be found on the right hand side of the graph. The dashed lines represent the team’s average odds differentials under each of the referees, the value of which can be found on the left hand side of the graph (click to enlarge).


A few general trends can be observed:
  • All teams except Manchester City have been on a general downward trend over time in terms of AH swing.
  • Manchester City is also the only team to demonstrate a steadily increasing odds differential over time.
  • Arsenal started out with the third lowest swing in 2005/2006, and their decline has been consistent to the point of falling below 0 this season with the three referees. They are the only team of the Big Six to experience a negative swing.
  • The dashed set of lines shows Chelsea and Manchester United hanging around neutral (i.e. 0%) odds differential over time.
  • Again, Arsenal is on a steady downward trend to the point that their average odds differential under the three referees is on track to be worse than -2% this season. Only Tottenham has a worse odds differential.
So who’s driving this downward trend for Arsenal? The plot below shows the same data as the plot above, but it eliminates the other three teams and instead focuses on Arsenal’s referees (click to enlarge).


On the AH swing front, there’s been a steady erosion the last year. However, before that Mike Dean’s officiating showed a distinct downward path compared to the rest of the referees. His AH swing went negative in the 2009/10 season, and Webb has joined him now in 2010/11. It appears it’s a case of Dean pulling the rest of the average down with him, and the other two referees joining him this season.

As for odds differential, it’s Dean again that leads the pack. He started out as a neutral referee in 2005/2006, but then has steadily eroded that neutrality to a -2% differential by last season. Dean’s low average and the recent degradation in Phil Dowd’s officiating are what are leading to Arsenal’s precipitous drop in 2010/11.

Conclusions

Whether it’s actually poor form generating a higher number of fouls and cards or actual referee bias, it is clear that Arsenal pay a bigger referee penalty than all of the teams they’re competing against for the Premier League championship save for Liverpool. This might be combated by having the three highlighted referees – Dean, Dowd, and Webb – officiate fewer Arsenal matches. They officiated an average of ten matches, or 26% of Arsenal’s season, each of the last four years. Statistics would suggest that having a fewer number of referees officiate a greater number of matches for each squad would lessen the chance of a poorly officiated match impacting a team’s season point total. Such an assumption is based upon the idea that the official’s errors or bias are randomly distributed. The data above suggests otherwise.

Monday, April 25, 2011

Steve Zakuani, the Heart and Soul of the Sounders

Happier and healthier days for Steve Zakuani

Being an Arsenal fan, I am a bit sensitive to broken legs on players I love.  I haven't been watching the Gunners long enough to see the laundry list, but I did get to watch Ryan Shawcross's brutal tackle on Aaron Ramsey.  We're now more than a year on from this assault, and Ryan "Not that kind of guy" Shawcross has earned a spot on David Hirshey's Worst XI in the EPL while Ramsey has yet to get back in form for Arsenal's first team.

I got a brutal reminder of what such nasty tackles look like on Friday night when the Sounders' very own Steve Zakuani had his tibia and fibula snapped in half (warning: link shows the brutal carnage, including some pretty gruesome slow-mo of the leg post hit).  And here's the weird Arsenal six degrees of separation involved in this incident - Zakuani, a product of the Arsenal Academy, was taken out by Brian Mullan of the Stan Kroenke-owned Rapids.  Yes, that Stan Kroenke.  There's also the inevitable "he's not that kind of guy" commentary we Arsenal fans are used to hearing - perhaps it's something we Sounders fans should get used to.  It's nice to see that Brian Mullan is doing everything possible to demonstrate he's not that kind of guy (emphasis mine, although I am not the first to make the observation):
"It was never my intention to injure him in the least. It's a tackle that I've done hundreds of times and would probably do again. I had no intention of hurting him. It's a freak, freak thing, and I apologize and wish Steve a speedy recovery."
Sounder at Heart has some advice for Mullan, and he should heed it.  It's not like he didn't exhibit guilt in the video replay, where you can clearly see Mullan size up Zakuani and take out his frustrations from an earlier non call on his leg. With both feet. Studs up. But he's not that kind of guy.

Enough with the dirty tackle stuff.  This blog is about statistics, and here are Steve's:
  • 2009 (rookie season): 4 goals and 4 assists, making him a finalist for rookie of the year.
  • 2010: 10 goals and 6 assists, which allowed him to share the team's golden boot with Fredy Montero.  Half of those scoring contributions came in the critical back half of the season, where he and Montero led the Sounders back from the near bottom of the table to the fourth playoff seed in the West.
  • 2011: In six games he had already recorded two goals and two assists.
The most important statistic is this one: Seattle's first ever draft pick and #1 overall pick when Seattle entered the league in 2009.  He wasn't the first Sounder - guys like Roger Levesque who were holdovers from the USL side and new players like Kasey Keller were signed well before the first season.  But those guys don't represent the sacrosanct "first pick" in American sports.  As in most other US sports, MLS allocates its college and non-academy players via a draft.  These players represent the eternal hope of a team getting lucky and finding a franchise player who plays for them from an early age and from day one.  Along with Montero, Steve represents the long term future upon which the club is built.

Beyond his on-pitch performance, Steve also served as a spark plug elsewhere in the team.  Steve's one of the most loved players on the Sounders. He has a huge heart for the team, the city, and the supporters.  He's a humanitarian with few equals.  He's always sporting a wonderful smile no matter the occasion and leads the response to any huge plays by him and his teammates.  His face is so central to the team that a picture of him in post-US Open Cup celebration was all that was used for the banner ad for season tickets this off-season.  Talk all you want about DPs on the Sounders (we've had our fair share of them), but this is the guy who makes the offense tick statistically and emotionally.  His absence will be felt by everyone on the pitch and in the stands.

Realistically, Steve's season is over and his career may never be the same.  We all hope that he works his way back in to the same form as he was in to start this season, but the realistic part of me accepts that he may not.  Up until Friday night this had been an over two year journey for him and the team to get into the form he's demonstrated.  It's been more than his individual talents, and has relied a good bit on rhythm and experience with other teammates he's been around for more than two seasons now.  He was hitting stride this season, one I've highlighted as do-or-die for this crop of Sounders in a previous post.  Who knows what teammates he'll come back to next season.

Regardless of his ability to come back and play, Steve deserves our love and support right now. Hopefully he'll be able to draw strength from the team and supporters who are already planning tributes to him at the next home match, and maybe a last little bit of inspiration from Charlie Davies who's faced a similar horrific injury and has come back from it to lead the golden boot competition early in this season.  Sounder At Heart has the priorities straight, ones we should all keep in mind.  Get well Steve - we can't wait for you to be trending on Twitter again for a reason other than an injury!

Friday, April 22, 2011

Friday Night Links

No new blogs or books this week, so let's get into my favorite links of the week:

Enjoy wherever the weekend takes you, and I will see you back here on Monday!

Wednesday, April 20, 2011

Why Shot Differential Actually Hurts a Club's Chances of Winning a Match

Over several posts about Arsenal's odds of winning matches I have highlighted the fact that a club's increase in shot differential versus the opposition actually lowers that club's odds of winning the match.  This may seem a bit counterintuitive, but it actually makes sense upon a deeper dive into the data behind the binary logistic regression (BLR) model.  This post examines those details and provides the explanation.

The Effect of Shot Differential on Match Odds

Recall that the BLR model I have constructed for the wider league and individual teams consists of the following inputs:
  • A constant term
  • Venue (home/away)
  • Shot differential
  • Shots-on-goal differential
  • Foul differential
  • Red and yellow card fantasy point differential
It turns out that for the league, all terms in the BLR are statistically significant to a p-value of 0.05 except for foul differential.  When individual teams are examined the terms which are statistically significant vary by team, with statistical significance given to any term with a p-value < 0.10 to ensure a reasonable number of terms are included in each team's model.

To understand how shot differential impacts the average team's odds of winning a home or away match, the BLR values for shots-on-goal and fantasy point differentials were set to their averages by venue while the shots differential was varied from the minimum to the maximum value by venue.  The resultant plot of data is shown below (click on graph to enlarge it).


The graph shows the decreasing nature of the odds of winning as shot differential goes up both home and away.  The slope terms of the regression equations indicate that home teams pay a bigger penalty for increasing shots - only three additional shots are required to lower the odds of winning by 2%, while it takes four shots away to have the same effect.  The fact that the dashed lines, which represent the bounds of the 95% prediction intervals for the two venues, do not cross indicates the difference in odds between home and away is statistically significant.  For a given shot differential, the average home team is more likely to win than the average away team.  Similar relationships were seen for every team that had a statistically significant shots coefficient for their BLR model.

Some Binary Logistic Regression Nerdery

Before explaining why increased shot differential impacts the odds of winning a match negatively, a little digression into the basic theory of binary logistic regression is in order.  A BLR model based upon the data I have from DogFace is of the following form:

$$\ln\left(\frac{p}{1-p}\right) = C + \alpha A + \beta B + \delta D + \varepsilon E + \varphi F + \gamma G$$


where:
p = probability of winning a match (the "odds" quoted as percentages in this post)
C = constant
α = BLR coefficient associated with venue (home vs. away)
A = venue (1 = home, 0 = away)
β = BLR coefficient associated with shot differential
B = shot differential
δ = BLR coefficient associated with shot-on-goal differential
D = shot-on-goal differential
ε = BLR coefficient associated with corner differential
E = corner differential
φ = BLR coefficient associated with foul differential
F = foul differential
γ = BLR coefficient associated with fantasy point differential
G = fantasy point differential (red cards = 6 points, yellow cards = 3 points)

If the equation is re-arranged to eliminate the natural logarithm term on the left hand side of the equation, the relationship between p and the rest of the terms in the model becomes:


If the chain of exponentials on the right hand side of the equation is represented by the term x, and then the equation is re-arranged to isolate p, the model becomes:


Thus, p increases as x increases (by the chain of exponentials increasing).  The inverse is also true - a lower x value produces a lower p value.
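Written out from the symbol definitions above, and assuming the standard logit form of a binary logistic regression (so that p is a probability between 0 and 1), the three steps look like this:

```latex
\ln\left(\frac{p}{1-p}\right) = C + \alpha A + \beta B + \delta D + \varepsilon E + \varphi F + \gamma G

\frac{p}{1-p} = e^{C}\, e^{\alpha A}\, e^{\beta B}\, e^{\delta D}\, e^{\varepsilon E}\, e^{\varphi F}\, e^{\gamma G} = x

p = \frac{x}{1+x}
```

Because x/(1+x) rises steadily with x for any positive x, a term that multiplies x by a factor below one lowers p, and a term that multiplies it by a factor above one raises p.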

Isolating the two terms involving shots means examining the impacts of β, δ, B, and D.  Together they multiply x by the factor e^((β x B) + (δ x D)).  If (β x B) + (δ x D) is negative, that factor is less than one and thus lowers p.  If (β x B) + (δ x D) is positive, the factor is greater than one and thus raises p.
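The same point can be checked numerically. The sketch below assumes the logit form described above; the function names and the coefficient values (β = -0.05, δ = 0.25) are hypothetical and chosen only to illustrate the sign logic, not taken from any fitted model.

```python
import math


def shots_factor(beta, shot_diff, delta, sog_diff):
    """Multiplicative contribution of the two shots terms to x."""
    return math.exp(beta * shot_diff + delta * sog_diff)


def win_probability(x):
    """Convert x (the product of exponentials) into p = x / (1 + x)."""
    return x / (1.0 + x)


beta, delta = -0.05, 0.25  # hypothetical coefficients, not fitted values

# Ten extra shots but only one extra shot on goal: factor below one, p falls.
wasteful = shots_factor(beta, 10, delta, 1)
# Ten extra shots with four of them on goal: factor above one, p rises.
accurate = shots_factor(beta, 10, delta, 4)

baseline_x = 1.0  # assume all other terms multiply out to 1, i.e. p = 0.5
p_wasteful = win_probability(baseline_x * wasteful)
p_accurate = win_probability(baseline_x * accurate)
```

Starting from an even match, the wasteful shooting profile drags the probability below 50% while the accurate one lifts it above 50%, even though both teams took the same number of extra shots.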

Not All Shots Are Created Equal

How is it that increasing shot differential ends up lowering a club's odds of winning?  It is because all shots are not created equal, and that it is actually shots-on-goal differential that increases a club's odds of winning.

The β term in the BLR must be negative for a team's odds to decrease with increasing shot differential as observed in the first graph in this post.  Indeed, this is the case for the league and club models for which shot differential is statistically significant.  The key is that it must be offset by shots that are on goal.  It turns out that the δ term for the league and all teams for which it is statistically significant is positive.

Not only is the shots-on-goal term positive, its magnitude is much larger than that of the negative shots term.  Thus, the impact of shots-on-goal is greater than that of shots.  One sign of a team's strength is the ratio of δ to β - a higher δ to β ratio means that they get a larger benefit from shots-on-goal relative to shots.

A table of such ratios for each team that has both statistically significant β and δ terms is shown below, as well as the league's ratio.  The table is arranged in descending order of magnitude (the negative values reflect the fact that the shots-on-goal coefficient is positive while the shots coefficient is negative).


The Impact of the Ratio of Shot Differential to Shots-on-Goal Differential on Match Odds

The plots below demonstrate the effects of the ratio of shot differential to shots-on-goal differential on match odds for the Big Four.  The first graph shows the impact on odds for a home match, while the second graph shows the impact for an away match.  Shots-on-goal differential is held constant at the average for each team by venue, while the shots differential is swept by a multiple of the average.  All other match statistics - differential for corners, fouls, and fantasy points - were also set to the average for each team by venue.
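A rough sketch of how such a sweep could be set up, assuming the logit form of the model. All inputs here are hypothetical placeholders (coefficients, average shots-on-goal differential, and the range of ratios), and the remaining terms are lumped into a single `other_terms` value set to zero for simplicity.

```python
import math


def win_probability(constant, venue_term, beta, shot_diff, delta, sog_diff, other_terms=0.0):
    """p = x / (1 + x), with x the exponential of the summed BLR terms."""
    log_odds = constant + venue_term + beta * shot_diff + delta * sog_diff + other_terms
    x = math.exp(log_odds)
    return x / (1.0 + x)


# Hypothetical inputs for one club at one venue (not fitted values).
beta, delta = -0.05, 0.25
avg_sog_diff = 2.0            # average shots-on-goal differential, held constant
ratios = [1, 2, 3, 4, 5, 6]   # shot differential as a multiple of avg_sog_diff

sweep = [
    (ratio, win_probability(0.0, 0.0, beta, ratio * avg_sog_diff, delta, avg_sog_diff))
    for ratio in ratios
]

for ratio, p in sweep:
    print(f"{ratio}:1 -> {p:.3f}")
```

Algebraically, the two shots terms cancel when the ratio equals -δ/β, so a club with a larger δ to β ratio can tolerate a higher ratio of shots to shots-on-goal before its advantage is wiped out.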



A few general conclusions can be drawn from the graphs:
  • Liverpool and Manchester United experienced some of the highest negative ratios, where their odds of winning a match approach 100%.
  • Arsenal's superior ratio shows up in the shallower slope of their line in both home and away matches, although they have a lower probability of winning a match at their average form compared to the other three clubs.
  • Match odds are far more linear away than at home.
  • Chelsea seems to be the most robust to positive ratios given their higher overall odds at ratios greater than 3:1, which is where clubs begin to wipe out any advantage from shots-on-goal.
  • Manchester United and Liverpool have very similar odds at home, while Manchester United has far superior odds away (~10% better odds regardless of ratio).
So, not all shots are created equal.  Every additional shot-on-goal that a club realizes versus the competition helps raise their odds of winning.  Indiscriminately taking a shot that never really had a chance of going in is not just a wasted opportunity - statistically it actually lowers the team's odds of winning the match.  The key is getting the highest possible ratio of shots-on-goal differential to shots differential to gain the best chance of winning the match.

Sunday, April 17, 2011

Arsene Can Only Blame Himself, But He Can Fix It


"Ah, piss off!"

There was a time on Sunday when "King Kenny" and #PissOffWenger were trending on Twitter, demonstrating the reach of the global soccer community and the power of the Kop. Surely there are a lot of Wenger haters not aligned with Liverpool that jumped on the trend once it got going, but the phrase is symbolic of where the two clubs and their managers may be at this late point of the season. All the more fitting because of the fact that they started the season, literally, each at a different place. What a difference eight months makes.

The Match Statistics

Before diving into some thoughts on what ails my Gunners, I'll start with a statistical breakdown of each match - the first of the season at Anfield and this latest one at the Emirates. It provides some insight into what each side should have expected out of both matches, and provides some grounding in my later criticism of Wenger.

Just like this post, the tables below represent the odds of winning a match based upon a binary logistic regression (BLR) that looks at venue and differentials of shots, shots-on-goal, corners, fouls, and fantasy points for yellow and red cards. I've generated summaries of each match's statistics and the resultant odds based upon those statistics. Click on either table to enlarge it.




It wasn't just your eyes that told you Liverpool should have won the match at Anfield - the match statistics bear this out as well. Liverpool had the statistical advantage in every category - shots-on-goal, fouls, and fantasy points were in their favor, while total shots and corners were not (note: it turns out that teams want to have fewer corners and shots versus the opposition, a concept I will explain in a later post). Reina's muffing of Chamakh's shot truly was a gift of a point for the Gunners. Liverpool should have come away with a full three points that day.

Fast forward eight months, and the match at the Emirates was a little more even but still in favor of the Reds again. Arsene Wenger highlighted his team's "difficulty at creating chances" at his post-match news conference. Indeed! Even though Arsenal took 60% more shots than Liverpool, the Gunners had no advantage in shots-on-goal and had 7 more corners than the Reds - all adding up to a huge knock on the Gunners' chances of winning the match. All of this results in dropping Arsenal's odds of winning the match, which are only slightly raised by the two fewer yellow cards. Liverpool's oddity of being only one of three teams in the Premier League from 2005/06 through 2009/10 with a statistically insignificant constant term in their BLR model means that all of those statistical advantages added up to a 69% chance of winning the match, while Arsenal had a 34% chance of winning with such form. The odds were much closer than the first match, and perhaps a draw was warranted based on the statistics and what we all saw with our own eyes.

The bigger problem for Arsenal in the second match was their inability, yet again, to close out a match and take advantage of a lucky break. Statistically they may have been at a disadvantage for a win, but fortune broke their way late and they were a few minutes away from keeping pace with Manchester United. Instead, they walked away with what was a disappointing draw. Such collapses that snatch a loss or a draw from the jaws of victory have been all too common this season.

The Common Denominator - The Defense

What became immediately clear upon replay of the foul that enabled Dirk Kuyt's tying goal was the softness with which Lucas went down. Good for him, as drawing a foul is part of the game. What's more frustrating is how Emmanuel Eboue put himself in position to have a foul called. This is symptomatic of wider defensive woes this season that have cost the Gunners a total of what should have been nine additional easy points that would have them top-of-the table right now, as well as the Carling Cup. Recall the following:
  • Way back in September Arsenal drew Sunderland 1-1 at the Stadium of Light after leading for nearly 80 minutes. The stoppage time equalizer from Darren Bent came literally on the last play of the match, which was set up by a poor clearance from Gael Clichy.
  • In November the Gunners conceded a two goal lead in the final 40 minutes of the match. I recounted here how rare an event that was. It was Tottenham's first win at Arsenal in 17 years.
  • It was Abou Diaby providing the exact response for which Newcastle United were looking when they implemented their second half kick-them-off-the-pitch strategy while down by four goals in February. Diaby's sending off began Newcastle's comeback from four goals down, which was a first in the history of the Premier League and perhaps the best example of how Arsenal's defense can both make boneheaded mistakes and crumble under pressure at the same time.
  • In the Carling Cup final, it was defensive miscommunication between Laurent Koscielny and Wojciech Szczesny that led the heavily favored Gunners to lose to Birmingham City. This loss, combined with the loss against Newcastle earlier in the month, set up an awful run in March and now April that has seen the Gunners fail to capitalize on any Manchester United stumble and now find them six points adrift of the Premiership title and only two points ahead of Chelsea.
In some cases, it's been a lack of killer instinct (Tottenham, Sunderland, Birmingham, and Liverpool). Either way, it's clear that the defense has major holes when it comes to protecting leads. I agree with Tim at 7AM Kickoff - I think the goalkeeper problem is largely solved via Szczesny. What we Gooners want to see is better play from the defense in front of him, and we want to see it next season.

The Transfer Policy Needs to Be Adapted, Arsene

I've written here and here about the impacts the squad and starting XI transfer cost can have on a team's likely finish position. I've also heaped praise on Wenger for grossly outperforming such a model - averaging a third place finish over the last six seasons when his transfer expenditures suggest he should finish 10th. What's also become clear over that time frame is that consistently finishing third isn't meeting the supporters' expectations.

The Swiss Ramble made an excellent post in September 2010 that provides a great summary of Arsenal's financial quandary. Their strategy of not taking on debt to finance transfers has generated tidy annual profits in the transfer market, which helps compensate for their lower commercial income when compared to Chelsea, Liverpool, and Manchester United. They also have match day income that is the envy of Chelsea and Liverpool, and surpassed only by Manchester United. However, they do have the third highest payroll in the Premier League, outstrip their next closest competitor (Liverpool) by £21M, and have a wage-to-turnover ratio (50%) the envy of everyone else but Manchester United. They may not be paying for transfers in the open market, but they are paying their players wages that are in line with winning championships. And that's the rub.

It's time for Arsene and the Arsenal board to admit the plan has failed. I know that they don't set out to finish third every year, have steadily earlier exits each year in Champions League, and go trophyless in FA Cup and Carling Cup. They set out to win, and they've tried to win in an anti-Chelsea manner. They've had a squad, transfer-wise, that barely maintains the league average transfer cost yet competes with the big boys every year. It also consistently fails to win a trophy. Arsenal has made some good deals (Walcott, Szczesny, Cesc, etc.) along the way, and have a few great academy players in the first team as well (Wilshere and Djourou being the brightest). Save for Djourou, none of those great deals and academy players starts between the goal and the midfield. The Gunners simply need a few more top quality defenders, and they'll have to pay to get them.

Which brings me back to the start of my post and #PissOffWenger. The man elicits such a response partly because he has been around so long, partly because he has been so successful in the past, partly because he's not English, partly because he's been in everyone else's face trying to explain that he's building a team differently, and partly because lately he's been a bit of an insufferable whiner. He may have a few legitimate gripes, but the sheer number of gaffes this season suggests there's more than bad officiating or a league conspiracy at work here. The public manner in which he has vented his frustrations during and after matches makes him seem like a sore loser, and plays right into the role within which the press want him to fit. If you're a defender looking to join a club fighting for the Premiership title, and you perceive that the manager is a bit of a complainer who seems to not be totally in touch with reality, do you take a pass on such a club? If you're Cesc Fabregas and looking at your manager's behavior and comparing it to the fun your Spanish national team squadmates are having playing for Barcelona, is there even a comparison? There's a time to get angry and harass the officials and opposite manager when officiating is biased or play is dirty. But when it seems to become your MO after each loss in the eyes of the public, you have to stop and wonder if that's the best way to build a positive atmosphere at a club expecting to compete for trophies each year.

Wenger is clearly a managerial and financial genius. Of that there is no doubt. Perhaps he needs to hear more people say it before he feels his approach vindicated. Whatever he feels, he needs to move on. He doesn't need to bust the bank and become a £300M transfer cost behemoth like Chelsea. The future of Financial Fair Play will prevent that. What he does need to do is stop having such an aversion to spending ANY money in the transfer market, and look to upgrade the defense this offseason. Finishing third with the cheapest squad to ever do so is no longer good enough. Only championships matter now.  This Arsenal team is his team - he owns it lock, stock, and barrel. It will be interesting to see if the old man can adapt his ways enough to get one more championship before he enjoys a well deserved retirement. As Tim at 7AM Kickoff said back in February,
"Wenger made this mess, and only Wenger can sort it out."

Friday, April 15, 2011

Friday Night Links

It's been a bit of a slow week for me in posts, but that's because I am working on a massive ten-page study with DogFace that should hopefully come out next week.  Trust me... it's worth the wait.

New Blogs and Books

A few weeks back my blog and Twitter account got picked up by another Twitter account, which led to more than 100 new followers in a day.  I started checking into who was behind such a fortuitous event, and it turns out it was a site called Euro Club Index (new to me, but not new overall).  After checking the site out and conversing with one of the company's managers, I can say I am a big fan of what they're doing.

These guys are after my own heart - a real time database of over 700 soccer clubs that rolls up into an overall index that is updated on a per-match basis.  A club's index score not only shows where they rank in the world of European soccer, but such scores are then used to predict match-by-match outcomes.  Euro Club Index is the product of the Dutch Infostrada Sports, and is reputable enough to be used by sites like Yahoo Eurosport UK.

While I haven't exactly found a way to get an RSS feed of their "Latest News" section, you can get the next best thing by following their Twitter feed which I linked above.  I highly recommend checking them out, and hope they and I can bring you some unique analysis in the future.

Links of the Week

This weekend's match against Liverpool reminds me how fast this season has flown by.  It was nearly eight months ago when I was in the George and Dragon watching the kick off to the season with a packed house that included the friend that got me into soccer in the first place.  It reminds me to enjoy the final seven weeks of this season.  I hope you have a weekend filled with fun. I'll see you on Monday.

Monday, April 11, 2011

The Impact of Red and Yellow Cards on the Big Six

I am in the process of building a very large post that lays out how Arsenal is treated by officials versus other teams in the league.  To build such a case, I need to be able to show the impact of various match elements on the likelihood of a team winning the match.  The tool I am using to build such a case is a binary logistic regression (BLR) that looks at venue and differential of shots, shots-on-goal, corners, fouls, and fantasy points based upon yellow and red cards.

I am still working on the final analysis, but in the meantime I have analyzed the impact of red and yellow card differentials on each of the Big Six's chances of winning a match.  The plots below show the changing odds of winning a match as a function of red and yellow card fantasy point differential based upon data from the 2005/06 through 2009/10 seasons.  The first graph shows a plot of the changing odds at home, while the second graph shows a plot of the changing odds for away matches.  Both plots use the teams' average value for shots, shots-on-goal, corners, and fouls (broken out as home and away averages) as inputs to the model.  Click on either of the graphs to enlarge them.



Several conclusions can be drawn from the graphs:
  • Manchester United clearly is the least sensitive to card differentials at home.  When playing to their average form, they win at least 75% of their matches.
  • Chelsea is the most sensitive of the home teams in the Big Six.  They drop from nearly 90% odds of winning to 30% over the range of fantasy point differentials they see.
  • Chelsea maintains the best form of the six teams when playing away from home.  This is due to the fact that the coefficient for the venue term of Chelsea's BLR was not statistically significant.
  • Liverpool is the most sensitive of the Big Six when they are away from home.  They go from 90% odds of winning with their most favorable fantasy point differential to a 10% chance of winning a match when at the opposite end of the fantasy point differential.
  • Arsenal runs middle of the pack both home and away, and has a relatively linear relationship compared to the other teams.  This is due to Arsenal having the lowest magnitude coefficient in their fantasy point BLR term.
  • Tottenham Hotspur and Manchester City maintain the worst overall form of the Big Six across the range of fantasy point differentials.
More to come on match statistics and their impact on Premier League teams' odds of winning a match in subsequent posts.

Friday, April 8, 2011

Friday Night Links

Another week has flown by, and not without a few mistakes along the way.  Herewith is your weekly dose of my favorite links.


New Books and Blogs

This week I started working through Power Pivot for the Data Analyst.  I recently bought Excel 2010, and it has a free download called PowerPivot.  I picked up the book at the suggestion of college friend and Excel program manager Diego Oppenheimer.  Diego's been able to build some pretty cool PowerPivot tables based upon World Cup historical data that he then used during last year's tournament. It effectively rendered ESPN's historical statistical commentary useless - why listen to them when you can have all the same data at your fingertips at a moment's notice?

PowerPivot is a great tool for the lay statistician as it allows for huge tables to be constructed from freely available data on the internet, and then run multiple pivot tables pulling from each PowerPivot table.  PowerPivot also greatly improves the embedded calculations functionality within pivot tables.  Think of it as a much friendlier version of Access, or Excel's normal pivot table functions on crack.  I plan on using this functionality to complete a few analyses in the coming months on several large but unlinked MLS databases.  I'll keep you posted as to key insights from the book, but if you are a lay statistician and have a copy of Excel 2010 I highly suggest you download PowerPivot and get a copy of the book.

Favorite Links of the Week
That's all the links that will fit in this week's installment.  My Sounders are looking to FINALLY get their first win of the season after another impressive offensive performance last week.  Now if the defense could get a clean sheet, we'd be on a roll!  My Gunners cling to the faintest of hope, although I won't be watching that game live as it is on at 5:30 AM here on the Left Coast.  Enjoy whatever you are up to this weekend, and I'll see you on Monday!

Thursday, April 7, 2011

ABNG Turns 1 Today!

The first anniversary of my blog crept up on me today, so I will have to save the more reflective writing for some time next week when I can devote the proper time.  Since that first post a year ago I have made 133 posts (including this one), have 10 followers via Blogger, have 524 followers on Twitter, 30 on Facebook, and have made 3,087 tweets.  I've had too many good conversations to count via this blog, Twitter, email, and even a few face-to-face.  I'd like to take a moment to thank each and every reader and collaborator for their time, their feedback, and their understanding over the last year.  You guys make this blog worthwhile, and without you I would just be a stats-driven mad man barking at the soccer moon.

I do this recreationally, so there's no real payoff for me other than the online interaction with other great authors and readers.  Thank you again from the bottom of my heart, and here's hoping for an even better second year!

A Personal Example of the Garbage-In/Garbage-Out Principle

Clearly, my problem was with a "garbage model".

The other night I was checking a few figures for an upcoming post that were based upon my binary logistic regression (BLR) models built from EPL match data, and I saw a number of counterintuitive trends.  That got me to check my data for a fourth time, and sure enough... I had fat fingered a formula!  It turns out that formula was used throughout the data set, and thus all of my analysis using DogFace's data had been using incorrect numbers.  In terms of my blog's header quote, my model was really wrong and wasn't really useful.  My post-match analysis of Arsenal/Blackburn and my quantification of Phil Dowd's officiating of Arsenal's matches were now both in question.  I had to re-run the analysis, which totaled several hours of statistical work and about twice as long checking my numbers.

It's a bit frustrating, as my blog relies on the accuracy of my numbers to drive its content.  I am very systematic in making sure they are right, and this is the first time I have found such an error.  However, more important than being right the first time is correcting mistakes when I find them.  This post is such a correction.

The Impact of Cards and Phil Dowd on Arsenal


After re-crunching the numbers, I found that the significant factors in the binary logistic regressions (BLR) turned out to be a bit better.  My erroneous calculations had resulted in my elimination of fantasy league points for cards from consideration in the BLR.  This was unfortunate because I had already done several posts on officiating at Arsenal's matches using that metric, and had hoped to re-use it here.  It turns out that when the numbers are correctly calculated, such a term is statistically significant.  Note that I have used Yahoo's Fantasy Premier League scoring system, which gives 3 points for a yellow card and 6 points for a red card.  All other terms from the BLR in the previous post - venues and differentials of shots, shots-on-goal, corners, and fouls - were included in the analysis.  The graphs below show the true relationship between those terms and fantasy point differential.



The updated models show that Arsenal is less impacted by cards at home than the average team in the Premier League, while they are impacted nearly identically as the average team when they are away from home.  Quite a different conclusion than my previous post that had erroneous data!  It turns out a number of the general conclusions regarding their overall odds, especially regarding their best away odds not even matching their odds of a home match where they experienced a fantasy point deficit of 5 points, didn't change much from the original post with bad data.

What about the effect of referees on Arsenal's matches?  The plots below show the impact each referee has on the odds Arsenal wins a match compared to the odds of Arsenal winning the match had the referee handed out the average number of fouls and cards Arsenal saw over the five year period.  Only data from the 2006/07 through 2009/10 seasons was used as those were the four years where each of the eight referees had officiated at least one match.



Again, we see Phil Dowd lead the pack in odds differential penalty, although it is smaller than originally estimated (3% now vs. 4% with erroneous data).  Webb and Halsey's numbers round out those who have the greatest impact on Arsenal's odds of winning.  These corrected numbers will become more valuable in my next post on this topic, where I will compare the bias of Dowd, Webb, and Halsey to their records officiating other clubs' matches.

Finally, there's the small matter of Phil Dowd's officiating at Arsenal's recent match against Blackburn.  We Gooners still can't blame the loss on Phil Dowd - his officiating certainly helped the Gunners' odds.  But their play didn't help nearly as much as my original post indicated, and in fact played into Blackburn's statistics a good bit.  The table below summarizes the match statistics, and shows the likelihood of winning by each club.


That's not a typo - Blackburn did actually have higher odds of winning the match based upon the way the statistical models work.  Let me explain:
  • Arsenal's coefficient for the constant term in the BLR is significant, and it is negative.  This means that before anything else is known about Arsenal's match, they start out with less than a 50% chance of winning.  This is not the case with Blackburn, whose constant term for their BLR is non-significant and thus gives a coefficient of zero for no effect on their odds of winning.
  • Arsenal's odds certainly increase playing at home and having a greater number of shots on goal, but their BLR coefficient for shots is not statistically significant so there is no significant contribution to their odds of winning from their 15 shot advantage.
  • Blackburn, on the other hand, does have a statistically significant coefficient for the shots term of the BLR, but it is negative.  That means as the opposition's shot differential increases, Blackburn's odds of winning go up.  This can make some sense, if one thinks in terms of accuracy.  A shot doesn't really matter unless it is on target.  Blackburn's coefficient for SOG is statistically significant as well, and is positive, which makes sense.  This means the more SOGs the opposition gets, the more Blackburn's odds of winning go down.  However, they actually benefit when teams take wild shots with little likelihood of scoring a goal.  This is all reflected in Arsenal's need to take so many shots to get roughly the same percentage of shots on goal as Blackburn.  Had they had better accuracy, Blackburn's odds would have been much lower.
  • Arsenal's odds suffer, and Blackburn's benefit, from Arsenal's corner differential.  This is a trend seen throughout the data set, both in the total league data and individual team data.  This possibly counter-intuitive trend will be explored in a later post.
  • The foul differential is zero, but even if it weren't it would not matter.  Neither team's BLR coefficient for fouls is statistically significant.
  • Clearly Arsenal benefits from only having a single yellow card vs. two and a red card for Blackburn.  This raises Arsenal's odds and lowers Blackburn's due to their statistically significant coefficients for fantasy points.
In all, it gives both teams high odds of winning the match.  Perhaps a draw was the most appropriate outcome according to the statistics.
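The card term in that breakdown can be made concrete with the scoring described earlier (3 points per yellow card, 6 per red card). The function name and the sign convention (own points minus the opponent's) are assumptions for illustration only; the post does not state in which direction the differential is taken.

```python
YELLOW_POINTS = 3  # Yahoo Fantasy Premier League value cited in the post
RED_POINTS = 6


def card_fantasy_points(yellows, reds):
    """Fantasy points a team accumulates from its own cards."""
    return YELLOW_POINTS * yellows + RED_POINTS * reds


# Card counts from the Arsenal/Blackburn match described above.
arsenal_points = card_fantasy_points(yellows=1, reds=0)    # 3
blackburn_points = card_fantasy_points(yellows=2, reds=1)  # 12

# Assumed convention: own points minus opponent's points.
arsenal_differential = arsenal_points - blackburn_points   # -9
blackburn_differential = -arsenal_differential             # 9
```

Folding both card types into one number is what lets a single BLR term capture the rare red card alongside the far more common yellow.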

Conclusion

The beautiful thing about blogging is that mistakes can be corrected instantly, unlike books where one must wait months or years until the next edition is published or magazines where a retraction can be made next month.  The instantaneous nature doesn't negate the challenges associated with errors, but it does make the communication more honest and more open in a quicker manner.  I hope that being honest about my mistake reassures you that I am always striving for quality data.  I'm redoubling my efforts to check all my numbers before I post any new material.  Catching the error and correcting it has turned out for the better when it comes to the flexibility of the data set and the conclusions that can be drawn.  Most importantly I have shown that fantasy points for yellow and red cards are statistically significant, which will enable the use of a single metric to capture the effect of two related, but one especially rare, events in a match.  Stay tuned for the follow up post...

Monday, April 4, 2011

Comparative Sports: The Big Business of NCAA College Basketball

While the bulk of my blog is about soccer statistics, I've routinely maintained that they are only a method to better understand the beautiful game. In my recreational reading, I often focus on another aspect of soccer and sports in general - the culture in which they exist today and the culture that has affected their success or failure over the years. Understanding sports culture - both inherent to the game itself and the wider national or international culture in which it operates - helps us better understand the rules, conventions, strategies, and opinions that permeate the game. To divorce the manmade creation of sport from the culture that created it and maintains it is to miss a major element of the sport itself.

One of the more interesting contrasts in worldwide sports is the realm of college athletics - so dominant in America but nonexistent elsewhere in the world.  Gaming the World does an excellent job of providing a historical perspective to the rise of college athletics in the United States, and none is bigger than college basketball.  College basketball's national championship game is tonight, which will be contested by Butler and the University of Connecticut.  It is the culmination of nearly three weeks of a basketball orgy where every game is win-or-go-home.   I plan on watching the game, as will nearly 50 million people in the United States with a median viewing household income of $75,000 or more.  That's a lot of potential customers, and as such there is a ton of money involved in the tournament.

For a great summary of the money involved view the video below, and then check out this fact sheet. The video provides an excellent commentary put together by the PBS program Frontline, and includes interviews with NCAA President Mark Emmert, the man responsible for Nike being so intertwined with college athletics (Sonny Vaccaro), and numerous other key figures including former players now seeking a cut of the financial action. I'd highly encourage readers to let the video player go through all the chapters to get a full picture of what's involved in college basketball and March Madness before reading the rest of my blog entry.



Some of the highlights from the video include:

  • 90% of NCAA revenue (that's the revenue for the governing body, not the individual schools) comes from the college basketball tournament. That raises the question - if a basketball tournament can generate such revenue, why hasn't college football gone to such a tournament to determine its champion and generate similar revenue?
  • The basketball tournament's current television contract is for $10.8 billion (yes, billion) over 10 years.
  • 16 teams in this year's tournament have a graduation rate of less than 50%. Very few reach a graduation rate of 100%. This reminds me of a phrase a former collegiate swimmer friend of mine uses - there are "student athletes" and then there are "athlete students". It appears the major sports have many more of the latter.
  • NCAA President Mark Emmert refers to the athletes as "pre-professionals", whereas critics recognize that college athletics are professional in every regard but one - the athletes aren't paid a dime.
  • Even after college athletes have forfeited their amateur status - either by becoming a professional athlete or by being paid for some other professional job - they are still denied any compensation for their likeness due to the contract they're required to sign. DVDs full of highlights and video games with an athlete's signature moves make millions for the NCAA and manufacturers of such content, but former stars never see a cent.

To be honest, I find the idea of amateur athletics in college to be a bit of a sham. The reality is that lots of people are getting rich on the backs of these athletes, while the athletes get a shoddy education and often never earn a degree. During those years in college they've lost several years of real earnings, and are one significant injury away from never making a dime from their talents. Beyond shortchanging the athletes, the conflicts of interest, continual ethics violations by coaches and athletic department staff, and pitifully low standards for "student" athletes compromise the core mission of the university, which is to educate students.

On the opposite side, college athletics has evolved for more than 100 years in the United States, and its big-business nature is due to that gradual evolution. College athletics' prominence grew out of a time when professional athletics wasn't a realistic option for most athletes. The highest level of athletics through the 1950s, outside of baseball, was the college game. By the '50s and '60s the emergence of the NFL as the top football league and the ascendancy of the NBA heralded an era where three major professional leagues competed for athletes and consumer dollars. At the same time, the two newer leagues did not have a minor league system like baseball and continued to rely on colleges to develop players for them. Given that colleges provide the only permanent sports attachment for rural areas, and increasingly for urban areas when leagues relocate their professional franchises, it should be no surprise that college teams remain as popular as they are today.

For two-and-a-half hours tonight I will forget about the wider cultural context of the event, and root for the underdog Butler Bulldogs (apologies to any Connecticut fans!). But understanding the pressure of the event, as well as the pre- and post-game pageantry full of corporate sponsorships, requires an understanding of the finances and history that surround it.