Wednesday, July 11, 2012

College shot chart data is useless

So, I found some shot charts over at CBS. I downloaded them all, loaded them into SQL and R, and went about trying to create a nice heat chart of ~1 million shots taken over the past decade, out of around 5 million in total. That should be a nice representative sample, right, even if it is biased towards the power conferences? Except, upon further examination, it's almost entirely worthless. Something like 28% of all the shots taken were marked as being 0 feet in length. Damn you lazy sons of bitches, marking everything close to the rim as being at the rim. For lulz, check out these two shot charts:
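For anyone who wants to run the same sanity check on their own scrape, here's a rough sketch of the zero-feet count in Python. It is not the SQL/R I actually ran; the function name and the idea that you have a plain list of shot distances in feet are assumptions for illustration, and the sample numbers are made up.

```python
def zero_distance_share(distances_ft):
    """Return the fraction of shots charted at exactly 0 feet.

    distances_ft: list of charted shot distances in feet (hypothetical input).
    """
    if not distances_ft:
        return 0.0
    zero_count = sum(1 for d in distances_ft if d == 0)
    return zero_count / len(distances_ft)


# Made-up distances, just to show the call.
sample = [0, 0, 3, 15, 22, 0, 8, 21, 0, 12]
print(f"{zero_distance_share(sample):.0%} of shots charted at 0 feet")
```

If that number comes back anywhere near 28% on real data, the charters are rounding a lot of close shots down to the rim.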

Kent State@Akron

Marist@Kentucky

There is one shot charted as being taken in the paint but not at the rim in the Kentucky/Marist game, as opposed to one shot total being marked as being zero feet in length in the Kent State/Akron game. While I applaud the Kent State/Akron game charter for his dedication to precision, I would have liked for him to bat better than .500 in actually marking down the location of a shot at all. While these games are extreme examples, the amount of garbage in these stats makes them pretty much completely worthless. Since I did go to the trouble of compiling this, I did check out all shots charted as taking place behind the three point line, hoping it would be difficult to screw these up as badly as shots close to the rim. Who knows what kind of biases are present here, but behold:
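A per-game version of that audit might look something like the sketch below. Everything here is an assumption about how the data is laid out: each shot is a (game_id, distance) pair, and a missing location shows up as None. The game labels in the example are placeholders, not the real games.

```python
from collections import defaultdict


def audit_games(shots):
    """Summarize charting quality per game.

    shots: iterable of (game_id, distance_ft) pairs, where distance_ft is
    None when no location was recorded (hypothetical layout).
    Returns {game_id: {"zero_share": ..., "missing_share": ...}}.
    """
    counts = defaultdict(lambda: {"total": 0, "zero": 0, "missing": 0})
    for game_id, distance_ft in shots:
        game = counts[game_id]
        game["total"] += 1
        if distance_ft is None:
            game["missing"] += 1
        elif distance_ft == 0:
            game["zero"] += 1
    return {
        game_id: {
            "zero_share": c["zero"] / c["total"],
            "missing_share": c["missing"] / c["total"],
        }
        for game_id, c in counts.items()
    }


# Placeholder data: game "A" piles shots at the rim, game "B" skips locations.
report = audit_games([("A", 0), ("A", 0), ("A", None), ("A", 5),
                      ("B", None), ("B", 10)])
```

Sorting games by either share would surface the worst-charted ones first.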

Despite the constant distance of the three point line in the college game, the corner three appears to be more valuable just as in the NBA. It's probably just because virtually every corner three is going to be a catch and shoot situation while straight ahead threes will more often be difficult pull up shots off the dribble, but you'll have to ask Synergy for those stats. Anyways, make of it what you will.
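If you want to try the corner versus everything else split yourself, here is a minimal sketch. It assumes coordinates I'm making up for the example: the hoop sits at (0, 0), x runs along the baseline, y runs out toward midcourt, and any three taken at 60 degrees or more off straight ahead counts as a corner three. That 60 degree cutoff is an arbitrary choice, not anything official.

```python
import math

CORNER_ANGLE_DEG = 60  # arbitrary cutoff, an assumption for this sketch


def zone(x, y):
    """Label a three as 'corner' or 'elsewhere' by its angle from straight ahead.

    0 degrees is straight on from the hoop; 90 degrees is along the baseline.
    """
    angle = math.degrees(math.atan2(abs(x), y))
    return "corner" if angle >= CORNER_ANGLE_DEG else "elsewhere"


def three_pct_by_zone(shots):
    """shots: iterable of (x, y, made) for three point attempts only.

    Returns {zone_name: make percentage as a fraction}.
    """
    totals = {}
    for x, y, made in shots:
        name = zone(x, y)
        makes, attempts = totals.get(name, (0, 0))
        totals[name] = (makes + int(made), attempts + 1)
    return {name: makes / attempts for name, (makes, attempts) in totals.items()}


# Made-up attempts: two from the corners, three from up top.
pct = three_pct_by_zone([(21, 1, True), (-21, 1, False),
                         (0, 21, False), (5, 20, True), (3, 21, False)])
```

With garbage location data the zones will be garbage too, so treat any gap between the two numbers with suspicion.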