Sunday, April 1, 2012

Grouping subpopulations by FGA% isn't enough

I was rewriting my code just now so that I could look at multiyear posteriors. While testing it, I tried to reproduce the freshmen graphs from my last post, and it spat out the following graph instead:



At first, I thought that there must be a bug in my code, but I couldn't find one. After a few more minutes of looking, I realized that I wasn't using the same data as in my last post. In that post, I included only players who had taken a three pointer in both the first and second half of the year. This data set, on the other hand, included any player who had taken a three pointer in the second half, even if they had not taken a single shot in the first half.

This ended up making a massive difference. The players in the 50th-60th percentile, which drastically underperformed, did not take a single shot between them in the first half of the year. With no data to build a posterior, they were all assumed to have an average 3PA%, and in fact, they exceeded the average 3PA% (an obvious case of selection bias since, in order to make this data set, they had to have taken at least one three pointer in the second half). Yet, they performed horribly.

Clearly, FGA% isn't enough to form proper priors: the amount a player plays and shoots also needs to be factored in. When the data set was limited to just those who took threes in both halves, that was masked, but this data set makes it very clear.
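To see why the players with no first-half shots all land on the average, here is a minimal sketch. It assumes a beta-binomial model, which is my guess at a reasonable setup and not necessarily what the code behind these graphs does. The `Player` fields, the function names, and the `prior_strength` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical record layout, for illustration only.
    name: str
    first_half_makes: int
    first_half_attempts: int
    second_half_attempts: int


def posterior_mean(makes, attempts, prior_mean, prior_strength):
    """Posterior mean of a shooting rate under an assumed Beta prior.

    prior_strength acts like a count of pseudo-attempts behind the prior.
    """
    alpha = prior_mean * prior_strength + makes
    beta = (1.0 - prior_mean) * prior_strength + (attempts - makes)
    return alpha / (alpha + beta)


def took_threes_in_both_halves(players):
    # The filter used in the earlier post.
    return [p for p in players
            if p.first_half_attempts > 0 and p.second_half_attempts > 0]


def took_threes_in_second_half(players):
    # The looser filter that produced the graph in this post.
    return [p for p in players if p.second_half_attempts > 0]
```

With zero first-half attempts, `posterior_mean(0, 0, prior_mean, prior_strength)` returns `prior_mean` no matter what `prior_strength` is, so every such player gets the same estimate. The looser filter is the only one that lets those players into the data set, which is why the problem stayed hidden before.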
