Tuesday, March 9, 2010

Shot Recording Bias: Part n

This is my third post on the subject. For a more detailed discussion of the methodology and reasoning applied, please refer to my first two posts ([1] [2]).

After looking over my previous post on the subject -- the one examining Florida and New Jersey specifically -- I realized that I'd made an error in inputting the data for the 2007-08 and 2008-09 seasons. Here are the corrected charts. It may be necessary to enlarge them in order to properly view the information.

New Jersey


In my original post, I concluded that the shot recorder in New Jersey undercounts shots on goal. The corrected data does nothing but affirm that conclusion.

The only major difference is that the chart contained in my original post incorrectly showed that there were more shots counted in New Jersey home games than New Jersey road games in both 2007-08 and 2008-09. This led me to suspect that the bias may no longer persist, notwithstanding the fact that the shooting percentage in New Jersey home games was higher than the shooting percentage in New Jersey road games during the two seasons in question.

However, as is evident from the corrected chart, there were actually fewer shots in Devils home games for both 2007-08 and 2008-09. This is consistent with the data from previous seasons, the corresponding shooting percentage data for 2007-08 and 2008-09, as well as the undercounting hypothesis.

Florida



The corrected data for Florida, however, does affect my conclusions somewhat. While the home-road shot gap for 2007-08 and 2008-09 is similar in magnitude to that observed in the previous three seasons, the shooting percentage data for those two seasons suggests that an overcounting bias may have emerged. However, I'm reluctant to assert the existence of a bias on the basis of two seasons' worth of data alone, especially considering that the shot gap has not increased materially.

Other Arena Recording Biases

Given that we're on the subject, I figured I'd take this opportunity to explore the issue of shot recording bias more generally.






The above tables show each team's home-road splits for shots on goal and shooting percentage from 2003-04 to 2008-09. The first table shows the 15 teams that had the greatest number of recorded shots on goal in home games relative to road games, and ranks those teams in descending order. The second table shows the reverse: the 15 teams that had the fewest recorded shots on goal in home games relative to road games.

The first highlighted column in each table displays the number of shots recorded in home games, minus the number of shots recorded in road games.

The second highlighted column displays road game shooting percentage minus home game shooting percentage.

Where there exists a significant positive value in both columns, an overrecording bias is implied.

Conversely, where both values are significantly negative, an underrecording bias is implied.
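The two highlighted columns and the rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the `ArenaSplit` type, the function names, and especially the two thresholds standing in for "significant" are assumptions of mine, not values used to build the tables.

```python
from dataclasses import dataclass


@dataclass
class ArenaSplit:
    """Totals for all games involving one team, both teams' shots combined."""
    home_shots: int
    home_goals: int
    road_shots: int
    road_goals: int


def shot_gap(s: ArenaSplit) -> int:
    # First highlighted column: home shots minus road shots.
    return s.home_shots - s.road_shots


def sh_pct_gap(s: ArenaSplit) -> float:
    # Second highlighted column: road shooting % minus home shooting %.
    return s.road_goals / s.road_shots - s.home_goals / s.home_shots


def implied_bias(s: ArenaSplit,
                 min_shot_gap: int = 300,        # hypothetical threshold
                 min_pct_gap: float = 0.003) -> str:  # hypothetical threshold
    gap, pct = shot_gap(s), sh_pct_gap(s)
    if gap >= min_shot_gap and pct >= min_pct_gap:
        return "overrecording"
    if gap <= -min_shot_gap and pct <= -min_pct_gap:
        return "underrecording"
    return "no clear bias"


# Made-up totals, purely to exercise the rule.
example = ArenaSplit(home_shots=12600, home_goals=1100,
                     road_shots=12000, road_goals=1100)
print(shot_gap(example), round(sh_pct_gap(example), 4), implied_bias(example))
```

The logic is that an overcounting recorder inflates home shots without adding goals, so the home shooting percentage falls below the road figure and both columns come out positive.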

Looking at the two tables together, it would appear that the shot recorders in Colorado, Ottawa, Nashville and Boston overcount shots to some degree, whereas the recorders in Minnesota, Dallas, St. Louis and Vancouver are seemingly guilty of undercounting.

Of course, a more rigorous analysis is required before any conclusions can be reached.

Colorado


Ottawa


Nashville

Boston

Minnesota

Dallas

St. Louis

Vancouver


The above tables break down the home-road shot and shooting percentage splits by game state and season for the eight listed teams. I'm not sure if these tables add all that much on top of the aggregated data presented earlier, although I think that their inclusion is valuable for two reasons.

For one, the home games of some teams might have featured more special teams play over the period in question, if only by chance. As both shot rate and shooting percentage increase significantly on special teams relative to even strength, this factor can potentially distort the overall data.

Secondly, it's important to break down the data by season in order to see if any of the apparent recording biases are time-limited -- that is, present in some seasons but not others. For example, it's conceivable that some teams have employed more than one arena statistician at different points over the last seven years.

As for the tables themselves, one thing that strikes me as unusual is that the home-road shooting percentage gap for the Wild is quite large on special teams yet virtually non-existent at even strength (indeed, not even in the predicted direction). I can't think of any reason why this would be so, although it leads me to suspect that there may be no bias. The home-road shot gap is large, but that could be a product of the Wild playing more conservatively at home.

Looking at the data collectively, there's overwhelming evidence of a recording bias in Dallas and Colorado, strong evidence of one in Vancouver and Ottawa, and moderate evidence of bias in the other four locations.


The above table requires some description. It essentially shows the 95% and 99% confidence intervals for each team's home shooting percentage (that is, the shooting percentage by both teams) during the period in question (2003-04 to 2008-09). The intervals were generated by assuming that each team had the same underlying shooting percentage on the road as at home, and that shots were recorded accurately irrespective of game location.

The final column shows what the shooting percentage in each team's home games actually was over that timeframe. Values colored light blue fall outside the 95% confidence interval. Highlighted values fall outside both confidence intervals. White colored values are within both confidence intervals.

A specific example will be illustrative. The shooting percentage in Stars road games was 0.081 at EV from 2003-04 to 2008-09. Using that value as the underlying shooting percentage in their home games, the home shooting percentage would be expected to fall between 0.075 and 0.087 95% of the time, and between 0.073 and 0.089 99% of the time. The observed value was 0.089, which strongly implies that shots were undercounted in Dallas during this period.


Of course, assuming that each team's actual road shooting percentage is roughly equivalent to its underlying road shooting percentage is somewhat questionable. For example, if the underlying shooting percentage in a team's road games is 0.092, the 95% confidence interval after 12000 shots -- the average number of shots in road games for teams during the 5 year period -- is roughly between 0.087 and 0.098.
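For anyone who wants to check that range, here is a minimal sketch using the normal approximation to the binomial. The approximation is my assumption here, not necessarily how the figures in the tables were generated, and the function name is hypothetical.

```python
import math


def shooting_pct_interval(p: float, shots: int, z: float = 1.96):
    """Approximate interval for an observed shooting percentage.

    p     -- assumed underlying shooting percentage
    shots -- number of shots in the sample
    z     -- 1.96 for a 95% interval, 2.576 for 99%
    """
    se = math.sqrt(p * (1 - p) / shots)  # standard error of a proportion
    return p - z * se, p + z * se


low, high = shooting_pct_interval(0.092, 12000)
print(f"{low:.3f} to {high:.3f}")
```

With these inputs the approximation gives about 0.087 to 0.097, roughly in line with the range quoted above.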

That being the case, the above table represents a slightly different approach. The left-hand column, titled 'DIFF', shows the absolute difference between home and road shooting percentage for each team over the entire sample, for both EV and overall. The right-hand column, titled 'PROB', displays the probability of a difference that large or larger occurring by chance alone (over 100 simulations).

So, by way of example, the difference between the EV shooting percentage in Dallas road games and Dallas home games was 0.008. The probability of a difference at least that large arising from chance alone is 5%. In other words, it probably isn't the result of chance, but, rather, because shots have been undercounted in Dallas over that period.
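A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure behind the table: the shot counts (8000 each) and the common underlying shooting percentage (0.085) are hypothetical placeholders, and the function names are mine.

```python
import random


def simulate_goals(shots: int, p: float, rng: random.Random) -> int:
    # Each shot independently becomes a goal with probability p.
    return sum(1 for _ in range(shots) if rng.random() < p)


def prob_gap_by_chance(observed_gap: float, home_shots: int, road_shots: int,
                       p: float, sims: int = 100, seed: int = 0) -> float:
    """Share of simulations in which the absolute home-road shooting
    percentage gap is at least as large as the observed one, assuming the
    same underlying percentage p at home and on the road."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        home_pct = simulate_goals(home_shots, p, rng) / home_shots
        road_pct = simulate_goals(road_shots, p, rng) / road_shots
        if abs(home_pct - road_pct) >= observed_gap:
            hits += 1
    return hits / sims


# Hypothetical inputs: only the 0.008 gap comes from the Dallas example.
print(prob_gap_by_chance(0.008, home_shots=8000, road_shots=8000, p=0.085))
```

With only 100 simulations the estimate is coarse (it can only move in steps of 1%), so raising `sims` is worthwhile if a more precise figure is needed.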

Conclusions

So, what can we conclude from all that?
  • The shot recorder in New Jersey undercounts (this was addressed in a previous post)
  • The shot recorder in Dallas undercounts
  • The shot recorder in Colorado overcounts
  • The shot recorder in Vancouver almost certainly undercounts
  • The shot recorder in Ottawa probably overcounts
  • The shot recorders in Boston and Nashville may overcount, but the evidence is not conclusive
  • The shot recorders in St. Louis and Minnesota may undercount, but the evidence is not conclusive

7 comments:

Anonymous said...

awesome post

Scott Reynolds said...

Good stuff JLikens. Perhaps this could make an interesting summer project (i.e. watching some games played in these arenas and manually counting shots to see what results we might come up with).

Sunny Mehta said...

well done, J.

JLikens said...

Scott:

That's a good idea.

Sunny has already done this for a couple Devils home games, and his results were consistent with shots being under-recorded there.

Now to do the same for the other locations.

Sunny:

Thanks.

The corrected data for NJ makes sense in light of your study earlier this season.

My earlier post back in December found that there were slightly more shots recorded in NJ home games relative to road games, which was confusing given your results.

At the time, I reckoned that the screw up was on my end, although I wasn't able to confirm that until now.

Sunny Mehta said...

You know, the funny thing is, before any of our studies on this phenomenon (I'm talking like, a couple years ago), I was watching a Devils game where they were in Anaheim playing the Ducks. Chico Resch, the Devils' color guy (as well as former goaltender), made a comment during the broadcast like "I was talking to Marty before the game, and well, you know goalies love playing in this building because somehow their save percentage is magically a few points higher than usual."

For some reason I completely forgot about that comment until I saw your recent list with Anaheim on top. So I wonder if this is something that GMs, coaches, etc. have been aware of for a while. If so, I wonder how they "correct" for it in their analyses.

Further, I wonder how we should correct for it when looking at stats for analysis purposes. The two basic solutions would be to use only road numbers (i.e. - I'd like to see goalie ES SV% numbers post-lockout for road numbers only), or to use all shot attempts (i.e. Corsi/Fenwick).

JLikens said...

Anaheim is an interesting case. The shot gap is large enough (as you said, the largest in the league over the period studied) to make me think that there’s a good chance that shots are being over reported there, even if the shooting percentage is also higher.

I agree that some effort to correct arena bias should be made when evaluating goaltender performance. Personally, I would prefer using road save percentage, if for no other reason than the simplicity involved.

A while back, I attempted to develop a model that would correct each goaltender’s save percentage based on whether their home scorer overcounted or undercounted shots on goal relative to total shots directed at the net.

The model was based on the premise that whereas the former (SOG) is somewhat subjective and, therefore, susceptible to bias, the latter (shots directed at the net) is inherently less discretionary and ought to be recorded more or less faithfully across arenas.

I quickly realized that my premise was flawed. The recording of blocked and missed shots is subject to at least as much recording bias as shots on goal.

For example, looking at Devils games in 2008-09, there were 962 EV blocks in NJ road games and 664 EV blocks in NJ home games. For EV missed shots, the corresponding values were 732 and 617.

In other words, not only does the NJ shot recorder undercount shots on goal, but blocked and missed shots as well, and with more severity at that.

The results for other locations reveal similar discrepancies (MTL, CAR, and TOR undercount blocks and misses; BOS, CHI, and ATL overcount them).

So, in other words, shot recording – whether in relation to shots on goal or shots directed at the net – appears to be a mess in general.

Host Pay Per Head said...

I don't think that it is enough it is quite a huge topic and there are lot of loosing things to cover.