Thursday, March 17, 2011
Loose Ends - Part I: Predicting Future Success
My plan is to put out a series of posts - hopefully all within the next while - that relate to subjects that I've posted on previously. The object of these posts is to address certain outstanding issues that weren't resolved when I tackled these subjects the first time around.
The first post in the series is an extension of a post that I published last month that looked at how various shot metrics - all of them calculated at even strength with the score tied - predicted future success at the team level.
One related issue that wasn't explored is how well those same shot metrics predict future success when compared to more conventional measures of team strength, such as winning percentage and goal ratio.* This question is actually more fundamental than the one investigated in the original post. After all, if shot metrics like Fenwick and Corsi failed to predict future success better than the conventional measures, then that would render them considerably less useful.
The method employed** was similar to the one used in the first post. Because of the relative complexity of the process, including a step-by-step description may be helpful.
Firstly, I randomly selected a certain number of games from each team's schedule, with each team having an equal number of home and road games selected.
Secondly, I calculated how each team performed over those games with respect to certain variables. The variables that were calculated were even strength Corsi with the score tied, overall goal ratio (with empty net and shootout goals excluded), and winning percentage. Winning percentage was defined as WINS/(WINS+LOSSES). Games that ended in a shootout were considered ties, and were therefore not included in the calculation.
I then randomly selected a second, independent group of games. That is, if a game was included in the first grouping, it was not eligible for selection in the second grouping. As with the first grouping, an equal number of home and road games were selected for each team.
I then determined how each team did in terms of winning percentage over this second group of games, and looked at how each of the three variables calculated in relation to the first group correlated with winning percentage in the second group.
The relationship between the size of the two groups can be expressed as y=(80-x), where x represents the number of games included in the first group, and y the number of games in the second group. So, for example, if 20 games were selected for the first group, the second group would consist of 60 games. Ultimately, I elected to use x values of 20, 30, 40, 50, 60 and 70.
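For anyone who wants to try the procedure at home, here is a minimal Python sketch of the steps described above. It is not the code actually used for this post. The per-game record layout (`home`, `cf`, `ca`, `gf`, `ga`, `result`) and all function names are hypothetical. Two further assumptions: Corsi and goal ratio are expressed as a share, FOR/(FOR+AGAINST), and each team has 41 home and 41 road games to draw from.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def share(games, key_for, key_against):
    """FOR / (FOR + AGAINST) summed over a group of games (assumed form of the ratio)."""
    f = sum(g[key_for] for g in games)
    a = sum(g[key_against] for g in games)
    return f / (f + a)

def win_pct(games):
    """WINS / (WINS + LOSSES); shootout games are coded 'T' and drop out."""
    wins = sum(g["result"] == "W" for g in games)
    losses = sum(g["result"] == "L" for g in games)
    return wins / (wins + losses)

def split_games(games, x, rng):
    """Draw x games for the first group and 80 - x for the second.

    The groups never overlap, and each is half home, half road.
    """
    home = [g for g in games if g["home"]]
    road = [g for g in games if not g["home"]]
    rng.shuffle(home)
    rng.shuffle(road)
    n1, n2 = x // 2, (80 - x) // 2
    first = home[:n1] + road[:n1]
    second = home[n1:n1 + n2] + road[n1:n1 + n2]
    return first, second

def average_correlations(teams, x, trials=1000, seed=0):
    """Average, over many random splits, the correlation between each
    first-group metric and second-group winning percentage.

    `teams` maps a team name to its list of per-game records.
    """
    rng = random.Random(seed)
    totals = {"corsi_tied": 0.0, "goal_ratio": 0.0, "win_pct": 0.0}
    for _ in range(trials):
        first_vals = {name: [] for name in totals}
        future = []
        for games in teams.values():
            first, second = split_games(games, x, rng)
            first_vals["corsi_tied"].append(share(first, "cf", "ca"))
            first_vals["goal_ratio"].append(share(first, "gf", "ga"))
            first_vals["win_pct"].append(win_pct(first))
            future.append(win_pct(second))
        for name in totals:
            totals[name] += pearson(first_vals[name], future)
    return {name: total / trials for name, total in totals.items()}
```

Calling `average_correlations(teams, x)` for each x in 20, 30, 40, 50, 60 and 70 would give one row of averaged correlations per sample size for a single season.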
The raw data used was from the 2007-08, 2008-09 and 2009-10 regular seasons. The table included below shows the results for each individual season, as well as the average results. The values represent the average correlation over 1000 calculations.
A couple points:
- Corsi Tied is the best predictor of how a team will perform over the remainder of its schedule, regardless of the point in the schedule at which the calculation occurs.
- Corsi Tied is only marginally more predictive of future success than goal ratio or winning percentage when looking at samples of 60 games or more. In other words, as the sample size becomes increasingly large, there are diminishing returns with respect to the predictive advantage of Corsi. By the end of the season, all three variables seem to predict future success equally well.
- The above fact has implications in terms of determining playoff probabilities at the team level, with the results suggesting that a composite metric would work best.
- The aggregate values for Goal Ratio and Winning Percentage are remarkably similar. The implication is that once shootout results are controlled for, winning percentage is as good a measure of a team as goal ratio is.
Next up: Score Effects and Minor Penalties.
*Some readers may have observed that the split-half reliability of goal ratio (0.417) was lower than the predictive validity coefficients for both Corsi Tied (0.444) and Fenwick Tied (0.429). The implication is that the two latter variables are better able to predict goal ratio from one half of the schedule to the other than goal ratio is itself.
** I should note that this method was actually developed and first used by Vic Ferrari. See here.
Scott Reynolds had a question in the comments section on how the results would differ if we looked at future EV performance rather than overall performance. Using the same method as the one described above, I looked at which of EV Corsi Tied and EV goal ratio (empty netters removed) was better able to predict future performance at even strength (which I operationalized as future EV goal ratio). Here are the results:
The results aren't too different - Corsi Tied is a much better predictor early in the schedule, but the two measures have about the same predictive power by the end of the year.