Wednesday, September 3, 2008

Predicting total number of wins using Points

If you're going to devise a system to predict games, where do you start? Some people like to look at total yardage as their primary predictor, various trends and indicators, the type of turf and weather, or "Who would win in a fight: a lion or a bear?". I like to break down a sport to figure out what the absolute deciding factor is in a victory. Total yards? Sometimes a team with greater total yards wins, but sometimes not. Number of touchdowns? Three field goals beat one touchdown. The only factor that ALWAYS holds true in a victory is... number of points. The team with the greater number of points ALWAYS wins. So it stands to reason that a team that outscores its opponents over a season usually wins more games than a team that gets outscored. Here is some interesting data showing the relationship between points scored and number of wins.

First, let me introduce the concept of 'R-squared' or 'R^2'. R^2 is called the coefficient of determination; for a straight-line fit, it is also the square of the coefficient of correlation. The R^2 value measures how closely a trend line fits the data. It has a value between 0 and 1: 1 means every data point falls exactly on the trend line, and 0 means the trend line explains none of the variation in the data. Therefore, an R^2 near 1 says that you've got data which can be approximated well by the trend line.
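To make R^2 a little more concrete, here is a minimal Python sketch of a least-squares trend line and its R^2. The function names (fit_line, r_squared) and the sample numbers are my own made-up illustration, not the data behind this post.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """R^2 = 1 - (squared error around the trend line) / (squared error around the mean)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example: points that sit exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(fit_line(xs, ys), r_squared(xs, ys))
```

Data that sits exactly on a line gives an R^2 of 1; the more the points scatter around the line, the further R^2 drops toward 0.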

Now I'll show you the remarkable correlation between number of wins and points scored. I looked at both point ratio (points for / points against) and average point differential ((points for - points against) / # of games) for all teams from 2002 through 2007 (six seasons). For every team, at the end of each season, I calculated their number of wins, their point ratio, and their average point differential. I then averaged the point ratios and the point differentials across all teams with the same number of wins. For example, I grouped together all teams that won 11 games and averaged their point ratios, which gave me the average point ratio for an 11-win team.
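The grouping step described above could be sketched in Python roughly like this. The tuple layout (wins, points for, points against, games) and the function names are assumptions for illustration, and the three sample seasons are invented numbers, not real team records.

```python
from collections import defaultdict


def point_ratio(points_for, points_against):
    """Points for divided by points against."""
    return points_for / points_against


def avg_point_differential(points_for, points_against, games):
    """Average margin per game: (points for - points against) / games."""
    return (points_for - points_against) / games


def average_by_wins(team_seasons):
    """Group team-seasons by win total and average both metrics per group.

    team_seasons: iterable of (wins, points_for, points_against, games).
    Returns {wins: (average point ratio, average point differential)}.
    """
    groups = defaultdict(list)
    for wins, pf, pa, games in team_seasons:
        groups[wins].append(
            (point_ratio(pf, pa), avg_point_differential(pf, pa, games))
        )
    averages = {}
    for wins, rows in groups.items():
        ratios = [ratio for ratio, _ in rows]
        diffs = [diff for _, diff in rows]
        averages[wins] = (sum(ratios) / len(ratios), sum(diffs) / len(diffs))
    return averages


# Invented sample seasons (hypothetical numbers, for illustration only).
sample = [(11, 400, 320, 16), (11, 360, 300, 16), (5, 280, 350, 16)]
print(average_by_wins(sample))
```

Each win total ends up as a single averaged data point, and those averaged points are what a trend line would then be fitted through.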

Results

I chose to exclude teams with 1, 15, and 16 wins because in 6 seasons only 1 team fit into each of those categories. Also, I apologize if the graph is a little hard to read; different monitor resolutions might distort it.

Point ratio fits along the line y = 0.0903x + 0.3304 (y = PF/PA, x = # of wins) with an R^2 value of 0.9842.
Average point differential fits y = 1.8311x - 14.664 (y = Avg. Pt. Differential, x = # of wins) with R^2 value of 0.9969.

If we rearrange these equations, we can use them to give a rough prediction of the number of games a team will win come the end of the season. So, the number of games a team will win is based on the formulae:
Ratio:
Wins = (PF/PA)*11.0742 - 3.6589
Differential:
Wins = [(PF-PA)/Games]*0.5461 + 8.008
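
As a quick sanity check, here is a small Python sketch that solves the two fitted lines from the Results section (y = 0.0903x + 0.3304 and y = 1.8311x - 14.664) for x, the number of wins. The function and constant names are my own, and the output is a projected win total for the 16-game seasons the fits were built on, so treat it as a rough estimate only.

```python
# Slopes and intercepts of the fitted trend lines from the Results section.
RATIO_SLOPE, RATIO_INTERCEPT = 0.0903, 0.3304
DIFF_SLOPE, DIFF_INTERCEPT = 1.8311, -14.664


def wins_from_ratio(points_for, points_against):
    """Solve PF/PA = 0.0903 * wins + 0.3304 for wins."""
    ratio = points_for / points_against
    return (ratio - RATIO_INTERCEPT) / RATIO_SLOPE


def wins_from_differential(points_for, points_against, games):
    """Solve (PF - PA) / games = 1.8311 * wins - 14.664 for wins."""
    differential = (points_for - points_against) / games
    return (differential - DIFF_INTERCEPT) / DIFF_SLOPE


# A team that has scored exactly as many points as it has allowed.
print(round(wins_from_ratio(200, 200), 2))            # about 7.42
print(round(wins_from_differential(200, 200, 8), 2))  # about 8.01
```

A team that has scored exactly as many points as it has allowed projects to roughly 7.4 wins by ratio and 8.0 wins by differential, which is about what you'd expect from an average team.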

If you're going to make a bet with your buddy halfway through the season as to whether the Jets will win 3 or 4 games this season, check one of the equations first. (I prefer the differential equation because it has the higher R^2 value.)

If you want to predict the outcome of games, you can compare each team's projected win total with the wins it already has, which tells you how many more games it is "supposed" to win over the rest of the season. That may be a little abstract, though. Probably a better way to use this information is to work it into your own method. Even better, take away the idea that incorporating points into a system may be the difference between picking games at 47% and 53%.

Jonathan
Black Box Sports

Questions? E-mail me!
