Wednesday, August 27, 2008

Simulating Teams vs. Simulating Players

Due to popular demand, I'll be writing a weekly article about various aspects of model building, money management strategy, or whatever questions anyone has. So if you're curious about something specific, speak up. This first article deals with the difference between simulating games using a team model versus an individual player model.


"Keep it simple, stupid" - Confucius

One of the most important decisions to make when simulating a sport is whether to simulate a game using team stats or to break it down further and use individual player stats. A convincing argument can be made for either method. If you've read anything about NFLSim's background, you know that the simulation uses team stats and not player stats. Here's a comparison of the two methods, in a football context:

Keep in mind that this is just one way to build a model; if you're building your own, use whatever method fits you.

1) From the viewpoint of a novice programmer, using team stats is really easy. There are a dozen different websites with consolidated, uniform, and sortable information, www.nfl.com and www.espn.com for example. Once you've acquired the data, you can easily manipulate it into the form that works for your program. The team stats can be incorporated into the simulation from a single web page. Grabbing an individual's stats takes a little more effort and problem solving. The difficulty lies in automating the process: getting the program to find each team's website, and then the player-specific data you're looking for, can be tricky.

It doesn't sound much more difficult, but if you decide to use player statistics, you'll really have to work on your organizational skills. Remember, you'll have to retrieve and organize data for every position (with backups, second strings, etc.) on every team, e.g. DAL: QB Tony Romo, Brad Johnson, Richard Bartel; RB Marion Barber, Felix Jones; WR .... .... .... You get the idea. All of this extra information means you need a lot more computing power and a lot more patience.
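To give a feel for the bookkeeping, here's a minimal sketch of one way to organize player data: a nested dictionary keyed by team, then position, with players listed in depth-chart order. The names `depth_chart`, `starter`, and `roster_size` are my own illustrations, not anything NFLSim uses (NFLSim works from team stats), and only the Dallas players named above are filled in.

```python
# Hypothetical layout: team -> position -> players in depth-chart order.
depth_chart = {
    "DAL": {
        "QB": ["Tony Romo", "Brad Johnson", "Richard Bartel"],
        "RB": ["Marion Barber", "Felix Jones"],
    },
}


def starter(chart, team, position):
    """Return the first player listed at a position for a team."""
    return chart[team][position][0]


def roster_size(chart):
    """Count every player entry across all teams and positions."""
    return sum(
        len(players)
        for positions in chart.values()
        for players in positions.values()
    )


print(starter(depth_chart, "DAL", "QB"))
print(roster_size(depth_chart))
```

Multiply that by every position on every team, plus a set of stats for each player, and you can see where the patience comes in.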

2) Injuries? Substitutions? Trades? Here is where it may seem that simulation at the player level has an advantage over simulation at the team level. Surely when you account for individual changes, you'll get better accuracy, right? Well...maybe. Let's talk about team stats first. Team stats, at the very basic level, do not take into account injuries, substitutions, trades, or anything of the sort. Team simulations operate under the assumption that the team is a single, static object that generates stats as the weeks go by, regardless of the players that make it up. From a programming standpoint, this makes things really easy: you don't have to worry about writing code to distinguish between different players and their respective stats. You just use a single set of statistics for the entire simulation.

From the player perspective: by accounting for major changes, you might be able to improve your accuracy. But how do you reconcile in-game changes? The Cowboys consistently used Julius Jones and Marion Barber in the same game, so you have to figure out who runs each play in the simulation. The best method I can think of is to find how many attempts per game each RB has and apportion the plays in the simulation accordingly. When you consider every player for every team, this becomes pretty daunting.
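Here's a rough sketch of that apportioning idea, assuming you've already got each back's attempts per game. Each simulated rushing play is handed to a back at random, weighted by his share of the attempts. The function name `apportion_carries` and the numbers for "RB1" and "RB2" are placeholders I made up for illustration, not real stats.

```python
import random


def apportion_carries(attempts_per_game, n_plays, rng=None):
    """Pick a ball carrier for each simulated rushing play.

    Each back is chosen with probability proportional to his
    attempts per game.
    """
    rng = rng or random.Random()
    backs = list(attempts_per_game)
    weights = [attempts_per_game[back] for back in backs]
    return rng.choices(backs, weights=weights, k=n_plays)


# Placeholder attempts per game, NOT real numbers.
attempts = {"RB1": 15.0, "RB2": 10.0}

carriers = apportion_carries(attempts, n_plays=25, rng=random.Random(0))
print(carriers.count("RB1"), carriers.count("RB2"))
```

Over a large number of simulated plays, RB1 should end up with roughly 60% of the carries (15 out of 25), though any single simulated game will wander from that.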

Now let's assume there's an injury. If the Patriots have built up a set of passing statistics with Brady as the QB, those statistics are going to be pretty damn good, and they'll carry through to the next games. After 14 weeks, Brady gets injured and is out for the season. This is where a player simulation has its advantage: if you keep using the great team stats that the Patriots had generated to simulate the subsequent games, you misrepresent the Patriots' skill as being greater than it actually is. Therefore, the predictions for the next games will be inaccurate. When you replace Brady with his backup in a player simulation, everything might work out. The tricky part is assigning averages or attributes to a player with no experience. You can figure that out for yourself. Other provisions can be made when using team stats if an injury occurs, like assigning a general injury multiplier to the affected statistics. Trades can be treated in the same manner as injuries; both are player swaps.
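For the team-stats route, the injury multiplier could look something like the sketch below. Everything here is an assumption for illustration: the function name `apply_injury_multiplier`, the stat names, the stat values, and the 0.85 multiplier are all made up. Picking a sensible multiplier is the hard part, and this doesn't solve it.

```python
def apply_injury_multiplier(team_stats, affected, multiplier):
    """Return a copy of team_stats with the affected stats scaled.

    The original dictionary is left untouched so the pre-injury
    numbers are still available.
    """
    adjusted = dict(team_stats)
    for name in affected:
        adjusted[name] = team_stats[name] * multiplier
    return adjusted


# Placeholder team stats, NOT real numbers.
stats = {"pass_yards_per_attempt": 8.0, "rush_yards_per_attempt": 4.0}

# Hypothetical: the starting QB is out, so scale passing stats down 15%.
adjusted = apply_injury_multiplier(stats, ["pass_yards_per_attempt"], 0.85)
print(adjusted)
```

Only the stats you list get scaled; in this example the rushing numbers pass through unchanged.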

When deciding whether to write a simulation using team statistics or player statistics, the important factors to consider are programming ability, patience, and free time. If you're an expert programmer with experience pulling web data into your programming language of choice, or if you've got a real drive to get the program done, consider using player stats. Otherwise, use team stats.

If you're wondering how team accuracy compares with player accuracy, compare Black Box Sports and Accuscore. Black Box Sports' NFLSim uses team statistics for its play-by-play simulations, while Accuscore assigns attributes to individual players for theirs. This is the first full season for Black Box Sports, so we'll see who wins.

Friday, August 22, 2008

Using past stats for the pre-season

Yesterday, I got a great question from 'singletrack'. I probably should have answered it in a new post, but instead I answered it as a comment on the previous post. He asked whether or not the previous season's stats can be used to predict games during the following season, i.e., can the 2007 regular season be used to help improve your prediction accuracy in the 2008 pre-season?

My quick answer is 'no', because the teams play with modified line-ups and the coaches aren't necessarily playing to win.

I'm sure most of the readers here are big into stats, trends, predictions, etc. If anyone has taken a more detailed look at the problem, please jump in with your findings. Or, if you just have an opinion, jump in with that.

Wednesday, August 13, 2008

Status of this season's picks

Hey Blackboxers (visitors to this site will now be referred to as Blackboxers),

The website www.capperspicks.com has picked me up as a provisional handicapper. They've got a ton of great resources for anyone that's interested in sports betting or just sports in general. Lots of useful features like sportsbook reviews, a parlay calculator, articles on money management, different types of bets, glossary, etc. You can also find stats and trends and whatever you want.

Enough about them, let's talk about me. I'll be posting my picks for free in their forums (not sure which one yet, when I find out, I'll post the link). Also, I'm going to continue to post everything on this website... for FREE!... which will include:

- Predictions of the straight-up winners with confidence number (great for survivor pools)

- Picks ATS and over/under picks with their respective confidences


- Power rankings and ratings for offense, defense, and overall


- I'm going to post the optimal betting strategies which will include suggested bets and parlays, bets to stay away from, etc.


- Also, I got a request from a Blackboxer to post some stats NFLSim generates. These will include: Yards per Pass, Yards per Play, and Yards per Point. Bearing in mind that NFLSim generates more stats than you could ever want (any game stat you can find on the NFL website), I'm willing to oblige requests for the posting of certain stats.


That's all I can think of for now, but I'll be back soon with updates when I think of something else.

If anyone is betting in the preseason, you're crazy, but good luck to ya.

Tuesday, August 5, 2008

Welcome Back

Hey guys,

Football season's right around the corner, finally. It's almost time to have some fun with these predictions. Officially, the first week I'll simulate is week 2 (Sept. 14). I need data for the simulation, and unfortunately preseason numbers don't work. Still, if anyone's curious, I suppose I can run a few sims for sh!ts 'n giggles.

Write a comment with your fantasy matchup; specify the home team (or neutral) and let me know if you want me to use numbers from the end of the '07 season or preseason.

More updates will follow as the 2008 NFL season approaches.

p.s. I got a shiny new laptop that runs the program 4x as fast as my old crappy laptop (craptop). This means that I can run even more simulations for each game, making the predictions even more accurate!
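
For the curious, here's a back-of-the-envelope sketch of what extra simulations buy you. If you estimate a win probability by counting wins across independent simulated games, the standard error of that estimate is sqrt(p(1-p)/N). The function name, the 0.6 win probability, and the simulation counts below are placeholders of mine, not NFLSim's actual settings.

```python
import math


def win_prob_standard_error(p, n_sims):
    """Standard error of a win probability estimated from n_sims
    independent simulated games with true win probability p."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# Placeholder values, NOT NFLSim's real settings.
se_old = win_prob_standard_error(0.6, 10_000)
se_new = win_prob_standard_error(0.6, 40_000)

print(se_old / se_new)
```

Running 4x as many simulations cuts the simulation noise in half. It tightens the estimate the model gives you, but it can't fix anything the model itself gets wrong.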