Wednesday, August 27, 2008

Simulating Teams vs. Simulating Players

Due to popular demand, I'll be writing a weekly article about various aspects of model building, money management strategy, or whatever questions anyone has. So if you're curious about something specific, speak up. This first article deals with the difference between simulating games using a team model versus an individual player model.


"Keep it simple, stupid" - Confucius

One of the most important decisions to make when simulating a sport is whether to simulate a game using team stats or to break it down further and use individual player stats. A convincing argument can be made for either method. If you've read anything about NFLSim's background, you know that the simulation uses team stats and not player stats. Here's a comparison of the two methods, in a football context:

Keep in mind that this is just one way to build a model; if you're building your own, use whatever method fits you.

1) From the viewpoint of a novice programmer, using team stats is really easy. A dozen different websites offer consolidated, uniform, and sortable information; www.nfl.com and www.espn.com are two examples. Once you've acquired the data, you can easily manipulate it into the form that works for your program. The team stats can be pulled into the simulation from a single web page. Grabbing an individual's stats takes a little more effort and problem solving. The difficulty lies in automating the process. Getting the program to find each team's website and then find the player-specific data you're looking for can be tricky.

It doesn't sound much more difficult, but if you decide to use player statistics, you'll have to really work on your organizational skills. Remember, you'll have to retrieve and organize data for every position (with backups and second strings, etc.) on every team, e.g. DAL: QB Tony Romo, Brad Johnson, Richard Bartel; RB Marion Barber, Felix Jones, Tony Romo; WR .... .... .... You get the idea. All of this extra information means you need a lot more computing power and a lot more patience.
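To give a rough idea of the bookkeeping involved, here is a minimal Python sketch of one way to organize player data by team and position. It's only an illustration, not how any particular simulator stores its data. The Player class, the depth_charts layout, and the starter and next_man_up helpers are all names I made up, and the chart only includes a couple of the Dallas positions listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    # Per-player stats would go here; left empty in this sketch.
    stats: dict = field(default_factory=dict)

# Hypothetical layout: depth_charts[team][position] is an ordered list,
# starter first, then the backups.
depth_charts = {
    "DAL": {
        "QB": [Player("Tony Romo"), Player("Brad Johnson"), Player("Richard Bartel")],
        "RB": [Player("Marion Barber"), Player("Felix Jones")],
    },
}

def starter(team, position):
    """Return the first player listed at a position."""
    return depth_charts[team][position][0]

def next_man_up(team, position, unavailable):
    """Return the highest-listed player whose name is not in `unavailable`."""
    for player in depth_charts[team][position]:
        if player.name not in unavailable:
            return player
    return None  # nobody left at this position
```

Multiply that by every position on all 32 teams and you can see where the patience comes in.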

2) Injuries? Substitutions? Trades? Here is where it may seem that simulation at the player level has an advantage over simulation at the team level. Surely when you account for individual changes, you'll get better accuracy, right? Well...maybe. Let's talk about team stats first. Team stats, at the very basic level, do not take into account injuries, substitutions, trades, or anything of the sort. Team simulations operate under the assumption that the team is a single, static object that generates stats as the weeks go by, regardless of the players that make it up. From a programming standpoint, this makes things really easy: you don't have to write code to distinguish between different players and their respective stats. You just use a single set of statistics for the entire simulation.

From the player perspective: by accounting for major changes, you might be able to improve your accuracy. But how do you reconcile in-game changes? The Cowboys consistently used Julius Jones and Marion Barber in the same game, so you have to figure out who runs each play in the simulation. The best method I can think of is to find how many attempts per game each RB has and apportion the plays in the simulation accordingly. When you consider every player for every team, this becomes pretty daunting.
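Here's a minimal Python sketch of that attempts-per-game idea. The attempts figures are made up for illustration (they are not real stats), and carry_shares and pick_ball_carrier are hypothetical helper names; treat this as one possible way to do it, not the only one.

```python
import random

# Made-up attempts-per-game numbers, for illustration only.
attempts_per_game = {"Julius Jones": 12.0, "Marion Barber": 13.0}

def carry_shares(attempts):
    """Convert attempts per game into each back's share of the carries."""
    total = sum(attempts.values())
    if total <= 0:
        raise ValueError("need at least one rushing attempt")
    return {name: n / total for name, n in attempts.items()}

def pick_ball_carrier(attempts, rng=random):
    """Pick the runner for one simulated rushing play, weighted by attempts."""
    names = list(attempts)
    weights = [attempts[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

shares = carry_shares(attempts_per_game)
runner = pick_ball_carrier(attempts_per_game)
```

Over a large number of simulated rushing plays, each back should end up with roughly his share of the carries. You would need a table like this for every team, which is where the extra work piles up.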

Now let's assume there's an injury. If the Patriots have built up a set of passing statistics with Brady as the QB, those statistics are going to be pretty damn good, and they'll carry through to the next games. After 14 weeks, Brady gets injured and is out for the season. This is where a player simulation has its advantage. If you keep using the great team stats that the Patriots generated with Brady to simulate the subsequent games, you represent the Patriots' skill as being greater than it actually is, so the predictions for those games will be inaccurate. When you replace Brady with his backup, everything might work out. The tricky part is assigning averages or attributes to a player with no experience. You can figure that out for yourself. Other provisions can be made when using team stats if an injury occurs, like assigning a general injury multiplier to the affected statistics. Trades can be treated in the same manner as injuries; both are player swaps.
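As a rough sketch of the injury multiplier idea, in Python: the stat names, the numbers, and the 0.75 multiplier are all made up for illustration, and apply_injury_multiplier is a hypothetical helper. Picking a sensible multiplier is the hard part, and this doesn't attempt it.

```python
def apply_injury_multiplier(team_stats, affected, multiplier):
    """Return a copy of team_stats with the affected stats scaled down (or up)."""
    adjusted = dict(team_stats)  # leave the original stats untouched
    for stat in affected:
        adjusted[stat] = team_stats[stat] * multiplier
    return adjusted

# Made-up team numbers, for illustration only.
patriots = {"pass_yards_per_attempt": 8.0, "rush_yards_per_attempt": 4.0}

# Assume the starting QB is out: scale only the passing stat.
patriots_without_qb = apply_injury_multiplier(
    patriots, ["pass_yards_per_attempt"], 0.75
)
```

The rest of the team simulation stays exactly the same; it just reads from the adjusted stats for the games the injured player misses.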

When deciding whether to write a simulation using team statistics or player statistics, the important factors to consider are programming ability, patience, and free time. If you're an expert programmer with experience pulling web data into your programming language of choice, or if you've got a real drive to get the program done, consider using player stats. Otherwise, use team stats.

If you're wondering how team accuracy compares with player accuracy, compare Black Box Sports and Accuscore. Black Box Sports' NFLSim uses team statistics for its play-by-play simulations, while Accuscore assigns attributes to individual players for its play-by-play simulations. This is the first full season for Black Box Sports, so we'll see who wins.
