Showing posts with label sports modeling. Show all posts

Wednesday, October 29, 2008

NBA Black Box

With the start of the NBA season on Tuesday, my life just got a little busier, though I can't wait to see if I can repeat the results of last season. For those who haven't checked it out yet, NBA Black Box is the little sister of NFL Black Box (NFL Black Box is the big brother). I began to write a possession-by-possession NBA simulation, but while I worked, I got sidetracked by a few equations I had come up with. NBA Black Box is not a simulation; it is a set of algorithms that work together to pick the straight-up and ATS winners. I've given up on the elusive O/U; for some reason, it just doesn't work. There are no plans to finish the simulation - the algorithms are doing just fine.

Here are the results from when I started tracking, on March 1. They exceed my wildest expectations.

SINCE MARCH 1
TYPE CORRECT GAMES WIN %
Wins 279 353 79.0%
Spread 242 411 58.9%

PLAYOFFS
TYPE CORRECT GAMES WIN %
Wins 36 48 75.0%
Spread 32 54 59.3%

Those numbers are correct; check out NBA Black Box to verify. From March 1 to the end of the season, NBA Black Box was 279-74 (79.0%) picking the winner and 242-169 (58.9%) ATS. The playoff results were similar. From March 1, using a unit of 100, NBA Black Box returned a profit of 5610, for an ROI of 12.4%. And that's only after 2 months.
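For anyone who wants to check the arithmetic, here is a minimal sketch. The post doesn't state the odds, so the sketch assumes standard -110 pricing (risk 110 to win 100) on every ATS pick; that assumption happens to reproduce both figures above. The function name and parameters are made up for illustration.

```python
def ats_profit_and_roi(wins, losses, unit=100, risk_per_bet=110):
    """Profit and ROI for flat betting, ASSUMING every bet risks
    `risk_per_bet` to win `unit` (i.e. -110 odds; not stated in the post)."""
    profit = wins * unit - losses * risk_per_bet
    total_risked = (wins + losses) * risk_per_bet
    return profit, profit / total_risked


# ATS record since March 1: 242 wins, 169 losses
profit, roi = ats_profit_and_roi(242, 169)
print(profit)               # 5610
print(round(roi * 100, 1))  # 12.4
```

Note that ROI here is profit divided by the total amount risked, not by the number of bets.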

I'm going to track the progress on my own initially (not going to post the results), until I see some consistent performance. It takes time for reliable stats to build up. I have no idea how long that will be. It could be a couple of weeks or it could be a couple of months. The bottom line is I don't want to put unreliable picks up - I don't want people to lose money as a result of bad picks and I don't want to ruin my reputation for making quality predictions.

I'll keep you folks updated as the season continues.

Jonathan

Friday, October 17, 2008

Week 7: Requested Stats

A while back I got a request to post yds/pass, yds/play, and yds/pt. At this point in the season, I think the numbers are getting accurate, so here are those stats for week 7.

TEAM YDS/PASS YDS/PLAY YDS/PT
Arizona Cardinals n/a n/a n/a
Atlanta Falcons n/a n/a n/a
Baltimore Ravens 6.80 4.49 17.28
Buffalo Bills 7.59 5.32 12.65
Carolina Panthers 6.73 4.93 13.74
Chicago Bears 7.00 4.76 14.23
Cincinnati Bengals 6.39 4.18 13.41
Cleveland Browns 5.86 4.39 15.53
Dallas Cowboys 7.05 5.06 15.32
Denver Broncos 7.41 5.49 12.96
Detroit Lions 6.95 5.04 17.51
Green Bay Packers 8.11 5.71 13.52
Houston Texans 8.09 6.11 11.36
Indianapolis Colts 6.34 4.70 13.02
Jacksonville Jaguars n/a n/a n/a
Kansas City Chiefs 5.38 4.38 14.60
Miami Dolphins 6.91 4.91 12.52
Minnesota Vikings 6.20 4.38 13.67
New England Patriots 7.88 5.68 14.57
New Orleans Saints 7.37 4.94 15.57
New York Giants 7.54 5.91 13.10
New York Jets 7.72 5.22 15.12
Oakland Raiders 6.71 4.94 11.81
Philadelphia Eagles n/a n/a n/a
Pittsburgh Steelers 6.05 4.14 12.52
San Diego Chargers 7.31 4.94 14.76
San Francisco 49ers 7.09 4.93 14.95
Seattle Seahawks 5.83 4.46 15.55
St. Louis Rams 6.90 5.00 12.57
Tampa Bay Buccaneers 7.25 5.55 12.96
Tennessee Titans 6.62 4.90 13.98
Washington Redskins 7.49 5.63 12.20

Average 6.949 5.002 13.962

Friday, October 10, 2008

Week 6: Top 5 / Bottom 5

(The game predictions are below this post)
Let's have a little fun this week.

The top 5 QBs by rating will be:

1. Aaron Rodgers, GB: 109.3
2. Jason Campbell, WAS: 109.2
3. Chad Pennington, MIA: 105.7
4. Jay Cutler, DEN: 104.0
5. Gus Frerotte, MIN: 103.4
Bottom 5:
28. Jeff Garcia (in for Griese), TB: 83.0
29. Dan Orlovsky (in for Kitna), DET: 82.0
30. Derek Anderson, CLE: 75.4
31. Peyton Manning, IND: 73.1
32. Charlie Frye (in for Hasselbeck), SEA: 71.5

Top 5 teams by rushing yards:
1. Washington Redskins: 178.3
2. Atlanta Falcons: 171.8
3. Baltimore Ravens: 165.3
4. Oakland Raiders: 160.8
5. Seattle Seahawks: 160.6
Bottom 5:
28. Philadelphia Eagles: 75.3
29. New Orleans Saints: 72.7
30. Cincinnati Bengals: 63.4
31. Indianapolis Colts: 47.4
32. Detroit Lions: 46.5

Top 5 offenses by Points Scored:
1. Minnesota Vikings: 30.8
2. Denver Broncos: 30.2
3. San Diego Chargers: 29.0
4. Washington Redskins: 27.6
5. New York Giants: 27.5
Bottom 5:
28. Carolina Panthers: 19.8
29. Detroit Lions: 18.8
30. Indianapolis Colts: 18.4
31. St. Louis Rams: 18.4
32. Cleveland Browns: 17.8

Top 5 Defenses by Yards Allowed:
1. New York Giants: 229.7
2. Baltimore Ravens: 234.4
3. New York Jets: 260.6
4. Minnesota Vikings: 278.4
5. Washington Redskins: 280.2
Bottom 5:
28. Seattle Seahawks: 350.5
29. Detroit Lions: 355.6
30. Denver Broncos: 357.5
31. Houston Texans: 367.0
32. St. Louis Rams: 395.0

That's enough for now. Any opinions?

Thursday, September 18, 2008

Week 2 Analysis

I'm not gonna lie, week 2 didn't go too well. Though that's not so surprising considering the nature of simulations - accuracy increases as the amount of data increases. Win = Green, Loss = Red, Push = Blue.

NYG -8.5 at STL
Winner: NYG, 76%
Spread: EVEN

IND -2 at MIN
Winner: MIN, 54%
Spread: MIN +2, 58%

NO -1 at WAS
Winner: NO, 61%
Spread: NO -1, 60%

CHI at CAR -3
Winner: CAR, 51%
Spread: CHI +3, 61%

BUF at JAX -5
Winner: BUF, 64%
Spread: BUF +5, 77%

TEN at CIN -1
Winner: TEN, 79%
Spread: TEN +1, 80%

GB -3.5 at DET
Winner: DET, 59%
Spread: DET +3.5, 64%

OAK at KC -3.5
Winner: KC, 81%
Spread: KC -3.5, 72%

SF at SEA -6.5
Winner: SF, 79%
Spread: SF +6.5, 89%

ATL at TB -7
Winner: ATL, 80%
Spread: ATL +7, 92%

SD -1 at DEN
Winner: DEN, 74%
Spread: DEN +1, 75%

BAL at HOU -4.5
Winner: BAL, 70%
Spread: BAL +4.5, 80%

NE at NYJ -1.5
Winner: NYJ, 53%
Spread: EVEN

MIA at ARI -6.5
Winner: ARI, 57%
Spread: MIA +6.5, 61%

PIT -6 at CLE
Winner: PIT, 91%
Spread: PIT -6, 79%

PHI at DAL -7
Winner: PHI, 65%
Spread: PHI +7, 84%

Straight % Wins Games Win %
50-59 2 5 40.0%
60-69 1 3 33.3%
70-79 4 4 100.0%
80-89 0 2 0.0%
90-100 1 1 100.0%
Total 8 15 53.3%

Spread % Wins Games Win %
50-59 0 1 0.0%
60-69 0 3 0.0%
70-79 2 4 50.0%
80-89 2 3 66.7%
90-100 0 1 0.0%
Total 4 12 33.3%
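If you want to build the same kind of breakdown for your own picks, here is a rough sketch of the bookkeeping: drop each pick into a 10-point confidence bracket, then count wins and games per bracket. The function names and the sample picks are hypothetical and are not this week's actual results.

```python
from collections import defaultdict


def bucket_label(confidence):
    """Map a confidence percentage (50-100) to a bracket label such as '70-79'."""
    low = min(int(confidence) // 10 * 10, 90)
    high = 100 if low == 90 else low + 9
    return f"{low}-{high}"


def summarize(picks):
    """picks: iterable of (confidence_percent, won) pairs.
    Returns {bracket: (wins, games, win_pct)} in ascending bracket order."""
    tally = defaultdict(lambda: [0, 0])
    for confidence, won in picks:
        entry = tally[bucket_label(confidence)]
        entry[0] += int(won)
        entry[1] += 1
    return {
        label: (wins, games, round(100 * wins / games, 1))
        for label, (wins, games) in sorted(tally.items())
    }


# Hypothetical picks, for illustration only
sample = [(76, True), (54, False), (61, True), (91, True), (79, False)]
for label, (wins, games, pct) in summarize(sample).items():
    print(label, wins, games, f"{pct}%")
```

With only a handful of games in each bracket, the percentages will bounce around a lot from week to week, which is the same small-sample problem mentioned above.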

Saturday, September 13, 2008

Week 2 Simulated Stats

I got a request to post yards/pass, yards/play, and yards/point, so here they are. Again, remember that it's early in the season, so there isn't very much information for NFLSim to use. At this point in the season it may be better to use the stats for comparisons rather than to use them as definite predictions. I apologize for the (lack of) formatting of the data tables. If anyone knows how to insert tables from Excel, I'd appreciate some help; I already tried copy and paste. The tables won't be pretty this week, but hopefully I'll figure it out for next week.

TEAM YDS/PASS YDS/PLAY YDS/POINT
ARI 8.57 6.00 12.46
ATL 11.25 7.85 12.05
BAL 6.35 5.68 13.01
BUF 7.58 5.74 13.34
CAR 6.17 5.30 11.48
CHI 7.81 5.86 15.83
CIN 5.70 4.57 14.54
CLE 6.59 5.80 16.13
DAL 8.46 6.24 14.48
DEN 8.74 6.95 11.95
DET 6.91 5.66 9.64
GB 10.36 8.26 16.35
HOU 5.95 5.10 13.10
IND 7.18 5.65 13.70
JAX 5.64 4.27 8.48
KC 9.19 6.66 10.71
MIA 7.04 5.26 12.40
MIN 6.40 6.11 13.20
NE 6.75 5.08 12.01
NO 7.86 5.99 12.84
NYG 8.00 6.01 12.37
NYJ 8.38 6.22 12.86
OAK 7.50 6.24 19.02
PHI 7.53 6.02 12.08
PIT 8.30 5.98 10.17
SD 7.54 5.90 14.65
SF 8.34 5.87 12.10
SEA 6.06 4.90 13.90
STL 6.21 5.29 13.59
TB 7.59 6.47 14.01
TEN 6.35 5.48 10.91
WAS 5.61 6.04 14.27

Wednesday, August 27, 2008

Simulating Teams vs. Simulating Players

Due to popular demand, I'll be writing a weekly article about various aspects of model building, money management strategy, or whatever questions anyone has. So if you're curious about something specific, speak up. This first article deals with the difference between simulating games using a team model versus an individual player model.


"Keep it simple, stupid" - Confucius

One of the most important decisions to make when simulating a sport is whether to simulate a game using team stats or to break it down further and use individual player stats. A convincing argument can be made for either method. If you've read anything about NFLSim's background, you know that the simulation uses team stats and not player stats. Here's a comparison of the two methods, in a football context:

Keep in mind that this is just one way to build a model; if you're building your own, use whatever method fits you.

1) From the viewpoint of a novice programmer, using team stats is really easy. There are a dozen different websites with consolidated, uniform, and sortable information, www.nfl.com and www.espn.com for example. Once you've acquired the data, you can easily manipulate it into the form that works for your program. The team stats can be incorporated into the simulation from a single web page. Grabbing an individual's stats takes a little more effort and problem solving. The difficulty lies in the automation of the process. Getting the program to find each team's website and then find the player-specific data you're looking for can be tricky.

It doesn't sound much more difficult, but if you decide to use player statistics, you'll have to really work on your organizational skills. Remember, you'll have to retrieve and organize data from every position (with backups and second strings, etc.) from every team, e.g. DAL: QB Tony Romo, Brad Johnson, Richard Bartel; RB Marion Barber, Felix Jones, Tony Romo; WR .... .... .... You get the idea. All of this extra information means you need a lot more computing power and a lot more patience.

2) Injuries? Substitutions? Trades? Here is where it may seem that simulation at the player level has an advantage over simulation at the team level. Surely when you account for individual changes, you'll get better accuracy, right? Well...maybe. Let's talk about team stats first. Team stats, at the very basic level, do not take into account injuries, substitutions, trades or anything of the sort. Team simulations operate under the assumption that the team is a single, static object, which generates stats as the weeks go by, regardless of the players that make it up. From a programming standpoint, this makes things really easy because you don't have to worry about writing code to distinguish between different players and their respective stats; you just use a single set of statistics for the entire simulation.

From the player perspective: by accounting for major changes, you might be able to improve your accuracy. But how do you reconcile in-game changes? The Cowboys consistently used Julius Jones and Marion Barber in the same game, so you have to figure out who runs each play in the simulation. The best method I can think of is to find how many attempts per game each RB has and apportion the plays in the simulation accordingly. When you consider every player for every team, this becomes pretty daunting.
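Here is a minimal sketch of that proportioning idea: each running back's attempts per game become a weight, and every simulated rushing play draws a ball carrier at random according to those weights. The attempt numbers are hypothetical placeholders and the function names are made up for this example; this is not code from NFLSim, which uses team stats.

```python
import random


def carry_shares(attempts_per_game):
    """Convert attempts per game into each back's share of the rushing plays."""
    total = sum(attempts_per_game.values())
    return {name: att / total for name, att in attempts_per_game.items()}


def pick_rusher(attempts_per_game, rng):
    """Choose the ball carrier for one simulated rushing play,
    weighted by attempts per game."""
    names = list(attempts_per_game)
    weights = [attempts_per_game[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical attempts per game, for illustration only
attempts = {"Julius Jones": 10.0, "Marion Barber": 15.0}
shares = carry_shares(attempts)  # Jones gets 40% of carries, Barber 60%

rng = random.Random(0)  # seeded so a run can be repeated
carries = [pick_rusher(attempts, rng) for _ in range(1000)]
print(shares)
print(carries.count("Marion Barber") / len(carries))  # should land near 0.6
```

The same weighting would have to be repeated for every position group on every team, which is where the extra bookkeeping comes from.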

Now let's assume there's an injury. If the Patriots have built up a set of passing statistics with Brady as the QB, those statistics are going to be pretty damn good, and they'll carry through to the next games. After 14 weeks, Brady gets injured and is out for the season. This is where a player simulation has its advantage: if you keep using the great team stats that the Patriots generated with Brady to simulate the subsequent games, you misrepresent the Patriots' skill as being greater than it actually is, and the predictions for those games will be inaccurate. When you replace Brady with his backup, everything might work out. The tricky part is assigning averages or attributes to a player with no experience. You can figure that out for yourself. Other provisions can be made when using team stats if an injury occurs, like assigning a general injury multiplier to the affected statistics. Trades can be treated in the same manner as injuries; both are a player swap.
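To make the injury multiplier idea concrete, here is a rough sketch for a team-stat model: scale only the statistics the injured player affects and leave everything else alone. The stat names, the numbers, and the 0.85 multiplier are all hypothetical; picking a sensible multiplier is the hard part and is not addressed here.

```python
def apply_injury_multiplier(team_stats, affected, multiplier):
    """Return a copy of team_stats with the affected entries scaled
    by `multiplier`; all other entries are left unchanged."""
    return {
        name: value * multiplier if name in affected else value
        for name, value in team_stats.items()
    }


# Hypothetical team stats and multiplier, for illustration only
stats = {"yds_per_pass": 8.0, "rush_yds_per_game": 120.0}
adjusted = apply_injury_multiplier(stats, {"yds_per_pass"}, 0.85)
print(adjusted)
```

Because the function returns a new dictionary, the original season-long stats stay intact and the adjustment can be dropped again when the player returns.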

When deciding whether to write a simulation using team statistics or player statistics, the important factors to consider are: programming ability, patience, and free time. If you're an expert programmer with experience integrating web data with your respective programming language or if you've got a real drive to get the program done, consider using player stats. Otherwise, use team stats.

If you're wondering how team accuracy compares with player accuracy, compare Black Box Sports and Accuscore. Black Box Sports' NFLSim uses team statistics for play-by-play simulations, while Accuscore assigns attributes to individual players for its play-by-play simulation. This is the first full season for Black Box Sports, so we'll see who wins.