I was trying to find a good way to place in Pick3 and Lucky Ladders, and after 10+ hours of work, I think I have a decent model. It uses linear regression on the 1,992 regular-season games played from 1997 to 2004, with separate ratings for home and away offense and defense.

If I had these picks before Week 6 started, I would've had ten points in lucky ladders, with Cleveland at #5 being my downfall. If I had picked the Cleveland game right, I would have had the first 8 picks right, and possibly the first 9 if Indy wins tonight.

That comes to a grand total of 305.3 points, which would have been good for 61st place.

The model went 9-4 straight up this week so far. The incorrect picks were Pittsburgh, Cleveland, New England (it had them winning by one), and Detroit (also winning by one).

A good way to compare the accuracy of these predictions is to test them against Vegas' predictions.

In 6 of the 14 games, my predicted spread was closer to the actual outcome than Vegas' line. More notably, in the games where my spread differed from Vegas' by more than 2 points, mine was closer to reality in 5 out of 7. On the over/under, 6 of my 13 predictions were closer to the actual total than Vegas' numbers were.
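For anyone who wants to run the same comparison, it boils down to checking which spread landed closer to the actual margin. Here's a quick sketch with made-up numbers (these are not my actual games or lines):

```python
# Hypothetical example of the comparison above: for each game, check whether
# the model's spread or Vegas' line landed closer to the actual margin.
# All numbers below are invented for illustration.

games = [
    # (model_spread, vegas_spread, actual_margin), from the home team's side
    (-8.0, -3.0, -10.0),
    ( 4.0,  2.5,   3.0),
    ( 3.0,  7.0,   1.0),
]

def closer_count(games):
    """Count games where the model beat Vegas (ties go to Vegas)."""
    wins = 0
    for model, vegas, actual in games:
        if abs(model - actual) < abs(vegas - actual):
            wins += 1
    return wins

print(closer_count(games))  # model is closer in 2 of these 3 made-up games
```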

Anyways, here are the week 7 predictions:

The main weaknesses are that it doesn't factor in injuries or strength of schedule. If you try to bet using these predictions, I guarantee you that you will lose money - betting wasn't the goal here.

EDIT: I updated the model to factor in overall performance instead of just home vs. away splits. I did this because each team has only played 2 home or 2 away games so far, so teams that happened to get lucky and play crappy teams on the road had a biased away rating. Now, the home/away rating accounts for half of the total rating, and the overall rating accounts for the other half.
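To illustrate the blend (with invented ratings, not real team numbers):

```python
# Sketch of the blending described in the edit: a team's final rating is a
# weighted mix of its home/away split rating and its overall rating, so a
# small home/away sample can't dominate. Ratings here are made up.

def blended_rating(split_rating, overall_rating, split_weight=0.5):
    """Weight the home/away split rating against the overall rating."""
    return split_weight * split_rating + (1 - split_weight) * overall_rating

# A team with a strong away rating from two lucky road games gets
# pulled back toward its overall number:
print(blended_rating(9.0, 5.0))  # -> 7.0
```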

I stand firmly by my prediction of Houston's score.

Last edited by Demented Avenger on Wed Nov 30, 2005 4:23 am, edited 7 times in total.

danleroi22 wrote:One question... how is it possible for a team to score negative points in the NFL???

Lol...

Cool though... I like these.

I used a linear regression model with an intercept of about -12, so if a team's offense is bad enough and the opposing defense is good enough that the team doesn't make up those 12 points, the system will predict a negative score.

I could have made the baseline for all scores 0 instead of -12, but this way was a bit more accurate in terms of correlation (about 26% this way versus about 24% the other way).
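Here's a toy version of what's going on. Only the -12 intercept comes from my model; the coefficients and ratings below are invented for illustration:

```python
# Minimal sketch of how a linear model with a roughly -12 intercept can
# predict a negative score. Only the intercept value comes from the post;
# the coefficients and ratings are made up.

INTERCEPT = -12.0

def predicted_score(offense_rating, opp_defense_rating,
                    off_coef=1.0, def_coef=1.0):
    """Linear prediction: intercept + offense boost - defense penalty."""
    return INTERCEPT + off_coef * offense_rating - def_coef * opp_defense_rating

# A weak offense against a strong defense can't make up the 12 points:
print(predicted_score(6.0, 2.0))   # -> -8.0, a negative predicted score
print(predicted_score(20.0, 3.0))  # -> 5.0
```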


Awesome stuff man... I've been meaning to do this forever and have never gotten around to it.

Is the data really linear? Do you find that you're consistently violating any of the standard regression assumptions (heteroskedasticity would be the biggest issue, I would think). Can you get better predictions with some non-linear transformations of your explanatory variables? What's the R-squared for the model?

I am the Walrus

The_Dude

General Manager

Posts: 3481

(Past Year: 2)

Joined: 14 Aug 2003

Home Cafe: Football

Location: My ivory tower, where I oversee the intellectual development of America's youth


I've never heard the word heteroskedasticity in my life. I looked it up, and it means different variances among random variables, right? Does it refer to the variance within one random variable across different values of that variable, or to a comparison of the variances of two different random variables? If it's the former, it seems like the more a team scores, the more the final score will vary, so homoskedasticity appears to be violated.

What's the best way to test for heteroskedasticity? Are there any other regression assumptions I should test?

I'm just using Excel, so the only types of regression I can do are unweighted linear and exponential. The exponential didn't work very well because of the shutouts, and even when I changed all the 0s to 1s, the R-squared value was below 10%. Maybe I'll import the data into Matlab to generate my coefficients. I love using Excel, though, because I can just type the names of any two teams into my Home and Away slots, along with a factor to weight home-field advantage versus overall stats (I'm using 2:1 right now), and it spits the predicted scores right back at me.

My explanatory variables come from a rating system based on home and away offense and defense. I generated these ratings myself from the following 7 per-game stats: points, rushing yards, rushing yards per attempt, completion rate, pass yards per attempt, yards per play, and yards gained per point scored. Each stat was weighted according to its average correlation with wins per season from 1997-2004.
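As a rough sketch of how the rating comes together (the weights and stat values below are invented, not my actual correlation-based weights):

```python
# Hypothetical sketch of the rating construction described above: each
# per-game stat is weighted by its historical correlation with wins and
# summed into a single rating. Weights and values here are made up.

# correlation-based weights (invented numbers, normalized to sum to 1)
WEIGHTS = {
    "points": 0.30,
    "rush_yards": 0.10,
    "rush_yards_per_att": 0.10,
    "completion_rate": 0.10,
    "pass_yards_per_att": 0.15,
    "yards_per_play": 0.15,
    "yards_per_point": 0.10,
}

def rating(stats):
    """Weighted sum of (standardized) per-game stats."""
    return sum(WEIGHTS[name] * value for name, value in stats.items())

# stats would be standardized (e.g. z-scores), not raw yardage totals
example = {name: 1.0 for name in WEIGHTS}
print(rating(example))  # 1.0 when every stat is one unit above average
```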

The standard error on the predicted score is 8.5, and the r-squared for the model is about 31%, which isn't great, but isn't terrible either, considering the random nature of sports.

Last edited by Demented Avenger on Wed Oct 19, 2005 10:00 pm, edited 1 time in total.

Heteroskedasticity is non-constant error variance, and given the blob nature of your data, it doesn't appear to be there, so that's a good sign. Normally with heteroskedasticity, your standard errors are off, so any confidence you have in the values of your betas might be misplaced. Why is this important? Well, it's only important if you're changing the model specification... how many explanatory variables are you using? Have you run multiple specifications?
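If you want something a little more concrete than eyeballing the blob, one crude check is to split the residuals by low vs. high predicted score and compare the two variances (a poor man's Goldfeld-Quandt-style test). Sketch with made-up residuals:

```python
# Rough heteroskedasticity check without a stats package: split residuals
# by low vs. high fitted value and compare the variances. A ratio well
# above 1 suggests the error variance grows with the prediction.
# Fitted values and residuals below are invented for illustration.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_ratio(fitted, residuals):
    """Residual variance in the high half of fits over the low half."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return variance(high) / variance(low)

fitted = [10, 14, 17, 21, 24, 28]
residuals = [-1, 2, -2, 5, -6, 7]   # spread grows with the fit: suspicious
print(variance_ratio(fitted, residuals))  # well above 1 here
```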

And... it looks fairly linear to me, although you might try a logarithmic or exponential transformation on a few of your independent variables to see if that does anything. How would you know if it's better?

Run a model with ln(x) --> y and one with x --> y... whichever has the higher R-squared is actually fitting the data better. When you're trying to run a forecast like this, you want to maximize the amount of variance you can explain, so fiddling with these data transformations, while tedious, might give you some leverage.
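Here's a quick sketch of that comparison with invented data, just to show the mechanics:

```python
import math

# Sketch of the comparison suggested above: fit y on x and y on ln(x),
# and keep whichever gives the higher R-squared. The data below is
# invented and deliberately logarithmic in x.

def r_squared(xs, ys):
    """R-squared of a simple least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 4, 8, 16, 32]
ys = [3, 6, 9, 12, 15, 18]          # grows with log(x), not with x

r2_raw = r_squared(xs, ys)
r2_log = r_squared([math.log(x) for x in xs], ys)
print(r2_log > r2_raw)  # True: the log transform fits this data better
```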

I've gotta go right now, but I'll write more later... I have a couple of data analysis programs that are a bit more powerful than Excel that I could use if you sent me the data.

Take er ez...

Dude
