NFL Statistical Analysis

Thread Starter

PRS

Joined Aug 24, 2008
989
I took up statistics this year in order to write a computer program to predict the outcomes of football games. To start with, I downloaded last year's data and looked it over for any patterns that might suggest algorithms for making a mathematical prediction.

I noticed that offensive ranking predicted New Orleans to be the best team in football, with Minnesota and Green Bay also near the top of the list. So I took each team's offensive ranking (1-32), added half of its defensive ranking, and compared the totals for the two teams in each matchup. I also subtracted 5 for home field advantage.

Quality = offense + 1/2 * defense, minus 5 for the home team. The lower the score, the higher the quality, and these quality scores were used to predict the outcomes of particular matchups.
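For anyone who wants to try the formula, here is a minimal sketch of it in Python. The team names and rankings are made up for illustration, and giving a tie to the home team is my own assumption, since the rule above doesn't say what to do in that case.

```python
def quality(off_rank, def_rank, is_home):
    """Quality score from the formula above: lower means a better team.

    off_rank and def_rank are league rankings from 1 (best) to 32 (worst).
    """
    score = off_rank + 0.5 * def_rank
    if is_home:
        score -= 5  # home field advantage
    return score


def predict_winner(home, away):
    """home and away are (name, off_rank, def_rank) tuples.

    Ties go to the home team (an assumption, not part of the formula).
    """
    home_q = quality(home[1], home[2], is_home=True)
    away_q = quality(away[1], away[2], is_home=False)
    return home[0] if home_q <= away_q else away[0]


# Hypothetical rankings, for illustration only.
print(predict_winner(("Home Team", 10, 10), ("Away Team", 4, 6)))
```

Here the home team scores 10 + 5 - 5 = 10 and the visitors score 4 + 3 = 7, so the visitors are the pick.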

After 3 games my program is only 55% correct. I think I could do as well just by guessing. It's back to the drawing board. ;)
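One way to check the "as well just by guessing" hunch is a plain binomial calculation: if every pick were a coin flip, how often would you get at least this many right? The sketch below uses hypothetical counts (22 correct out of 40 picks, which is 55%), since I haven't listed the exact number of games here.

```python
from math import comb


def p_at_least(hits, games, p=0.5):
    """Probability of getting `hits` or more picks right out of `games`
    when each pick is right with probability p (pure guessing: p = 0.5)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(hits, games + 1)
    )


# Hypothetical record: 22 correct out of 40 picks (55%).
games, hits = 40, 22
print(f"Chance of {hits}+ correct by guessing: {p_at_least(hits, games):.3f}")
```

With those counts the probability comes out at roughly 0.3, so a 55% record over a sample that small really can't be told apart from coin flipping.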

Total yards seems to be the best way to rank both offenses and defenses. But there's a trick to it: Scoring 30 points against the Seahawks is not the same as scoring 30 points against the Packers. Thus there needs to be a "weighted" total yards.

This is just one thought; I have others. Does anyone else have any thoughts on this? Another is this: the Seahawks should get a heavier weight for home field advantage than the Chargers because of each team's home-away stats.
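To make the "weighted" idea concrete, here is one possible scheme, which is only a guess at what might work: scale each game's yardage by how stingy the opposing defense is compared with the league average. The function name, the team labels, and the numbers are all hypothetical.

```python
def opponent_adjusted_yards(games, avg_allowed):
    """Average yards per game, scaled by the strength of each opposing defense.

    games:       list of (opponent, yards_gained) pairs for one team
    avg_allowed: dict mapping each team to the average yards its defense allows

    Yards gained against a defense that allows more than the league average
    are scaled down; yards gained against a tougher defense are scaled up.
    """
    league_avg = sum(avg_allowed.values()) / len(avg_allowed)
    total = 0.0
    for opponent, yards in games:
        total += yards * league_avg / avg_allowed[opponent]
    return total / len(games)


# Hypothetical two-team league, for illustration only.
avg_allowed = {"Tough D": 300.0, "Soft D": 400.0}
games = [("Tough D", 300), ("Soft D", 400)]
print(opponent_adjusted_yards(games, avg_allowed))
```

In this toy case both games count as a league-average 350 yards, because each output simply matched what that defense usually gives up. The same scaling could be applied to points instead of yards.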
 

Papabravo

Joined Feb 24, 2006
21,159
You are absolutely wasting your time. There is little or no evidence to suggest that ANY past measure of team performance has ANY predictive power whatsoever.

A team is a dynamic entity with stochastic performance, and it is never the same team twice.
 

jpanhalt

Joined Jan 18, 2008
11,087
Look up Stein's Paradox. There was a great Scientific American (S.A.) article about it, maybe 3 decades ago or more. I just checked Google and there seem to be several early hits with a similar theme as the S.A. article.

The S.A. article basically begins with the proposition that baseball game outcomes seem random, but in any one year, you know which team to bet on. The same can be said about football, but we all know not to bet on the Cleveland Browns if they are playing a team with a winning record. The Stein estimator helps predict that bit of common sense knowledge.

I have moved a couple of times since I last referenced the article, so I cannot find it right now (i.e., three moves = one fire, Mark Twain). If I find it, I will try to post a link or pdf.

John
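For reference, the estimator as I remember it from the article shrinks each team's observed average toward the grand average of all teams: z = ybar + c * (y - ybar), with c = 1 - (k - 3) * sigma^2 / sum((y - ybar)^2), where k is the number of teams and sigma^2 is the variance of each observed average, assumed known and equal for every team. Below is a minimal sketch under those assumptions. Clipping c at zero (the "positive-part" variant) is my addition, and the win fractions are made up.

```python
def james_stein(averages, sigma2):
    """Shrink each observed average toward the grand mean.

    averages: observed averages, one per team (e.g. win fractions)
    sigma2:   variance of each observed average, assumed known and equal
    """
    k = len(averages)
    if k < 4:
        raise ValueError("this form of the estimator needs at least 4 averages")
    grand_mean = sum(averages) / k
    spread = sum((y - grand_mean) ** 2 for y in averages)
    if spread == 0:
        return list(averages)  # nothing to shrink
    c = 1 - (k - 3) * sigma2 / spread
    c = max(c, 0.0)  # positive-part variant: never shrink past the grand mean
    return [grand_mean + c * (y - grand_mean) for y in averages]


# Hypothetical early-season win fractions for four teams.
print(james_stein([0.2, 0.4, 0.6, 0.8], sigma2=0.01))
```

Every estimate moves toward 0.5, and the extreme records move the most in absolute terms.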
 

Thread Starter

PRS

Joined Aug 24, 2008
989
I'll admit last year's data does not predict this year's outcomes very well. My experience just showed me that. However, if a team was good or bad last year, chances are they will be pretty much the same this year. And this was my reason for using last year's data.

But the low score -- about 55% overall -- that I have earned with my elementary algorithm, only tells me I went about crunching the data in a fallacious way. I intend to improve the algorithm for this coming week's games by using total yards data for both offenses and defenses, and including special teams data.

My interest is really in statistics. Since there are so many variables in sports, I chose to use sports as a means of checking my progress in learning statistical analysis. Many famous mathematicians have done the same, and with good reason: games are statistical, and therefore a mathematician should be able to make money with his or her expertise in math! ;)
 

Thread Starter

PRS

Joined Aug 24, 2008
989
Thanks, jpanhalt. I hope you find what you have alluded to. It sounds pertinent. By the way, do you have any ideas on how to weigh the difference between scoring 30 points against Seattle vs scoring 30 points against Green Bay? Not all yardage should be given the same significance, I think.
 

Thread Starter

PRS

Joined Aug 24, 2008
989
I'm just having fun in my own way. I want to learn statistics and test my knowledge at the same time. If I can't predict game outcomes then I'm not really understanding statistics. Right now I'm a D student and I want to improve upon that.
 

Georacer

Joined Nov 25, 2009
5,182
Why don't you weight the yards gained according to the rolling average score of a team in the final classification, over the years you have a log of?
 

Thread Starter

PRS

Joined Aug 24, 2008
989
Do you care to explain that to me a little better, Georacer? What is a rolling average?
 

Georacer

Joined Nov 25, 2009
5,182
Well, at first thought, you could evaluate a team by averaging its standings in the final classification of the championship. But as a team's dynamics change over the years, that would be misleading. Instead, if you accept that a team can do well one year and not quite so well the next, you can think as follows. Start from year X and calculate the average value as: \(V_{\small{avg\ n}}=0.8 \cdot V_{\small{avg\ n-1}} + 0.2 \cdot V_n\), where V is the counted variable, be it score, standing or whatever. The idea is to take older performance into account but let the newer data take over as time goes by. Tweak the 0.8 and 0.2 constants: a larger "0.8" gives more gravity to the past, and a larger "0.2" emphasises recent performance. Don't forget that the two must always sum to 1!
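In code, the recurrence looks something like the sketch below. Starting the average at the first year's value is an assumption on my part (the formula needs some starting point), and the standings are invented.

```python
def rolling_average(values, alpha=0.2):
    """Running average V_avg[n] = (1 - alpha) * V_avg[n-1] + alpha * V[n].

    values: yearly values, oldest first (score, standing, or whatever)
    alpha:  weight of the newest value; (1 - alpha) is the weight of the past,
            so the two weights always sum to 1
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    avg = values[0]  # assumption: seed the average with the first year
    history = [avg]
    for v in values[1:]:
        avg = (1 - alpha) * avg + alpha * v
        history.append(avg)
    return history


# Hypothetical final standings over four seasons, oldest first.
print(rolling_average([10, 20, 5, 3], alpha=0.2))
```

A bigger alpha makes the average chase the latest season faster; a smaller one makes it remember the past longer.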
 

jpanhalt

Joined Jan 18, 2008
11,087
Here's the citation: B. Efron and C. Morris, Stein's Paradox in Statistics, Sci. Amer. 236(5):119-127, 1977. That's the May issue. Attached is a pdf of the first page in lieu of an abstract. Scientific American does not have that issue available for sale, but in my experience dealing with the publisher, it takes copyright pretty seriously. I suspect libraries in your area will have bound or other versions for you to view.

John

View attachment 23069
 

Thread Starter

PRS

Joined Aug 24, 2008
989
I get the idea. I think it's just what I wanted. Thanks again. I'll weight the old data for two or three years and include this year's with the greatest weight. When I do, I'll let you know what happened.
 

Thread Starter

PRS

Joined Aug 24, 2008
989
Thanks, jpanhalt. I googled Charles Stein and found the material Efron and Morris used. I'll have to study it before I can use it. When I do, I'll post results.
 

sceadwian

Joined Jun 1, 2009
499
You have to understand risk factors as well, though. Take injuries: top athletes really put themselves out there, and top performance means top risk. A single injury can flush all your existing data down the toilet if its risk factors aren't taken into account. You can't just look at a whole team in a game such as American football, because single players can have dramatic effects when there's total team synergy. It's impossible to analyze this statistically at present, and living systems that complex can't be fully modeled reliably, though I'm sure you can get close enough to increase your odds. The question is whether you can increase your odds to the point where you can make any money.
Oh yeah, and betting on the NFL is illegal so if you ever did find such a system you'd end up in jail if you couldn't hide it.
 

loosewire

Joined Apr 25, 2008
1,686
The bookies have a better win-loss record in all the sports. An exciting close game is a bookie win, but fans love sports. Business has now found a chance model that is legal, plus the logo you wear makes a lot of difference; you are owned by your school for life if you are a good fan. You notice that you don't see home video of professional sports. The teams own it all. Your kids' education money goes to sports, not the classroom.
 

sceadwian

Joined Jun 1, 2009
499
Bill, look up the book odds of being able to determine, at the start of the season, the Super Bowl winner at the end of the year =)

Yes, they do know their business, and they'll never explain their statistical analysis methods =)
 

sceadwian

Joined Jun 1, 2009
499
Last time I checked, betting on sports was still illegal by and large.
I would love to see a substantiation that education monies go to sports rather than to a class room.
 

Wendy

Joined Mar 24, 2008
23,415
Never said they did, I was responding to a blanket statement. There are a lot of things that are legal in Las Vegas that drive their neighbors nuts. IMO, this is a good thing, morality should not be legislated. But I digress.

Anyone wanting to actually test a theory does have the venue to do so.
 

sceadwian

Joined Jun 1, 2009
499
I think basically what I was trying to say is that chaotic systems aren't practical to model statistically on short timescales, and what counts as a short timescale is relative to the system itself. Sure, a good system will increase your odds over the baseline, but it's not possible to account for everything, keeping in mind these are human beings, not chess pieces. No analytical theory will ever be able to produce consistently accurate results from such a chaotic system.

It's like trying to predict the weather: even with huge supercomputers hell-bent on figuring this stuff out, we can at best predict trends, and then you get your Katrinas.
 