| Author |
Message |
mith
Pitbull of Truth
|
Posted: Fri Oct 01, 2004 2:20 pm Post subject: 1 |
|
|
In January there was a thread (which I can't find now) which discussed the BCS system. There was some interest (mostly by me, of course, but anyway...) in creating a new ranking system, just for fun.
I thought I'd start a thread over here to discuss how we might go about it. At the very least, I'm wondering if anyone is interested in doing some programming for me (as I don't have anything with which to do it myself at the moment)... perhaps if there's interest, several of us can design our own systems, get them programmed, and then see how they each compare to what we might expect.
Right now I'm looking at the various approaches currently in use, dividing each into "data" and "method". For example:
Data:
Is the score used, or is it purely W/L?
Do we care about the location?
Do we care about the date?
Do we care about the stats?
Methods:
Colley Matrix - based only on W/L, but could possibly be adjusted to take other things into account. Probably the best purely mathematical scheme I've seen. Basically, it takes an intuitive method of assigning rankings (the winning team gains some and the losing team loses some), makes it iterative, and then converts the resulting 117 equations in 117 unknowns into a big matrix.
Mathematical Monkey Rankings - This one is amusing, but gives reasonable results. Basically, a "monkey" voter starts on a random team, randomly picks a game involving that team, and then chooses to stick with the current team or switch to the other team based on a weighted coin flip; that is, it's more likely for the monkey to end up with the team that won, but there's a chance that the monkey will consider the game an upset. Obviously the more wins you have, the better off you are, and it takes SoS into account by the fact that if you play better opponents, you will have more monkeys looking at your games and possibly switching to you even if you lost. Like the Colley Matrix, it only looks at W/L, but could again be adjusted so that it looked at more (by making the weighting a bit more complicated, probably).
Penalized Maximum Likelihood - Basically, each team is given a rating, and the "likelihood" of the results is calculated based on these ratings. If the winning team has a much higher rating than the losing team, the likelihood of that game will be high, whereas if the losing team has the higher rating, the likelihood will be low. The ratings are adjusted to find the best rating list in terms of the overall likelihood. The "penalized" part is an extra team thrown in that has played everyone twice and split the games; this is simply to prevent an undefeated team from having a rating of infinity (or a team with no wins having a rating of negative infinity).
A purely retrodictive ranking - The ideal here is that you rank the teams in such a way that the highest percentage of games is "correct"; that is, the winning team has the higher rank. Since you can't actually look at every possible (117!, and that's if you don't allow ties) ranking, and since ties are quite possible (unless the "best" ranking is such that 1 beat 2 beat 3 beat 4..., then you can get an equally good ranking by swapping any two consecutive teams that played each other), an actual method would have to be worked out. I have a few thoughts on this, but nothing solid.
"Best chain" - This is an idea I thought up that's rather similar to the monkey system, in that it's a bit strange but should probably give reasonable results. Each pair of teams would be compared, looking for the best "chain" between them. All I mean by a chain is a series of victories leading from one team to the other. So for example, if USC beat Auburn who beat Tennessee who beat Florida who beat LSU, then USC would have a chain of 4 over LSU. The best I can do for LSU over USC is 5 (LSU beat OU beat Texas beat Kansas St. beat Cal beat USC), so, if that is indeed the best, USC would get the "win" over LSU. The ranking would simply be based on how many of these "wins" a team got. Ties could be broken by comparing among the teams that are tied. For example, if USC and LSU were tied, USC would be ranked ahead because of the head-to-head "win".
I expect the system would give reasonable results because obviously if you have fewer losses it's going to be harder for a team to have a short chain to you, and if your opposition is of better quality, the ones you have beaten are likely to have beaten other good teams, and the ones you have lost to aren't likely to have lost as many themselves.
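For anyone who wants to try the "best chain" idea, here is a rough Python sketch, not a finished system. It assumes games are stored as (winner, loser) pairs; the function names are made up for illustration, and the tie-breaking step is left out. The shortest chain is found with a plain breadth-first search over the "who beat whom" graph.

```python
from collections import deque

def build_wins(games):
    """Map each team to the set of teams it has beaten."""
    wins = {}
    for winner, loser in games:
        wins.setdefault(winner, set()).add(loser)
        wins.setdefault(loser, set())
    return wins

def shortest_chain(wins, start, target):
    """Length of the shortest chain of victories from start to target, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        team, length = queue.popleft()
        if team == target:
            return length
        for beaten in wins[team]:
            if beaten not in seen:
                seen.add(beaten)
                queue.append((beaten, length + 1))
    return None

def chain_wins(games):
    """Count, for each team, how many opponents it 'beats' on chain length."""
    wins = build_wins(games)
    tally = {team: 0 for team in wins}
    for a in wins:
        for b in wins:
            if a == b:
                continue
            ab = shortest_chain(wins, a, b)
            ba = shortest_chain(wins, b, a)
            # a gets the "win" if it has a chain and b's chain is longer or missing
            if ab is not None and (ba is None or ab < ba):
                tally[a] += 1
    return tally

# The hypothetical results from the example above
games = [("USC", "Auburn"), ("Auburn", "Tennessee"), ("Tennessee", "Florida"),
         ("Florida", "LSU"), ("LSU", "OU"), ("OU", "Texas"),
         ("Texas", "Kansas St."), ("Kansas St.", "Cal"), ("Cal", "USC")]
wins = build_wins(games)
print(shortest_chain(wins, "USC", "LSU"), shortest_chain(wins, "LSU", "USC"))
```

With 117 teams this runs a search for every ordered pair, which is slow but should still be manageable for one season of games.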
Anyway, just a few ideas I've been playing around with in my head, thought I'd see if anyone is interested in this sort of thing. This doesn't have to apply to sports, even (though it's easy to get data to work with from sports), so I'm hoping there will be some more general interest than just from football fans.  |
|
|
 |
Dread Pirate Westley
Daedalian Member
|
Posted: Fri Oct 01, 2004 10:29 pm Post subject: 2 |
|
|
| Quote: |
Is the score used, or is it purely W/L?
Do we care about the location?
Do we care about the date?
Do we care about the stats? |
1.) Yes, but not linearly. There are basically three kinds of results:
Close games. If a supposedly good team barely squeaks by a mediocre one on a last second field goal, I think that's important.
Solid wins: Anything from about 7-17 points. You've asserted your superiority over the other team.
Blowouts: I really don't care if you can put up that 9th touchdown against Northwestern. We've got it. They don't belong on the same field.
I'm thinking the "incremental bonus" in margin of victory should be roughly a bell curve. Win by 1 and you get a point. Win by 3 and you get 1.002 points. Win by 4, 5, 6, etc. you start seeing significant increases up until around 20 when each extra point you tack on becomes worth another 0.001 points in the standings (Assuming more points is good.)
2.) Yes, but not as much as 1 and 4. I think there should be some kind of premium for winning on the road, but it should again be conditional. If Michigan beats Ohio State in Columbus, that's different than if they play a non-conference game at Clemson.
3.) No.
4.) Yes, but there's too many to effectively set up a system. I would say turnover ratio, total yards, and maybe some kind of yards per play stat (one for each offense and defense) would be most important. |
|
|
 |
Blighty Chap
Guest
|
Posted: Fri Oct 01, 2004 11:21 pm Post subject: 3 |
|
|
I used to agree with mith, that was until the fantasy baseball took a damn fine turn in my favour
Now I agree with DPW. |
|
|
 |
mith
Pitbull of Truth
|
Posted: Sat Oct 02, 2004 12:07 pm Post subject: 4 |
|
|
1. The problem, of course, is that doing things like that makes it somewhat arbitrary and (potentially) biased. Have to be quite careful with such things. A two point win isn't necessarily more impressive than a one point win; and how the win comes about is as important as the actual margin. I'd say it's more impressive to beat a team 7-6 if they scored two field goals and then you drove down the field to score the winning touchdown, rather than if they missed the extra point that would've tied it (for example). How does one quantify things like that, though? And how much of a bonus is a "solid" win worth compared to a close game? It can't be too much, because a win is still a win.
2. So should conference games have more weight? That's what that seems to imply (sorta).
One thing that was suggested in the Mease paper was that rather than having a single "home field advantage" variable, each team have one. It's obviously much more of an advantage for Florida to play at the Swamp or A&M at Kyle Field than for Clemson at wherever they play. Of course, then you have twice as many variables on the same number of games, and so they're a bit more prone to uncertainty.
3. I suppose it depends on the goal of the ranking. If it's for, say, getting the best bowl games possible, then you have to include it. You want the team that is the best at the end of the season. Teams' strengths definitely change over the course of the season. I would tend to agree with you, the season should be judged as a whole; in the rough sketch of a plan I have right now, I don't include it.
4. Definitely agree on that. I have yet to think up a fair way to do this. If a system that used stats was actually used for the BCS, teams would have to play their first team all the time; in order to use them fairly, you'd have to have some way to say "ok, this game is over now, we'll ignore stats from now on". But that gets a bit arbitrary too, when is a game actually over? (Ever hear of John Tyler vs. Plano East, 1994?)
More later, the limey chick wants me to get off her broadband.  |
|
|
 |
Dread Pirate Westley
Daedalian Member
|
Posted: Sun Oct 03, 2004 2:24 pm Post subject: 5 |
|
|
| mith wrote: |
| 1. The problem, of course, is that doing things like that makes it somewhat arbitrary and (potentially) biased. Have to be quite careful with such things. A two point win isn't necessarily more impressive than a one point win; and how the win comes about is as important as the actual margin. I'd say it's more impressive to beat a team 7-6 if they scored two field goals and then you drove down the field to score the winning touchdown, rather than if they missed the extra point that would've tied it (for example). How does one quantify things like that, though? And how much of a bonus is a "solid" win worth compared to a close game? It can't be too much, because a win is still a win. |
Maybe even a tiered system would work. A first, low bonus tier for if you win by a low number of points; win by 1, win by three, it doesn't matter. A solid bonus for winning by a TD or two, and a slightly larger bonus for winning by more than that. I'd say it's still more impressive to win 7-6 by scoring a touchdown early, then holding the opposition to two field goals the remainder of the game, but no matter which of the three scenarios, you were largely ineffective for most of the day, but managed to string together one good drive (Probably because the other team muffed a punt or something so you got great field position). Close games like that are usually decided by some small thing and there's more room for luck to get involved (if the ball bounced a different direction).
| mith wrote: |
2. So should conference games have more weight? That's what that seems to imply (sorta).
One thing that was suggested in the Mease paper was that rather than having a single "home field advantage" variable, each team have one. It's obviously much more of an advantage for Florida to play at the Swamp or A&M at Kyle Field than for Clemson at wherever they play. Of course, then you have twice as many variables on the same number of games, and so they're a bit more prone to uncertainty. |
Not necessarily. Iowa's biggest rivalry is with Iowa State (Big Ten vs. Big XII).
I'll admit I didn't have time to read any of your links (and still don't). How about something like this? Assume the home team should win each game (For purposes of this rating. It may not be so true for Ohio State at Northwestern...ooh, bad example, that. ). If the visiting team wins, give them something like 1 point for every x number of fans in attendance. You could assume about 10% support the visitor and effectively "cancel out" a like number of home fans, so use 80%. This does require a lot of digging through box scores, however. |
|
|
 |
mith
Pitbull of Truth
|
Posted: Fri Oct 08, 2004 2:31 pm Post subject: 6 |
|
|
| I've got a C compiler now, so I can do some programming. Might try to program the colley matrix again or something if I have time this afternoon. |
|
|
 |
mith
Pitbull of Truth
|
Posted: Sat Nov 27, 2004 10:54 am Post subject: 7 |
|
|
Well, not a lot of interest here it seems.
I'm still working on various things. I came up with an algorithm for running the colley matrix a couple years ago, so that's just a matter of coding it. My own personal twist would be that I would adjust for OT wins and include the 3 1AA teams that play roughly 1A schedules. I think a rough attempt at home field is doable as well, simply by having separate ratings for home and away for each team (Or assuming home field is always the same, but there's pros and cons for both).
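For reference, a minimal Python sketch of the basic W/L Colley system (this is not the algorithm mentioned above, which isn't spelled out here, and it leaves out the OT and home-field twists). It builds the standard Colley equations, with 2 + games played on the diagonal, minus the number of meetings off the diagonal, and 1 + (wins - losses)/2 on the right-hand side, then solves them by Gaussian elimination. The function name and the (winner, loser) game format are assumptions for illustration.

```python
def colley_ratings(teams, games):
    """Solve the Colley system C r = b for a list of (winner, loser) games."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        C[i][i] = 2.0
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1.0          # one more game played by each side
        C[l][l] += 1.0
        C[w][l] -= 1.0          # one more meeting between the two
        C[l][w] -= 1.0
        b[w] += 0.5             # (wins - losses) / 2
        b[l] -= 0.5
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(C[r][col]))
        C[col], C[pivot] = C[pivot], C[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = C[row][col] / C[col][col]
            for k in range(col, n):
                C[row][k] -= factor * C[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    r = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(C[row][k] * r[k] for k in range(row + 1, n))
        r[row] = (b[row] - rest) / C[row][row]
    return {team: r[index[team]] for team in teams}

ratings = colley_ratings(["A", "B"], [("A", "B")])
print(ratings)
```

In this setup the ratings always average 0.5, which is a handy sanity check on the input data.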
Same for the monkey rankings. There's a way to convert the algorithm to a matrix, so it's basically the same program as colley, just different start values.
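A rough Python sketch of the monkey idea, under some assumptions: instead of simulating individual monkeys or building the matrix explicitly, it tracks the expected share of monkeys on each team and repeats the "pick a game, flip the weighted coin" step until the shares settle. The function name, the (winner, loser) format and the default p are illustrative only.

```python
def monkey_ratings(teams, games, p=0.75, sweeps=2000):
    """Expected share of monkeys on each team after many coin-flip steps."""
    played = {team: [] for team in teams}   # (opponent, did this team win?)
    for winner, loser in games:
        played[winner].append((loser, True))
        played[loser].append((winner, False))
    share = {team: 1.0 / len(teams) for team in teams}
    for _ in range(sweeps):
        nxt = {team: 0.0 for team in teams}
        for team in teams:
            if not played[team]:
                nxt[team] += share[team]     # no games: monkeys stay put
                continue
            per_game = share[team] / len(played[team])
            for opponent, won in played[team]:
                stay = p if won else 1.0 - p  # chance of ending on this team
                nxt[team] += per_game * stay
                nxt[opponent] += per_game * (1.0 - stay)
        share = nxt
    return share

print(monkey_ratings(["A", "B"], [("A", "B")]))
```

The shares always add up to 1, so they can be read directly as a rating.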
Margin is a bit tricky, but I think the easiest way to include it in either would be a tier system. For example, in the monkey system, there's a value that determines how often the winning team is picked. Simply change it so that there's a few values; say 95% if it's a blowout, 80% if it's solid, 65% if it's close, 60% if it's an OT win, 55% if it's 2OT or more or a flukey win.
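As a sketch, the tier lookup might be as simple as the Python below. The percentages are the ones listed above; the margin cut-offs (under 7 is close, 7 to 17 is solid, more is a blowout) are only an assumption borrowed from the earlier "about 7-17 points" suggestion, and the fluke flag would have to be set by hand since there is no rule for it yet.

```python
def winner_pick_probability(margin, overtimes=0, fluke=False):
    """Chance that a monkey sides with the winner of one game."""
    if fluke or overtimes >= 2:
        return 0.55   # 2OT or more, or a flukey win
    if overtimes == 1:
        return 0.60   # single-overtime win
    if margin < 7:
        return 0.65   # close game (assumed cut-off)
    if margin <= 17:
        return 0.80   # solid win (assumed cut-off)
    return 0.95       # blowout

print(winner_pick_probability(3), winner_pick_probability(24))
```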
What I think I'll end up going for is a genetic algorithm of sorts. Get together the past X years of data, run the rating for all games before december (or before a particular week, anyway), and then see how it does predicting the rest of the season. And likewise with retrodictive.
Basically the goal is to minimize the ranking violations (lower team beats higher) retrodictively, but at the same time maximize the predictive ability. I know there *exists* a method for minimizing ranking violations (Coleman's MinV), but it's quite arbitrary since there are so many different possible rankings that are minimal there.
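To make "ranking violations" concrete, here is a small Python sketch. It is not Coleman's MinV; it just counts violations for a given ranking and, for a toy league only, brute-forces every ordering to show how several rankings can tie for the minimum. Names and the (winner, loser) format are assumptions.

```python
from itertools import permutations

def violation_count(ranking, games):
    """Number of games where the lower-ranked team beat the higher-ranked one."""
    position = {team: i for i, team in enumerate(ranking)}  # 0 = best
    return sum(1 for winner, loser in games if position[winner] > position[loser])

def minimal_rankings(teams, games):
    """Brute force: only feasible for a handful of teams."""
    scored = [(violation_count(order, games), order) for order in permutations(teams)]
    best = min(count for count, _ in scored)
    return best, [order for count, order in scored if count == best]

# A beat B, B beat C, C beat A: no ranking can get all three games right
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
best, orders = minimal_rankings(["A", "B", "C"], cycle)
print(best, orders)
```

Even with three teams there are three equally good rankings, which is the arbitrariness problem in miniature.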
Anyway, current progress:
Algorithms for Colley and Monkey W/L. For the p in Monkey, the most logical choice would seem to be one minus the minimum violation percentage (currently the best is 7.34%, so use p of about .927).
Need to figure out how to incorporate home/away. Simplest number to use here would be the actual percentage of home wins.
Need criteria for tiers. Simply using the margin isn't good enough, I don't think. A 9-0 win could be more dominant in my opinion than a 59-42 win. And a 1 point win could be less decisive than an OT win. I'm just trying to figure out how to sort the games into categories in an unbiased way (obviously we could go through each game individually, but it'd be likely to become inconsistent if we didn't have a set algorithm for it).
I think I have an algorithm for the best chain method as well, and I *think* I could modify it for margin at least. Not sure about home field.
The maximum likelihood and retrodictive methods (which are similar, but not the same as I currently have the latter planned) are going to be completely different to code, and I haven't figured that out yet.
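One possible starting point for the maximum likelihood side, as a Python sketch under stated assumptions: the thread doesn't say which likelihood function is used, so this swaps in a Bradley-Terry model (chance that i beats j is s_i / (s_i + s_j)) and fits it with the usual fixed-point update. The dummy team has its strength pinned at 1 and is given one win and one loss against every real team. All names here are illustrative.

```python
def penalized_ml_ratings(teams, games, sweeps=500):
    """Bradley-Terry strengths with a dummy team that splits two games with everyone."""
    dummy_strength = 1.0                      # pinned, sets the scale
    wins = {team: 1 for team in teams}        # each team's one win over the dummy
    opponents = {team: [] for team in teams}
    for winner, loser in games:
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)
    strength = {team: 1.0 for team in teams}
    for _ in range(sweeps):
        updated = {}
        for team in teams:
            # Two games against the dummy, plus every real game played
            denom = 2.0 / (strength[team] + dummy_strength)
            denom += sum(1.0 / (strength[team] + strength[opp])
                         for opp in opponents[team])
            updated[team] = wins[team] / denom
        strength = updated
    return strength

print(penalized_ml_ratings(["A", "B"], [("A", "B")]))
```

Because every team has at least one win and one loss once the dummy is included, an unbeaten team ends up with a large but finite strength.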
What I really need at the moment is data in a form I can use. Perhaps if we can get several people to take a conference or two each and record the games? Volunteers? |
|
|
 |
CrystyB
Misunderstood Guy
|
Posted: Sat Jan 01, 2005 8:49 am Post subject: 8 |
|
|
| Quote: |
| Need criteria for tiers. Simply using the margin isn't good enough, I don't think. A 9-0 win could be more dominant in my opinion than a 59-42 win. |
I think it might be worth trying also using the percentage of margin: 100*margin/high (100% in the former case, 28.8% in the latter).
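As a quick check of those two figures, a tiny Python sketch (the function name is made up for illustration; "high" is taken to be the winner's score):

```python
def margin_percentage(winner_points, loser_points):
    """Margin of victory as a percentage of the winner's score."""
    return 100.0 * (winner_points - loser_points) / winner_points

print(margin_percentage(9, 0), round(margin_percentage(59, 42), 1))
```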
| Quote: |
| And a 1 point win could be less decisive than an OT win. |
Umm, what's OT?
| Quote: |
| What I really need at the moment is data in a form I can use. Perhaps if we can get several people to take a conference or two each and record the games? Volunteers? |
I think i feel like volunteering. Where's the data i would need to go through? |
|
|
 |
mith
Pitbull of Truth
|
Posted: Sat Jan 01, 2005 3:17 pm Post subject: 9 |
|
|
| Quote: |
| I think it might be worth trying also using the percentage of margin: 100*margin/high (100% in the former case, 28.8% in the latter). |
Not sure that's any better. A 3-0 win is certainly not more dominating than a 70-3 win, but percentage would show it as such.
Overtime. If they're tied at the end of regulation, they go into overtime; each team gets the ball once at the 25. If they're tied at the end of OT, they go into a second, and so on.
| Quote: |
| I think i feel like volunteering. Where's the data i would need to go through? |
I think I've found the scores, anyway; I just need to remind myself how to program to go through it. If we're going to do stats or whatever though, we'll probably have to go through game by game; would rather figure out whether we need that first.  |
|
|
 |
CrystyB
Misunderstood Guy
|
Posted: Sat Jan 01, 2005 11:57 pm Post subject: 10 |
|
|
| mith wrote: |
| Quote: |
| I think it might be worth trying *also* using the percentage of margin: 100*margin/high (100% in the former case, 28.8% in the latter). |
Not sure that's any better. A 3-0 win is certainly not more dominating than a 70-3 win, but percentage would show it as such. |
Note my emphasis: i meant somehow using both value and percentage. |
|
|
 |
mith
Pitbull of Truth
|
Posted: Sun Jan 02, 2005 3:27 am Post subject: 11 |
|
|
| Oh, sure. |
|
|
 |
casinopete
Emergency Backup Antrax
|
Posted: Sun Jan 02, 2005 6:37 pm Post subject: 12 |
|
|
Perhaps you could double-tier it somehow to hit the best points of percentage and margin of victory?
Say the 3-0 win scores 1/1 for being a slight victory with low overall scoring, and the 70-3 win gets 12/4 for being a huge victory, but in a relatively high scoring game. 73-70 would then score 1/3. |
|
|
 |