In a previous post I mentioned that I wanted to test the reliability of the cumulative connections team ranking method by keeping track of the number of NCAA tournament games in which the higher-ranked team actually won. I tracked this for 6 other ranking methods as well (both human and computer generated). Here are the final results:
Gold Medal: AP Poll – 44 games correct, 16 incorrect (73.3% Success)
Silver Medal: Coaches Poll – 43 Games correct, 17 incorrect (71.7% Success)
Bronze Medal: Tie: Cumulative Connections and Jeff Sagarin – 47 games correct, 20 incorrect (70.1% Success)
5th: KenPom – 46 Games correct, 21 incorrect (68.7% Success)
6th: Tournament Seed – 42 Games correct, 20 incorrect (67.7% Success)
7th: ESPN BPI – 45 Games correct, 22 incorrect (67.2% Success)
You will notice that the human ranking methods did not predict as many games as the computer ranking methods. For the human polls this is because only the top 30 or so teams (including those teams ‘receiving votes’) were ranked, and some games were between two teams that did not receive a single vote. For the tournament seeds, some games featured two teams with the same seed, so no winner was predicted.
In order to be fair to all the ranking systems, we can re-count the success rate while only including those games in which every ranking method actually predicted a winner. Here are those results:
Gold Medal: Cumulative Connections – 42 games correct, 15 incorrect (73.7% Success)
Silver Medal: 3-Way Tie: AP Poll, Coaches Poll, Jeff Sagarin – 41 Games correct, 16 incorrect (71.9% Success)
5th: Tie: Tournament Seed, KenPom – 40 Games correct, 17 incorrect (70.2% Success)
7th: ESPN BPI – 39 Games correct, 18 incorrect (68.4% Success)
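The apples-to-apples re-count above is straightforward to reproduce. Here is a minimal Python sketch with made-up picks and outcomes (the data structure, method names, and team labels are my own illustration, not the real tournament data): keep only the games where every method made a pick, then recompute each method's success rate on that common set.

```python
# Hypothetical sketch: recompute success rates restricted to the games
# where every ranking method predicted a winner. None = no prediction.
predictions = {
    "AP Poll": ["A", "B", None, "A"],   # human poll: missing a pick for game 2
    "KenPom":  ["A", "B", "C",  "B"],   # computer ranking: picks every game
}
winners = ["A", "B", "C", "A"]  # actual winner of each game

# Keep only the games where all methods made a pick.
common = [g for g in range(len(winners))
          if all(predictions[m][g] is not None for m in predictions)]

for method, picks in predictions.items():
    correct = sum(picks[g] == winners[g] for g in common)
    rate = correct / len(common)
    print(f"{method}: {correct}/{len(common)} correct ({rate:.1%})")
```

With the toy data above, game 2 is dropped because the AP Poll made no pick there, and both methods are then scored on the same 3 games.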
So in the apples-to-apples comparison, the cumulative connections ranking system actually performed the best.
It is interesting to note how closely the success rates for the different methods matched each other. This was largely because these systems produced very similar rankings. In fact, all 7 ranking systems predicted the same winner in 50 of the 67 tournament games (roughly 75%). This means that the differences in success rates between the methodologies depended on the outcomes of only 17 games.
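Counting unanimous games like the 50-of-67 figure above is a similar one-liner. A sketch with toy data (the method names and picks are invented for illustration):

```python
# Hypothetical sketch: count the games on which every ranking method
# picked the same winner (toy data, not the real tournament picks).
predictions = {
    "Method A": ["X", "Y", "Z", "X"],
    "Method B": ["X", "Y", "Z", "W"],
    "Method C": ["X", "Y", "W", "W"],
}
n_games = 4

# A game is unanimous when the set of picks collapses to a single team.
unanimous = sum(
    len({picks[g] for picks in predictions.values()}) == 1
    for g in range(n_games)
)
print(f"All methods agreed on {unanimous} of {n_games} games")
```

Here the three toy methods agree on games 0 and 1 but split on the last two, so the count is 2 of 4.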
It is rather amazing that these 7 methodologies produced rankings that were so similar. The 4 computer rankings used completely different methodologies from each other, yet arrived at rankings that were not only largely consistent with one another but also consistent with the opinions of sports writers and coaches.
Wes Colley reflected on this when he compared his NCAA football computer rankings to human polls and I think he sums up this semi-miracle nicely:
The press polls started with a pre-season poll, with all the pre-conceived notions of history and tradition such an endeavor demands, then week by week allowed their opinions and judgments to migrate, being duly impressed or disappointed in the styles of winning and losing by certain teams, being more concerned about recent games than earlier ones, perhaps mentally weighting games seen on television as more important, perhaps having biases (good or bad) toward local schools one sees more often… ad nauseam.
My computer rankings started with nothing, literally no information, but then, given only wins and losses, generated a ranking with pure algebra.
That two such processes produce even remotely consistent results is, frankly, remarkable to me.