It's that time again: Raymond comes up with an absurd, arbitrary criterion for filling out his NCAA bracket.
This time, I studied all the games played in the NCAA men's basketball tournament since 1985 and computed how many of the games were won by the favorite and how many were upsets, broken down by the numerical difference between the seeding of the two teams.
I found it interesting that when the teams are seeded N and N+2, you get an upset more than half the time!
If the probability of the favorite winning is p and I choose the favorite with probability q, then the prediction is correct with probability pq + (1−p)(1−q) = (2p−1)q + (1−p). If you hold p constant, this is linear in q with slope 2p−1, so it is maximized when q = 0 if p < ½, or when q = 1 if p > ½. (If p = ½, then it doesn't matter what you pick for q.)
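For readers who want to check the arithmetic, here is a minimal Python sketch of that formula. The function names `accuracy` and `best_q` and the sample values of p are made up for illustration; they are not the tournament data.

```python
def accuracy(p: float, q: float) -> float:
    """Chance of a correct pick when the favorite wins with probability p
    and we pick the favorite with probability q."""
    return p * q + (1 - p) * (1 - q)


def best_q(p: float) -> float:
    """The q that maximizes accuracy for a fixed p.

    The expression is linear in q with slope 2p - 1, so the maximum sits
    at an endpoint. When p == 0.5 every q ties; this returns 0.0 there.
    """
    return 1.0 if p > 0.5 else 0.0


if __name__ == "__main__":
    # Hypothetical favorite-win probabilities, not measured values.
    for p in (0.4, 0.5, 0.7):
        row = [round(accuracy(p, q), 3) for q in (0.0, 0.5, 1.0)]
        print(f"p={p}: accuracy at q=0, 0.5, 1 -> {row}; best q = {best_q(p)}")
```

For each sample p, the printed row shows the accuracy moving in a straight line from 1−p at q = 0 to p at q = 1, which is why the best choice is always one of the two extremes.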
Therefore, to maximize the number of correct predictions, I should always choose the favorite, unless the two teams are seeded N and N+2, in which case I always choose the underdog. But that makes for a boring bracket. Consequently, I went for the suboptimal algorithm of choosing q = p. Here is the result:
Update: