June 5, 2005 -- Looking at large data sets and deriving loud conclusions from the reams of whispering numbers is often enjoyable. Herein are three quite disparate examples.
The first concerns sumo wrestlers and comes from "Freakonomics," a fascinating new book by economist Steven Levitt and writer Stephen Dubner that employs Levitt's quirky economic insights to illuminate many everyday activities and practices. The second is simply a study I reported on in a book I wrote on the stock market, and the third comes from a simple analysis I recently made of grade distributions for a required math course at my university.
As Levitt's analysis makes clear, sumo wrestling is a sport in which the best practitioners earn big bucks and the journeymen earn far, far less. Among the topmost tier of wrestlers, those who have a winning record in a set of 15 crucial tournament matches qualify for the larger pay and numerous other perks. Drawing on an extensive data set of such contests, Levitt focuses on the outcomes of certain pivotal matches in which sumo wrestlers in the elite category have a 7-7 record (and thus are on the cusp of a winning 8-7 record). In particular, he checks to see what happens when these wrestlers face those with an 8-6 record in the tournament. The latter have already made the cut-off, but have no chance for the top prizes.
The outcomes of these crucial matches strongly suggest collusion. Much more often than one would expect, the matches result in the 7-7 wrestlers beating the 8-6 wrestlers. In fact, relying on the record of past matches between the wrestlers, one might reasonably estimate the chances that 7-7 wrestlers would beat 8-6 wrestlers to be slightly under 50 percent, rather than the 80 percent rate at which they do win in the crucial matches.
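To get a rough feel for how unlikely such a gap would be by chance alone, here is a minimal sketch of a binomial tail calculation. The column does not say how many matches were studied, so the sample size of 100 below is purely hypothetical, and the expected win rate is rounded up to 50 percent, which if anything understates how surprising the result is. The function name `binomial_tail` is an illustrative choice, not something from Levitt's analysis.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float) -> float:
    """Chance of at least k successes in n independent trials, each with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical sample size: the column does not report the number of matches.
n_matches = 100
# An 80 percent win rate on that hypothetical sample.
wins_observed = 80
# "Slightly under 50 percent", rounded up to 0.5 as a conservative assumption.
expected_rate = 0.5

tail = binomial_tail(n_matches, wins_observed, expected_rate)
print(f"P(at least {wins_observed} wins in {n_matches} matches) = {tail:.2e}")
```

Even on this modest made-up sample the probability is vanishingly small, and a larger sample with the same win rate would make it smaller still. The sketch treats matches as independent, which is itself a simplification.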
Furthermore, as Levitt stresses, when these same pairs of wrestlers meet in a subsequent tournament match that is not crucial, the wrestlers who had the 7-7 records in the previous tournament win only 40 percent of the time against the wrestlers with the 8-6 records in the previous tournament. That is, they win 80 percent of the time when they have much to gain and the other wrestler has little to lose and then win only 40 percent of their next matches. This and other evidence gleaned from the voluminous records and frequent matches among the wrestlers in the top tier strongly suggest that collusion among the wrestlers is not rare.
Another example of unlikely results in crucial cases involves the quarterly earnings that companies announce. Will they meet the estimates analysts have established for them? When companies' earnings fall short by a penny or two per share, investors sometimes react as if this were tantamount to near-bankruptcy, and when earnings exceed the estimates, investors are often inordinately pleased.
Perhaps not surprisingly, studies (again of large data sets) in the 1990s showed that companies' earnings were much more likely to come in a penny or two above analysts' average estimate than a penny or two below it. If earnings were figured without regard to analysts' expectations, they'd come in below the average estimate as often as above it. The reason for the asymmetry is probably that some companies "backed in" to their earnings. Instead of determining revenues and expenses and subtracting the latter from the former to obtain earnings (or more complicated variants of this), companies began with the earnings they needed and adjusted revenues and expenses to achieve them.
From sumo wrestlers to corporate accountants to (moving closer to my home) university mathematics professors, the tendency to hedge matters in certain crucial situations is almost universal, albeit in this last example rather benign and selfless. Once again, large data sets reveal their secrets when probed with the right questions.
Like many universities, mine requires that a core mathematics course be taken by all students who do not plan on going on in mathematics or the sciences. To pass this particular course, a student is required to do reasonably well and score a C- or higher. Suspecting that the number of C-'s would be much larger than the number of D+'s because of how crucial this small difference is to students, I decided to examine the number of C-'s and D+'s given in this course over the last four years.
In any sort of roughly normal distribution of grades there should not be any sharp divide between the frequencies of the two grades. But for the period and course in question, there were approximately 800 C-'s and 100 D+'s awarded. One might argue that the number of D+'s should be lower than the number of C-'s simply as a result of a normal distribution with an average of C or more, but this drop-off was precipitous, fully eight times as many C-'s as D+'s. (The 400 or so plain D's and roughly 700 F's given out during this period indicate that general grade inflation is not the issue.)
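The claim that a roughly normal distribution cannot produce such a divide can be checked with a small sketch. It compares the reported eight-to-one ratio with the ratio a normal curve would give for two adjacent, equally wide grade bands. The numeric cut-offs (67 to 70 for a D+, 70 to 73 for a C-) and the class means and standard deviations tried are all assumptions made for illustration; the column gives only the approximate grade counts.

```python
from statistics import NormalDist

# Approximate counts reported in the column.
observed = {"C-": 800, "D+": 100, "D": 400, "F": 700}
observed_ratio = observed["C-"] / observed["D+"]

# Hypothetical numeric score bands (assumptions, not from the column).
D_PLUS = (67.0, 70.0)
C_MINUS = (70.0, 73.0)

def band_share(dist: NormalDist, low: float, high: float) -> float:
    """Fraction of a normal distribution falling between low and high."""
    return dist.cdf(high) - dist.cdf(low)

def model_ratio(mean: float, sd: float) -> float:
    """Expected ratio of C-'s to D+'s if scores were normal with this mean and sd."""
    dist = NormalDist(mean, sd)
    return band_share(dist, *C_MINUS) / band_share(dist, *D_PLUS)

print(f"observed C-/D+ ratio: {observed_ratio:.1f}")
for mean in (70, 75, 80):          # hypothetical class averages
    for sd in (8, 12):             # hypothetical spreads
        print(f"mean={mean}, sd={sd}: model ratio = {model_ratio(mean, sd):.2f}")
```

Under these assumed parameters the model ratio stays below two, far short of the observed eight, which is consistent with the argument that something other than the shape of the distribution is at work at the C-/D+ boundary.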
It seems that, at the crucial point between C- and D+, the faculty were likely to give the students in this course a bit of a break. Genuine uncertainty is the likely motivation. Assigning grades is not a cut-and-dried activity, and many professors apparently preferred to give students the benefit of the doubt in these close calls rather than adhere rigidly to standards that are inevitably slightly arbitrary.
Note, finally, that in all three of these examples scrutinizing critical borderline cases leads to the observation in question.
-- Professor of mathematics at Temple University, John Allen Paulos is the author of best-selling books, including "Innumeracy" and "A Mathematician Plays the Stock Market." His "Who's Counting?" column on ABCNEWS.com appears the first weekend of every month.