A few years ago, I devoted a Who's Counting article to the work of Dr. John Ioannidis, a medical researcher now at Stanford. Ioannidis examined the evidence in 45 well-publicized health studies from major journals appearing between 1990 and 2003. His conclusion: the results of more than one third of these studies were flatly contradicted or significantly weakened by later work.
The same general idea is discussed in "The Truth Wears Off," an article by Jonah Lehrer that appeared last month in the New Yorker magazine. Lehrer termed the phenomenon the "decline effect," by which he meant the tendency for replication of scientific results to fail -- that is, for the evidence supporting scientific results to seemingly weaken over time, disappear altogether, or even suggest opposite conclusions.
Lehrer and Ioannidis discuss a number of fascinating studies on topics ranging from hormone replacement therapy and vitamin supplements to echinacea and the role of symmetric features in sexual selection.
Rather than cover these once again, however, let me focus here on why the decline effect is not as surprising as it might first appear.
One reason for some of the instances of the decline effect is provided by regression to the mean, the tendency for an extreme value of a random quantity dependent on many variables to be followed by a value closer to the average or mean.
Very intelligent people can be expected to have intelligent offspring, but in general the offspring will not be as intelligent as the parents.
A similar tendency toward the average or mean holds for the children of very short parents, who are likely to be short, but not as short as their parents.
If I throw 20 darts at a target and manage to hit the bull's eye 18 times, the next time I throw 20 darts, I probably won't do as well.
This phenomenon leads to nonsense when people attribute regression to the mean to some real cause, rather than to the natural behavior of any randomly varying quantity.
A classic example can be seen in work by the psychologists Amos Tversky and Daniel Kahneman. They noted that if a beginning pilot makes a very good landing, it's likely that his next one will not be as impressive. Likewise, if his landing is very bumpy, then by regression to the mean alone, his next one will likely be better.
Not surprisingly, after good landings, the flight instructors praised the pilots, whereas they berated them after bumpy landings. They mistakenly attributed the pilots' deterioration to their praise of them, and likewise the pilots' improvement to their criticism.
The instructors might just as well have attributed the pilots' deterioration to the decline effect. Extreme results, whether positive or negative, are by chance alone generally followed by less extreme ones.
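For readers who like to see this for themselves, here is a minimal simulation sketch of the pilot story. Everything in it is an assumption made for illustration, not something taken from Tversky and Kahneman's work: each hypothetical pilot gets a fixed "skill," each landing score is that skill plus independent random luck, and the function names, group sizes, and the random seed are invented. No praise or criticism appears anywhere in the model, yet the best first landings are followed, on average, by worse ones, and the worst by better ones.

```python
import random
import statistics


def simulate_landings(n_pilots=10_000, seed=0):
    """Return (first, second) landing scores for hypothetical pilots.

    Assumption: score = fixed skill + independent luck, both drawn
    from a standard normal distribution. Higher scores are smoother.
    """
    rng = random.Random(seed)
    pilots = []
    for _ in range(n_pilots):
        skill = rng.gauss(0.0, 1.0)
        first = skill + rng.gauss(0.0, 1.0)
        second = skill + rng.gauss(0.0, 1.0)
        pilots.append((first, second))
    return pilots


def group_means(pilots, fraction=0.1):
    """Average first and second scores for the extreme first landings."""
    ranked = sorted(pilots, key=lambda pair: pair[0])
    k = max(1, int(len(ranked) * fraction))

    def summarize(group):
        firsts = [pair[0] for pair in group]
        seconds = [pair[1] for pair in group]
        return statistics.mean(firsts), statistics.mean(seconds)

    return {"worst": summarize(ranked[:k]), "best": summarize(ranked[-k:])}


if __name__ == "__main__":
    for label, (first, second) in group_means(simulate_landings()).items():
        print(f"{label} first landings: {first:+.2f} -> next landing: {second:+.2f}")
```

Under these assumptions the second scores of both groups drift back toward the overall average, even though nothing about the pilots changed between landings.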
In some instances, another factor contributing to the decline effect is sample size. It's become common knowledge that polls that survey large groups of people have a smaller margin of error than those that canvass a small number. Not just a poll, but any experiment or measurement that examines a large number of test subjects will have a smaller margin of error than one having fewer subjects.
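The dependence on sample size can be made concrete with the standard textbook approximation for a poll's margin of error at roughly 95 percent confidence. The sketch below is only that, a sketch: it assumes a simple random sample, uses the worst-case proportion of one half by default, and the function name margin_of_error is my own invention for this illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions: n independent respondents, observed proportion p
    (0.5 is the worst case), and z = 1.96 for about 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(f"n = {n:5d}: about +/- {100 * margin_of_error(n):.2f} percentage points")
```

Because the sample size sits under a square root, quadrupling the number of subjects only halves the margin of error, which is one reason small early studies can produce results that larger later ones fail to match.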