Study vs. Study: The Decline Effect and Why Scientific 'Truth' So Often Turns Out Wrong

Why it may not be so surprising that study findings don't always hold up.

Jan. 2, 2011 -- A few years ago, I devoted a Who's Counting article to the work of Dr. John Ioannidis, a medical researcher now at Stanford. Ioannidis examined the evidence in 45 well-publicized health studies from major journals appearing between 1990 and 2003. His conclusion: the results of more than one-third of these studies were flatly contradicted or significantly weakened by later work.

The same general idea is discussed in "The Truth Wears Off," an article by Jonah Lehrer that appeared last month in the New Yorker magazine. Lehrer termed the phenomenon the "decline effect," by which he meant the tendency for replication of scientific results to fail -- that is, for the evidence supporting scientific results to seemingly weaken over time, disappear altogether, or even suggest opposite conclusions.

Lehrer and Ioannidis discuss a number of fascinating studies on topics ranging from hormone replacement therapy and vitamin supplements to echinacea and the role of symmetric features in sexual selection.

Rather than cover these once again, however, let me focus here on why the decline effect is not as surprising as it might first appear.

Regression to the Mean

One reason for some of the instances of the decline effect is provided by regression to the mean, the tendency for an extreme value of a random quantity dependent on many variables to be followed by a value closer to the average or mean.

Very intelligent people can be expected to have intelligent offspring, but in general the offspring will not be as intelligent as the parents.

A similar tendency toward the average or mean holds for the children of very short parents, who are likely to be short, but not as short as their parents.

If I throw 20 darts at a target and manage to hit the bull's eye 18 times, the next time I throw 20 darts, I probably won't do as well.

This phenomenon leads to nonsense when people attribute regression to the mean to some real cause rather than to the natural behavior of any randomly varying quantity.
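The dart example can be checked with a short simulation. What follows is only a sketch under stated assumptions: every thrower is given the same fixed 60 percent chance of hitting the bull's eye (an illustrative figure, not one from the studies discussed here), and the names `hits` and `simulate` are made up for the illustration. It plays two rounds of 20 darts for many throwers and then looks only at those whose first round was exceptional.

```python
import random


def hits(rng, p_hit, throws=20):
    """Number of bull's-eyes in one round of `throws` darts."""
    return sum(rng.random() < p_hit for _ in range(throws))


def simulate(players=20000, p_hit=0.6, throws=20, threshold=17, seed=1):
    """Play two rounds per thrower; keep only those with an extreme first round.

    Assumption: skill is identical for everyone (p_hit), so any difference
    between rounds is pure chance.
    Returns (number selected, mean first-round score, mean second-round score).
    """
    rng = random.Random(seed)
    first_scores, second_scores = [], []
    for _ in range(players):
        first = hits(rng, p_hit, throws)
        second = hits(rng, p_hit, throws)
        if first >= threshold:  # select on an extreme first result
            first_scores.append(first)
            second_scores.append(second)
    n = len(first_scores)
    return n, sum(first_scores) / n, sum(second_scores) / n


if __name__ == "__main__":
    n, first_mean, second_mean = simulate()
    print(f"throwers with {17}+ hits in round one: {n}")
    print(f"their average in round one: {first_mean:.2f}")
    print(f"their average in round two: {second_mean:.2f}")
```

Because nothing about the throwers changes between rounds, the selected group's second-round average should fall back toward the 12 hits out of 20 that a 60 percent thrower manages on average. No praise, criticism, or loss of skill is needed to explain the drop.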

Classic Example: Airplane Pilots' Landings

A classic example can be seen in work by the psychologists Amos Tversky and Daniel Kahneman. They noted that if a beginning pilot makes a very good landing, it's likely that his next one will not be as impressive. Likewise, if his landing is very bumpy, then by regression to the mean alone, his next one will likely be better.

Not surprisingly, after good landings, the flight instructors praised the pilots, whereas they berated them after bumpy landings. The instructors then mistakenly attributed the pilots' deterioration to the praise, and likewise the pilots' improvement to the criticism.

The instructors might just as well have attributed the pilots' deterioration to the decline effect. Extreme results, whether positive or negative, are by chance alone generally followed by less extreme ones.

Margins of Error

In some instances, another factor contributing to the decline effect is sample size. It's become common knowledge that polls that survey large groups of people have a smaller margin of error than those that canvass a small number. Not just a poll, but any experiment or measurement that examines a large number of test subjects will have a smaller margin of error than one having fewer subjects.

Not surprisingly, results of experiments and studies with small samples often appear in the literature, and these results frequently suggest that the observed effects are quite large -- at one end or the other of the large margin of error. When researchers attempt to demonstrate the effect on a larger sample of subjects, the margin of error is smaller and so the effect size seems to shrink or decline.
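The link between sample size and margin of error can be put in numbers. The sketch below uses the standard textbook approximation for a 95 percent margin of error on a surveyed proportion, 1.96 times the square root of p(1 - p)/n. The worst-case value p = 0.5 and the sample sizes are illustrative assumptions, and the function name `margin_of_error` is invented for the example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n subjects.

    Assumes a simple random sample and the normal approximation;
    p = 0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (25, 100, 400, 1600):
        moe = 100 * margin_of_error(n)
        print(f"n = {n:>4}: about +/- {moe:.1f} percentage points")
```

Because the sample size sits under a square root, quadrupling the number of subjects only halves the margin of error. A study of 25 people therefore leaves room for an observed effect far from the true one, which a later study of 1,600 would rarely reproduce.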

Publication Bias, Other Psychological Foibles

Publication bias is, no doubt, also part of the reason for the decline effect. That is to say that seemingly significant experimental results will be published much more readily than those that suggest no experimental effect or only a small one. People, including journal editors, naturally prefer papers announcing or at least suggesting a dramatic breakthrough to those saying, in effect, "Ehh, nothing much here."
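A toy simulation suggests how such a filter alone can manufacture a decline effect. Everything in it is an assumption made for illustration: a modest true effect of 0.2, measurement noise with a standard deviation of 1, small studies of 25 subjects, and a rule that only estimates at least 1.96 standard errors above zero get "published." The names `run_study` and `simulate` are hypothetical.

```python
import random
import statistics


def run_study(rng, true_effect, n):
    """Average of n noisy measurements of the same true effect (noise SD = 1)."""
    return statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))


def simulate(studies=2000, true_effect=0.2, n_small=25, n_large=2500, seed=7):
    """Run many small studies, 'publish' only the striking ones, then replicate.

    Returns (mean of all small-study estimates,
             mean of published estimates,
             estimate from one large replication,
             number of published studies).
    """
    rng = random.Random(seed)
    # Smallest estimate that looks "significant" in a small study:
    # 1.96 standard errors, where the standard error is 1 / sqrt(n_small).
    cutoff = 1.96 / n_small ** 0.5
    estimates = [run_study(rng, true_effect, n_small) for _ in range(studies)]
    published = [e for e in estimates if e > cutoff]
    replication = run_study(rng, true_effect, n_large)
    return (statistics.fmean(estimates), statistics.fmean(published),
            replication, len(published))


if __name__ == "__main__":
    all_mean, pub_mean, replication, n_pub = simulate()
    print(f"true effect:                 0.20")
    print(f"average of all small studies: {all_mean:.2f}")
    print(f"average of {n_pub} published:    {pub_mean:.2f}")
    print(f"one large replication:        {replication:.2f}")
```

Taken together, the small studies are honest: their overall average sits near the true effect. The published subset, however, can only contain estimates above the cutoff, so its average must overstate the effect, and a large follow-up study will appear to show the effect wearing off even though nothing has changed.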

The availability error, the tendency to be unduly influenced by results that, for one reason or another, are more psychologically available to us, is another factor. Results that are especially striking or counterintuitive or consistent with experimenters' pet theories are also more likely to be published.

Even such a prosaic occurrence as clock-watching provides an illustration of this. I don't think I look at the clock more than others do, yet I always seem to notice and remember when the time is 12:34, but not when it's 10:56 or 7:41.

Scientists are, of course, subject to the same foibles as everyone else. When reading novels or watching movies, most of us make an attempt to suspend our disbelief to better enjoy a good story. When doing health or other scientific studies, researchers usually try to do the opposite. They attempt to suspend their belief to better test their results.

Alas, they sometimes succumb to a good story and fiddle with the results to preserve its coherence. This was part of the problem in the recent hyped accounts of arsenic-based life.

There's also the problem of poor experimental design and the sometimes unknown confounding variables (even different placebos) whose effects can mask or reverse the suspected effect. The human tendency to exaggerate results and to indulge one's vanity by sticking with the initial exaggeration cannot be dismissed either.

A greater realization of these effects by journalists, scientists, and everyone else will lead to more caution in reporting results, more realistic expectations, and, I would guess, a decline in the decline effect (more accurately, the stat-psych effect).

Getting at the truth has always been hard. Nevertheless, we can take comfort in the fact that, though nature is tricky, she is not out to trick us.

That is probably what Einstein meant when he wrote, "God is subtle, but he is not malicious."

John Allen Paulos, a professor of mathematics at Temple University in Philadelphia, is the author of the best-sellers "Innumeracy" and "A Mathematician Reads the Newspaper," as well as, most recently, "Irreligion." He's on Twitter and his "Who's Counting?" column on ABCNews.com usually appears the first weekend of every month.