A new study led by Stanford University researchers raises doubts about the accuracy of one of the most common forms of survey research, polls done among people who sign up to fill in questionnaires via the internet in exchange for cash and gifts.
In the most extensive such analysis to date, David Yeager and Prof. Jon Krosnick compared seven non-random internet surveys with two surveys based instead on random or so-called probability samples. The non-probability internet surveys were less accurate, and customary adjustments did not uniformly improve them.
While the random-sample surveys were “consistently highly accurate,” the internet surveys based on self-selected or “opt-in” panels “were always less accurate, on average, than probability sample surveys, and were less consistent in their level of accuracy,” the researchers said. Further, they said, weighting (adjusting these samples to known population values) left accuracy unchanged, and in one case even worsened it, as often as it improved it.
The inconsistency is a challenge because it means the accuracy of one measure from an opt-in panel survey can’t reliably be taken to mean that other measures are accurate, the researchers noted. And there are other problems: While results of one of the seven opt-in online panels were “strikingly and unusually inaccurate,” they said, “the rest were roughly equivalently inaccurate,” suggesting that none of these producers can claim to have perfected the approach.
The study, which builds on earlier work by Krosnick and his graduate students, is posted here, with a technical summary here. (Full disclosure: Krosnick’s a friend and colleague with whom I’ve collaborated on other research projects, and I offered comments on a draft of this paper.)
The results shouldn’t be a surprise. As I’ve reported previously (e.g., here and here), non-probability samples lack the theoretical underpinning on which valid and reliable survey research is based. Our policy at ABC News, as at several other national news organizations (including The Associated Press, The New York Times and The Washington Post), is not to report them. I welcome any coherent theoretical defense of the use of convenience samples in estimating population values; it’s a debate we need to have.
Whatever they are, these surveys do represent an enormous business, used particularly heavily in market research. The trade magazine Inside Research has estimated that spending on online market research will reach $2.05 billion in the United States and $4.45 billion globally this year, and that data gathered online will account for nearly half of all survey research spending.
Yesterday I asked Laurence Gold, the magazine’s editor and publisher, what the industry is thinking about in its use of non-probability samples. “The industry is thinking fast and cheap,” he said.
The validity and reliability of such research is an open question, Gold noted; market researchers have tried to gauge it by seeing if results can be replicated across surveys, with uneven results. There’s been a push recently, he added, to try to improve data quality.
Gold also noted that the difficulty of getting a high response rate in random-sample surveys is an argument used in favor of non-probability samples. But Yeager, Krosnick and their co-authors take up that point, noting that the probability samples in their study were more accurate despite less than optimal response rates (as a great deal of other research also has shown), while the opt-in internet panels were less accurate regardless of their response rates (which mostly were as low or lower).
Yeager and Krosnick compared the surveys against benchmark measures from high-quality federal government data, using demographic parameters (age, sex, race, education, region, income, marital and employment status, size of household and the like) and non-demographics such as frequency of smoking and drinking, and possession of a passport or driver’s license. Average absolute error without weighting was 3.5 percentage points for one probability sample (telephone) and 3.3 points for the other (a probability-based internet panel), vs. anywhere from 4.9 to 9.9 points for the opt-in online panels.
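For readers who want to see the arithmetic, here is a minimal sketch of how an average absolute error against benchmarks can be computed. The measure names and percentages are hypothetical, chosen only for illustration; they are not figures from the study, and the function names are my own.

```python
def absolute_errors(estimates, benchmarks):
    """Absolute gap, in percentage points, between survey and benchmark per measure."""
    return {name: abs(estimates[name] - benchmarks[name]) for name in benchmarks}


def average_absolute_error(estimates, benchmarks):
    """Mean of the per-measure absolute errors."""
    errors = absolute_errors(estimates, benchmarks)
    return sum(errors.values()) / len(errors)


# Hypothetical values for illustration only (not from the study).
benchmarks = {"smokes": 20.0, "has_passport": 30.0, "married": 55.0}
survey = {"smokes": 23.0, "has_passport": 28.0, "married": 59.0}

errors = absolute_errors(survey, benchmarks)
avg_error = average_absolute_error(survey, benchmarks)
max_error = max(errors.values())  # the largest single miss

print(f"average absolute error: {avg_error:.1f} points")
print(f"largest absolute error: {max_error:.1f} points")
```

The same per-measure errors feed both summaries: the average describes typical accuracy, while the maximum shows the worst single miss.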
Differences persisted in weighted data; the highest single error was about twice as high in one of the non-probability samples (17.8 points off the benchmark) as it was in either of the probability samples (9 points off).
Largest absolute errors also were larger for the non-probability samples. And in another measure, the weighted probability samples were significantly different from the benchmarks 31 percent of the time for the telephone survey and 46 percent of the time for the probability-based internet panel, vs. anywhere from 62 to 77 percent of the time for the opt-in online panels.
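To make the "significantly different from the benchmarks" measure concrete, here is a rough sketch of one way to count such differences. It assumes a simple two-sided z-test for a proportion under simple random sampling, which may not be the test the researchers used; weighted data in particular would call for adjusted standard errors. The sample size, measures and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def differs_from_benchmark(p_hat, p0, n, alpha=0.05):
    """Two-sided z-test: does survey proportion p_hat differ from benchmark p0?"""
    se = sqrt(p0 * (1 - p0) / n)  # standard error if the benchmark were true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def share_significantly_different(estimates, benchmarks, n):
    """Fraction of measures on which the survey differs significantly."""
    flags = [differs_from_benchmark(estimates[k], benchmarks[k], n) for k in benchmarks]
    return sum(flags) / len(flags)


# Hypothetical proportions and sample size, for illustration only.
benchmarks = {"smokes": 0.20, "has_passport": 0.30, "married": 0.55, "employed": 0.60}
survey = {"smokes": 0.23, "has_passport": 0.31, "married": 0.59, "employed": 0.61}

share = share_significantly_different(survey, benchmarks, n=1000)
print(f"{share:.0%} of measures differ significantly from the benchmarks")
```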
While this paper is the first to evaluate the subject in such detail, intimations of these problems were posted in a blog item this summer by Reg Baker, COO of the research firm Market Strategies International. Estimates of smoking prevalence were similar in three probability samples, he reported, but less similar – with variation of as many as 14 points – in 17 opt-in online panels. In such panels, he said, “the results we get for any given study are highly dependent (and mostly unpredictable) on the panel we use. This is not good news.”
Yeager and Krosnick, meanwhile, provide one more eye-opener: The average highest weight for any one respondent across the opt-in online samples was 30 – one respondent, that is, standing for the equivalent of 30 in the full dataset. (And one went as high as 70.) The highest weights in the two probability samples, by contrast, were 5 and 8.
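A small sketch shows how weights that large can arise. It assumes the simplest form of weighting, adjusting on a single variable so that each group's share of the sample matches its share of the population; the study's actual weighting procedure is not described here and was likely more elaborate. The groups, counts and population shares are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by population share / sample share of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] * n / counts[g] for g in sample_groups]


# Hypothetical sample of 100: one group is badly underrepresented.
sample_groups = ["college"] * 90 + ["no_college"] * 10
population_shares = {"college": 0.5, "no_college": 0.5}  # assumed, for illustration

weights = poststratification_weights(sample_groups, population_shares)

print(f"highest weight: {max(weights):.1f}")
print(f"mean weight:    {sum(weights) / len(weights):.2f}")
```

In this toy case each of the 10 underrepresented respondents stands for five people, while the mean weight stays at 1. The scarcer a group is in the sample relative to the population, the larger the weight each of its members must carry, and the more a few individuals' answers drive the result.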
Non-probability research often is done to assess relationships between variables – but not to measure the magnitude of such associations, much less population values, such as how many people think or do X, Y or Z. If measuring such values is a researcher’s aim, Yeager and Krosnick say, “non-probability sample surveys appear to be considerably less suited to that goal than probability sample surveys.”