A bunch of respectable news outlets yesterday published reports on an academic survey claiming to measure “video-game addiction” among youngsters. “This study’s primary strength,” the author reported, “is that it is nationally representative within 3%.”
I beg to differ.
I’ve got other complaints about this particular study, but most fundamental is its claim of representativeness, as reflected in a margin of sampling error for the data. (Still more explicitly, the study says: “The sample size yielded results accurate to +/-3% with a 95% confidence interval.”)
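For readers who want to see where a figure like “+/-3% at 95% confidence” comes from: it is the classical margin-of-error formula, which is valid only under the assumption of a simple random (probability) sample. Here is a minimal sketch; the function name is mine, not the study’s.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Classical margin of sampling error at the 95% confidence level.

    Valid ONLY for a probability sample (e.g., simple random sampling);
    p=0.5 is the worst case, and z=1.96 is the 95% critical value.
    """
    return z * math.sqrt(p * (1 - p) / n)

# A +/-3% margin corresponds to roughly 1,068 respondents:
print(round(margin_of_error(1068) * 100, 1))
```

Note that nothing in the formula detects *how* the n respondents were obtained; plugging in the size of a self-selected panel produces a number, but not a meaningful one.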
The problem: This study was conducted among members of an opt-in online panel – individuals who sign up to click through questionnaires on the internet in exchange for points redeemable for cash and gifts. There are multiple methodological challenges with these things (I’ve previously discussed some) but the most basic – and I think least arguable – is that they’re based on a self-selected “convenience sample,” rather than a probability sample. And you need a probability sample to compute sampling error.
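The point above can be made concrete with a toy simulation of my own devising (the numbers are invented for illustration): if people with the trait being measured are likelier to opt in to a panel, the panel’s estimate is biased no matter how large the sample, and no margin-of-error formula captures that bias.

```python
import random

random.seed(1)

# Hypothetical population: 30% have the trait of interest,
# but trait-holders are assumed 3x as likely to join an opt-in panel.
population = [1] * 30_000 + [0] * 70_000
optin_weights = [3 if person == 1 else 1 for person in population]

def srs_estimate(n):
    """Estimate from a simple random (probability) sample."""
    return sum(random.sample(population, n)) / n

def optin_estimate(n):
    """Estimate from a self-selected (convenience) sample."""
    return sum(random.choices(population, weights=optin_weights, k=n)) / n

n = 1068  # the size that yields a nominal +/-3% margin
print(round(srs_estimate(n), 2))    # lands near the true 0.30
print(round(optin_estimate(n), 2))  # lands well above it, every time
```

The probability sample’s error shrinks as n grows, which is what the margin-of-error formula describes; the convenience sample’s self-selection bias does not shrink at all.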
The claim of sampling error for an opt-in online survey is not new; in a presentation on data quality at Harvard last week I presented a list of more than a dozen opt-in online surveys for which such claims have been made, and doubtless there are many more. Indeed, to its shame, the Institute of Politics at Harvard’s own Kennedy School made just such claims itself in 2006 and 2007 polls alike.
The reason seems clear: Good data speak with authority, and a probability sample is a hallmark of good data – a sign that it lives within the framework of inferential statistics. By claiming sampling error, samples taken outside that framework try to nose their way out of the yard and into the house.
They don’t belong there. I have yet to hear any reasonable theoretical justification for the calculation of sampling error with a convenience sample.
Got one? Hit me.
The study in question was written by Douglas Gentile, an Iowa State University professor, and is in press at the journal Psychological Science. Peer review is no safeguard here; as was noted at that Harvard conference, it’s a sad truth that what academic journal editors don’t know about the basic niceties of sampling would, well, fill a journal.
What is interesting, though, is that Gentile’s data provider, Harris Interactive, itself doesn’t claim sampling error for its opt-in online results. Its standard disclosure is quite clear: “Because the sample is based on those who agreed to participate in the Harris Interactive panel, no estimates of theoretical sampling error can be calculated.”
If that’s what Harris Interactive says about its own data, how does Prof. Gentile produce a margin of error for data from that selfsame provider? I’ve called him, left word, and will post his reply as soon as I hear back.
How, for that matter, do all the other producers of opt-in panel data who claim a margin of sampling error justify the calculation? I welcome their replies as well.
This is far from an inconsequential issue. The public discourse is well informed by quality data; it can be misinformed or even disinformed by other data. It is challenging – but essential – for us to differentiate.
One basic step, from my perspective, is to see if we have a reliable, representative probability sample, distinctively signified by its margin of sampling error. When convenience samples claim that imprimatur, we are being misled on the bona fides of the data at hand – as occurred, I fear, just yesterday. And probably will again.
4:30 p.m. update:
Prof. Gentile got back to me this afternoon. He said he was unaware the data in his study came from a convenience sample – “I guess I’d assumed they had gathered the population initially as part of a random probability sample” – and that, relying on his own background in market research, he’d gone ahead and calculated an error margin for it. “I missed that when I was writing this up. That is an error then on my part.”
He also referred me to his project director at Harris Interactive, Dana Markow. “This data was collected online and you’re correct, it’s not a probability sample,” she said. “We don’t claim that.” Asked if computation of sampling error requires a probability sample, she said, “Typically, yes.” And asked if Harris Interactive itself does not claim error margins for its convenience samples, she said, “That is certainly our typical approach.”
Markow said Gentile had shared his findings with Harris Interactive, but that she didn’t recall seeing a claim of sampling error in the materials he provided.