Life can get complicated in survey research, and here’s a prime example: On the way to better sampling in random-digit-dialed telephone surveys, the theoretical margin of error has, of all things, increased.
Not in a big way; indeed hardly in a meaningful one. But it’s an interesting tale, at least to those of us concerned with such matters, as well as a reminder about the tradeoffs between theory and empiricism, and the need to do the right thing even when it smarts a little.
The right thing, these days, is to expand traditional landline samples to include people who only use cell phones. We’ve been doing so since October 2008, as have most of our colleagues producing good-quality telephone surveys based on representative, random samples of respondents.
Empirically there’s not a great reason to go to this effort. Customary weighting techniques, used to true up samples by adjusting them to Census norms for variables such as age, race, sex and education, do a good job correcting landline samples for the absence of cell-only respondents. (Adding a few other weighting variables, such as home ownership, can help further.) As I’ve reported before, comparing our landline-only results with the combined landline-and-cell sample shows that adding cell-only participants doesn’t make a meaningful difference in the variables we measure. That’s because their attitudes just aren’t that different from those of their landline-using demographic counterparts.
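For readers unfamiliar with the mechanics, here’s a minimal sketch of the idea behind that kind of post-stratification weighting, using made-up sample shares and Census targets for a single variable; real-world weighting adjusts on several variables at once, often by raking, and uses actual Census figures rather than the illustrative numbers below.

```python
# Minimal post-stratification sketch for one variable (age group).
# The sample shares and Census targets are illustrative values only,
# not figures from any actual poll.
sample_share = {"18-29": 0.12, "30-49": 0.34, "50-64": 0.30, "65+": 0.24}
census_share = {"18-29": 0.21, "30-49": 0.34, "50-64": 0.26, "65+": 0.19}

# Each respondent's weight is their group's population share divided by its
# share of the sample, so underrepresented groups count for more.
weights = {g: census_share[g] / sample_share[g] for g in sample_share}
print(weights)  # e.g. 18-29 respondents get a weight of 1.75
```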
That said, roughly 20 percent of Americans now are in cell-only households, and theoretically, if not empirically, that’s just too much noncoverage to tolerate. So we do include them – with a paradoxical and indeed somewhat perverse effect: Higher theoretical sampling error.
The reason is something called “design effect,” meaning the effect sampling and weighting techniques may have on random-sample survey estimates. There is, for example, a design effect due to clustering in area-probability samples, the kind drawn for face-to-face interviews. The mostly face-to-face exit polls done for ABC News and other media have a design effect of 2.25, meaning the margin of sampling error for an exit poll of 1,000 people is 4.5 points. Without the design effect it would be 3 points. (You figure it by multiplying the error margin based on sample size alone by the square root of the design effect.)
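To make the arithmetic concrete, here’s a minimal Python sketch of that calculation. The 95 percent confidence level and worst-case 50/50 split are standard assumptions; rounding the result to the nearest half point is my own guess at the reporting convention, used here only because it reproduces the figures above.

```python
import math

def margin_of_error(n, deff=1.0, z=1.96):
    """Approximate 95% margin of sampling error, in percentage points, for a
    proportion near 50% in a sample of size n, inflated by the square root
    of the design effect (deff)."""
    base = z * math.sqrt(0.25 / n) * 100   # simple-random-sample margin
    return base * math.sqrt(deff)

def half_point(x):
    """Round to the nearest half point (assumed reporting convention)."""
    return round(x * 2) / 2

# Exit poll of 1,000 people with a design effect of 2.25:
print(half_point(margin_of_error(1000, deff=2.25)))  # 4.5 points
# Without the design effect:
print(half_point(margin_of_error(1000)))             # 3.0 points
```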
There’s also design effect due to weighting. In most traditional landline samples it’s been minimal, but cell-only sampling’s changed things. While there’s a variety of approaches (ours is described here), each requires additional layers of weighting to fit the cell and landline samples together. That additional weighting means more design effect, an average of 1.42, for example, in our most recent ABC News/Washington Post polls. Plug that into the equation and our polls of 1,000 people now have a theoretical margin of error of 3.5 points, rather than the customary 3 points before we started including cell-only respondents.
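Plugging that average design effect into the same sketch, with the same assumed rounding, works out to the figure above:

```python
# ABC News/Washington Post poll of 1,000 people, average design effect 1.42:
print(half_point(margin_of_error(1000, deff=1.42)))  # 3.5 points
```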
It’s ironic that theoretical sampling error is higher as a result of measures that clearly improve a sample by expanding its coverage of the target population (all adults in this country, regardless of what kind of phone they have). That’s because noncoverage, as big a problem as it can be, falls within the rubric of nonsampling error, rather than sampling error, so addressing it doesn’t help the error-margin calculation. (For more on sampling error, see here.)
This kind of tradeoff’s nothing new; I wrote a piece years ago describing how higher response rates in probability surveys can be produced by techniques that create other problems, such as, again, increased noncoverage. That means higher response rates, while desirable all else being equal, are not, in and of themselves, a sure sign of better data.
The same, it turns out, goes for lower sampling error; indeed, as with response rates, judging a poll’s quality solely by its error margin can be misleading. Polls that don’t include cell-phone respondents may have a slightly lower theoretical margin of error, but only because of their noncoverage. Others may simply not bother to include the design effect in their calculation of the error margin; it can look slightly lower only because a step’s been skipped in figuring it.
Given the world of compromised data out there, produced by inferior or theoretically challenged means, spending time on the niceties of design effects may be small beans. Nonetheless, good research depends on a range of best-practices measures, in sampling, coverage, questionnaire design and data analysis alike. And sometimes it means doing the right thing – even if it adds a half-point to your margin of sampling error.