Sampling Error: What it Means
Oct. 8, 2008 -- Surveys based on a random sample of respondents are subject to sampling error – a calculation of how closely the results can be expected to reflect the attitudes or characteristics of the full population that's been sampled. Since sampling error can be quantified, it's frequently reported along with survey results to underscore that those results are an estimate only.
Sampling error, however, is oversimplified when presented as a single number in reports that may include subgroups, poll-to-poll changes, lopsided margins and results measured on the difference. Sampling error in such cases cannot be described accurately in a brief television or radio story or on-screen graphic.
Sampling error assumes a probability sample – a random, representative sample of a full population in which all respondents have a known (and not zero) probability of selection. Given that prerequisite, sampling error is based largely on sample size, but also on the division of opinions or characteristics measured and on the level of confidence the surveyor seeks. A larger sample has a lower error margin. A result of 90-10 percent has a smaller error margin than a 50-50 result; when more people agree, there's less chance of error in the estimate. And a result computed at the 90 percent confidence level has a smaller error margin than a result computed at 95 percent confidence.
Assuming a 50-50 division in opinion calculated at a 95 percent confidence level, a sample of 1,000 adults – common in ABC News polls – has a margin of sampling error of plus or minus 3 percentage points. The error margin is higher for subgroups, since their sample size is smaller. Given customary subgroup sizes, for 800 whites the error margin would be plus or minus 3.5 points; for 560 women, +/- 4 points; for 280 Republicans, +/- 6 points. Click here for a list of examples using averages from recent ABC News polls.
At a 90-10 division of opinion, rather than 50-50, still at 95 percent confidence, sampling error for 1,000 interviews is +/- 2 points, not 3. For a sample of 100 cases – roughly the minimum sample size ABC News will report – the error margin is +/- 10 points at a 50-50 percent division, +/- 8.5 points at 75-25 percent and +/- 6 points at 90-10 percent. (ABC News polls at times will oversample small populations to increase their sample size to a level we consider reliably reportable.)
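(To illustrate the arithmetic, which is spelled out below: for 100 interviews at a 50-50 division, take the square root of .50 times .50 divided by 100, which is .05, and multiply it by 1.96, the value associated with 95 percent confidence; the result is an error margin of roughly 10 points.)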
As noted, the confidence level is the third chief variable in sampling error. (There are other factors in some surveys, such as design effects – see the addition at the end of this piece – and finite-population adjustments, which we'll leave aside here.) The 3-point error margin at 95 percent confidence for a sample of 1,000 declines to +/- 2.5 points at 90 percent confidence and +/- 2 points at 80 percent confidence.
(Some organizations round sampling error to whole numbers; others report it to the decimal. ABC's practice is to round to the half point. That acknowledges the differences caused by sample size – 800 and 1,500 both round to +/-3; better to show the former as 3.5 and the latter as 2.5 – without suggesting the level of precision in the data implied by entirely unrounded decimals, e.g. +/-3.3.)
The calculations above are based on a single sample, using a standard formula – multiply the two sides of the division in opinion (e.g., .50 times .50 for a 50/50 split), divide the result by the sample size, take the square root and multiply by the so-called "critical value" – for 95 percent confidence, 1.96.
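For illustration, here is that calculation written out in Python. It is a sketch of the textbook formula described above, not ABC's production code; the function name and the rounding lines are ours.

```python
import math

def sampling_error(p, n, critical_value=1.96):
    """Margin of sampling error, in percentage points, for a proportion p
    measured in a simple random sample of n respondents.
    critical_value: 1.96 for 95 percent confidence, 1.645 for 90, 1.282 for 80."""
    return 100 * critical_value * math.sqrt(p * (1 - p) / n)

# Reproducing figures cited above (before rounding):
print(round(sampling_error(0.50, 1000), 1))         # 3.1 -> reported as +/-3
print(round(sampling_error(0.50, 280), 1))          # 5.9 -> +/-6 for 280 Republicans
print(round(sampling_error(0.90, 1000), 1))         # 1.9 -> +/-2 at a 90-10 division
print(round(sampling_error(0.50, 1000, 1.645), 1))  # 2.6 -> +/-2.5 at 90 percent confidence

# ABC's convention of rounding to the half point:
print(round(sampling_error(0.50, 800) * 2) / 2)     # 3.5
```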
Note that the division is premised on simple dichotomous responses (support/oppose, yes/no, Candidate A/Candidate B). The formula is different for measures that have three or more response choices – relevant, for instance, in calculating the margin of error for candidate support in a multi-candidate election. While the differences usually are minor for responses in the 30 percent to 70 percent range, for precision in such cases we use a formula reported by Prof. Charles Franklin of the University of Wisconsin in his 2007 paper, "The Margin of Error for Differences in Polls."
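For readers who want the mechanics, one standard large-sample calculation for the error on the gap between two response categories drawn from the same sample (the kind of quantity Franklin's paper treats) looks like the sketch below. It is an illustration of ours, not a reproduction of his paper.

```python
import math

def lead_error(p1, p2, n, critical_value=1.96):
    """Margin of error, in points, for the gap p1 - p2 between two response
    categories (e.g., two candidates) measured in the same sample of n people.
    Uses the multinomial variance of a difference: (p1 + p2 - (p1 - p2)**2) / n."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100 * critical_value * math.sqrt(variance)

# A 48-42 percent division in a sample of 1,000 respondents:
print(round(lead_error(0.48, 0.42, 1000), 1))  # ~5.9 points of error on the 6-point gap
```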
The calculation of differences between two independent samples – such as change from one poll to the next – also is computed differently. For example, it takes a change of 4.5 points from one poll of 1,000 to another the same size to be statistically significant, assuming 50/50 divisions in both samples and a 95 percent confidence level. (Again that is lower at different divisions in opinion and/or lower confidence levels; and higher for smaller sample sizes, e.g. subgroups.)
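A minimal sketch of that two-poll comparison, assuming two independent simple random samples, a 95 percent confidence level and no design effect:

```python
import math

def change_error(p1, n1, p2, n2, critical_value=1.96):
    """Margin of error, in points, for the change between a result p1 in one
    poll of n1 respondents and a result p2 in an independent poll of n2."""
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return 100 * critical_value * math.sqrt(variance)

# Two polls of 1,000 each, both near an even division of opinion:
print(round(change_error(0.50, 1000, 0.50, 1000), 1))  # ~4.4: a shift of about
# 4.5 points is needed to be statistically significant at 95 percent confidence
```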
Other comparisons require other calculations. To compare results measured on the difference from one poll to another – e.g., from a 14-point lead for Candidate A in one survey to a 4-point lead for Candidate B in the next – our approach is to calculate the error margin for change in Candidate A's support from one poll to the next, then the error margin for change in Candidate B's support, and ensure that the change is significant.
Calculating the significance of poll-to-poll change in an index, such as the ongoing ABC News Consumer Comfort Index, also requires more complicated calculations, for which ABC relies on consultations with sampling statisticians.
In all cases, the ABC News Polling Unit describes differences or changes in polling data as statistically significant only on the basis of calculations that this is the case. Results that are significant at a high level of confidence, but below 95 percent, may be characterized with modifying language, such as a "slight" change. And in some cases we'll report the confidence level at which a result is statistically significant.
It should be noted that results are not equally likely to fall anywhere within a margin of sampling error, but instead are least likely to extend to its extremes. For example, if candidate support is 50-46 percent in a 772-voter sample with a 3.5-point error margin, that's "within sampling error"; it could be a 46.5-49.5 percent race at the extremes. However, the probability that the result in fact constitutes a lead for the 50-percent candidate can be calculated; in this example it's 91 percent.
That or any confidence level indicates the number of times a theoretical infinite number of samples, of a given size and a given result, would come within sampling error of the actual population value – 9 times out of 10 at 90 percent confidence, 19 out of 20 at 95 percent confidence, 99 out of 100 at 99 percent confidence.
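One way to see what a confidence level means in practice is a quick simulation: draw many random samples from a hypothetical population with a known 50-50 division and count how often the sample estimate lands within the error margin. The figures below (population value, sample size, number of trials) are assumptions chosen purely for illustration.

```python
import random

TRUE_SUPPORT = 0.50   # assumed population value (a 50-50 division)
N = 1000              # sample size for each simulated poll
MOE = 0.031           # the unrounded 95 percent error margin for a sample of 1,000
TRIALS = 10_000

within = 0
for _ in range(TRIALS):
    # Simulate one poll: each respondent supports the position with probability 0.50.
    estimate = sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N
    if abs(estimate - TRUE_SUPPORT) <= MOE:
        within += 1

print(within / TRIALS)  # roughly 0.95: about 19 samples in 20 fall within the margin
```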
All these calculations account only for sampling error, the only kind of imprecision that's readily quantifiable in probability-based samples. Survey research also is subject to non-quantifiable non-sampling error, including factors such as methodological rigor; non-random non-coverage of elements of the population under study; non-random non-response influencing who participates; the wording, order and response categories in questions; and the professionalism of interviewers and data producers. Of note, no margin of sampling error is calculable in non-random, non-probability samples, such as opt-in internet panels.
Update on design effect - 12/09
A further complication in sampling error, alluded to above, stems from a survey's design effect, a calculation that adjusts for effects such as clustering in area probability samples (exit polls, for example, or our face-to-face surveys in Iraq and Afghanistan); and weighting, relevant to random-digit-dialed (RDD) telephone surveys as well as to other forms of probability sampling.
In exit polls conducted for the National Election Pool, a media consortium including ABC News, the design effect of clustering and weighting alike is given as 2.25. As a result, a sample of 1,000 people in one of these exit polls has an error margin of +/-4.5 points (with a 50/50 split at the 95 percent confidence level), rather than the 3 points that would have been calculated without taking the design effect into account. (This is figured by multiplying the error margin based on sample size alone, in this case 3 points, by the square root of the design effect, in this case 1.5.)
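In code, that adjustment is a single multiplication. The sketch below is illustrative only, with the 2.25 design effect taken from the consortium figure cited above.

```python
import math

def design_adjusted_error(p, n, deff, critical_value=1.96):
    """Margin of sampling error, in points, inflated by a design effect deff."""
    base = 100 * critical_value * math.sqrt(p * (1 - p) / n)
    return base * math.sqrt(deff)

# Exit poll of 1,000 respondents, 50/50 division, published design effect of 2.25:
print(round(design_adjusted_error(0.50, 1000, 2.25), 1))  # ~4.6, reported as +/-4.5
```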
In RDD telephone samples, the design effect due to weighting in the past generally has been so slight as to be ignorable. That's changed recently as telephone sampling procedures have been altered to include cell-phone respondents; these procedures increase the theoretical margin of sampling error because additional weighting is needed to incorporate the cell and landline samples. (The situation also occurs when oversamples are used to increase the reliability of the estimate of a particular group. Again, while oversampling is done to improve estimates, the weighting required to adjust the sample back to true population norms increases the design effect in the full sample.)
At ABC we've tracked the design effect of each poll we've conducted since we started adding cell-only interviews in fall 2008; in the last six (with consistent cell-only sample sizes) it's averaged 1.42. Inclusion of this design effect is why we now report most ABC/Post polls of about 1,000 people as having a margin of sampling error of plus or minus 3.5 points, rather than the customary 3 points.
It's ironic that taking steps to improve the accuracy of a survey by enhancing coverage of its target population has the perverse effect of increasing its theoretical margin of sampling error; this is a reason that sampling error in and of itself is not a full measure of a survey's accuracy. It's also a reason to be cautious in making comparisons across surveys. Some surveys, less accurately, report a lower margin of sampling error because they don't take design effects into account. Others may have a lower theoretical error margin but significant noncoverage – an example of the nonsampling error described above.
In some ways this situation is similar to that involving response rates, which can be improved in ways that degrade sample coverage. (See details here.) Better response rates, for that reason, in and of themselves are not necessarily indicators of better data. Likewise, a lower theoretical sampling error does not necessarily indicate a better estimate, if for example it were obtained via a sample that failed to optimize coverage of the population under study.
With thanks for review and comment by Charles Franklin, Paul Lavrakas and Dan Merkle.