How outlier polls happen — and what to do with them

Any one poll is subject to error. That's why we have an average.

September 26, 2023, 3:33 PM
President Joe Biden listens during a meeting on the sidelines of the U.N. General Assembly in New York City, Sept. 20, 2023. (Kevin Lamarque/Reuters)

Two new polls of the 2024 general election released this weekend showed different numbers. One poll, conducted jointly by Hart Research Associates (a Democratic pollster) and Public Opinion Strategies (a Republican firm) on behalf of NBC News, found President Biden and former President Donald Trump tied at 46 percent among registered voters. But a different survey, from ABC News and The Washington Post, found Trump up by 9 percentage points (51 percent to 42 percent) among adults. (For full disclosure: 538 is a political data journalism vertical of ABC News and has partnered with both ABC News and The Washington Post on other national polls.)

The latter poll is at odds not only with the NBC News survey, but also a simple average I calculated of all 2024 general-election polls. (Controlling for recency and sample size, that average on Saturday — before the two polls were released — had Biden leading Trump by roughly 2 points.) In its writeup of the poll, The Washington Post noted, “The difference between this poll and others, as well as the unusual makeup of Trump’s and Biden’s coalitions in this survey, suggest it is probably an outlier.”
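To make "controlling for recency and sample size" concrete, here is a minimal sketch of one way such an average could be computed. The `Poll` structure, the two-week half-life decay and the square-root-of-sample-size weighting are illustrative assumptions rather than 538's actual formula, and the example polls use placeholder margins, sample sizes and field dates.

```python
from dataclasses import dataclass
from datetime import date
import math

@dataclass
class Poll:
    biden: float    # Biden's share, in percentage points
    trump: float    # Trump's share, in percentage points
    n: int          # sample size
    end_date: date  # last day of fieldwork

def weighted_average(polls: list[Poll], today: date, half_life_days: float = 14.0) -> float:
    """Recency- and sample-size-weighted average of the Biden-minus-Trump margin."""
    num = den = 0.0
    for p in polls:
        age = (today - p.end_date).days
        recency = 0.5 ** (age / half_life_days)  # a poll loses half its weight every two weeks (assumed)
        size = math.sqrt(p.n)                    # bigger samples count more, with diminishing returns
        w = recency * size
        num += w * (p.biden - p.trump)
        den += w
    return num / den

# Placeholder data: a tie, a Trump+9 result and an older Biden+4 poll
polls = [
    Poll(46, 46, 1000, date(2023, 9, 19)),
    Poll(42, 51, 1005, date(2023, 9, 20)),
    Poll(48, 44, 1500, date(2023, 9, 10)),
]
print(f"Biden margin: {weighted_average(polls, date(2023, 9, 26)):+.1f} points")
```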

Indeed, even good pollsters produce outlier results from time to time. But how does this happen? How can you tell when statistical error may be affecting the result of a poll? And how should we be interpreting and using these outliers, if at all? Let me share with you how I go about answering these questions.

The anatomy of an outlier poll

In order to understand how one poll can differ so much from others, you first have to understand how polls are conducted. For simplicity’s sake, we’ll limit this explanation to surveys conducted over the phone, as both the ABC News/Washington Post and NBC News polls were.

The process of polling begins with pollsters obtaining a list of phone numbers of potential interviewees. They can do that in a couple of different ways. Some pollsters use a method called random digit dialing (RDD) in which they use a computer to randomly generate a list of landline and cell phone numbers to call. Others use something called registration-based sampling (RBS), in which pollsters obtain phone numbers from voter registration records published by each state.
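As a rough illustration of the RDD idea, here is a toy sketch, not any pollster's actual procedure: it builds a calling list by generating numbers at random, whereas real RDD frames start from known working area-code and exchange combinations.

```python
import random

def rdd_sample(n: int, area_codes: list[str]) -> list[str]:
    """Toy random digit dialing: pick an area code, then fill in the remaining digits at random."""
    numbers = []
    for _ in range(n):
        area = random.choice(area_codes)
        exchange = random.randint(200, 999)   # U.S. exchanges don't begin with 0 or 1
        line = random.randint(0, 9999)
        numbers.append(f"({area}) {exchange}-{line:04d}")
    return numbers

print(rdd_sample(3, ["202", "312", "415"]))
```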

Already, this introduces the first opportunity for error in a poll, called “coverage error.” In both of the above methods, a portion of the population is excluded from being polled at all — they are not “covered” by the poll’s sampling frame. Someone who doesn’t have a phone number will be excluded from both an RDD and an RBS poll. An RBS poll further limits the covered population to people who have registered to vote — roughly seven in 10 U.S. adults, according to the Pew Research Center. That typically does not change the results of a poll much, but you wouldn’t want to use an RBS poll to survey Americans who are not registered to vote. Voters may also provide no phone number, or a non-working one, when they register, making those people unreachable for pollsters.

Then, the pollster actually calls and interviews a certain number of people on its list — somewhere around 1,000 people for these large national media polls. That introduces the second opportunity for error, called “sampling error.” The U.S. is a huge country, and pollsters’ samples of 1,000 people might not represent the population of 258 million adults. Pollsters use a statistic called margin of error to measure this potential error. In the latest ABC News/Washington Post poll, the margin of sampling error is about 3.5 points. That means Trump’s 51 percent of the vote could be as high as 54.5 percent or as low as 47.5 percent if they conducted this poll again many times with different samples. (Similarly, Biden could be as high as 45.5 percent or as low as 38.5 percent.)
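For intuition, the textbook margin of sampling error for a simple random sample can be computed directly. The sketch below assumes a 95 percent confidence level; the published 3.5-point figure runs a bit larger than this textbook number partly because weighting and design effects tend to inflate it.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95 percent margin of sampling error, in percentage points, for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n) * 100

print(f"+/-{margin_of_error(0.5, 1000):.1f} points")  # about +/-3.1, the worst case at p = 0.5
```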

But even if you take Biden’s best-case scenario from the ABC News/Washington Post poll’s margin of error — a Trump lead of about 2 points — it still doesn’t match the current polling average of Biden+2. So some other source of error is probably affecting these numbers too.

That brings us to the third type of error to be aware of: “nonresponse bias.” This happens when certain groups of people are less likely than others to answer polls. Maybe they don’t pick up the phone, or maybe they pick up the phone, hear it’s a pollster calling and hang up. However they get to that point, these non-responders can be different from the people pollsters do talk to — and that can create problems. Pew, for example, has found that older voters and people who are engaged in politics are likelier to answer polls, and the Census Bureau has found that white people and educated, high-income individuals are the likeliest to respond to them. And while pollsters have some statistical tools for fixing these issues with demographic groups, they have few solutions for bias that arises when certain political groups are more or less likely to answer polls, which we call “partisan nonresponse bias.”
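The “statistical tools” mentioned here are weighting schemes. In the simplest one-variable case, each demographic group is weighted so the sample matches known population shares; real polls rake across many variables at once. The sketch below uses made-up shares, and it also illustrates the limit described above: weighting up the young people who did respond doesn't help if they differ politically from the young people who didn't.

```python
def demographic_weights(sample_shares: dict[str, float],
                        population_shares: dict[str, float]) -> dict[str, float]:
    """Weight each group so the weighted sample matches known population shares."""
    return {g: population_shares[g] / sample_shares[g] for g in sample_shares}

# Illustrative shares only: young adults answer less often, so their responses get weighted up
sample = {"18-29": 0.10, "30-64": 0.55, "65+": 0.35}
population = {"18-29": 0.21, "30-64": 0.57, "65+": 0.22}
print(demographic_weights(sample, population))
# roughly {'18-29': 2.1, '30-64': 1.04, '65+': 0.63}
```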

It’s possible that this partisan nonresponse bias affected the ABC News/Washington Post poll published on Sunday. Diving into the “crosstabs” of the poll (this is the document from a pollster that breaks down the poll’s results by different slices of the population), we can see that Biden is conspicuously weak among key Democratic groups. For example, among voters ages 18-29, 41 percent said they would vote for Biden if the election were held today, and 48 percent said they would vote for Trump — a margin of -7 points for Biden. But according to an ABC News exit poll conducted by Edison Research in 2020, Biden won that group by 24 points in 2020. That’s a 31-point drop on the margin — more than twice as much as the overall population shifted. Biden also saw steep declines among Latinos, urban dwellers and Black respondents.
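The shift calculation above is simple arithmetic; here it is spelled out, using the subgroup numbers quoted in that paragraph.

```python
def margin_shift(dem_now: float, rep_now: float, dem_margin_2020: float) -> float:
    """Change in the Democratic margin versus the 2020 baseline, in percentage points."""
    return (dem_now - rep_now) - dem_margin_2020

print(margin_shift(41, 48, 24))  # -31: a 31-point drop among voters ages 18-29
```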

Though we can’t be sure the poll’s findings are wrong (the 2024 election has not happened yet!), they are out of step with expectations. Not only are those huge shifts, but they’re also at odds with what other high-quality national polls are saying. For example, an August survey from The New York Times and Siena College (which uses an RBS methodology) that put Biden and Trump in a dead heat nationally found Biden winning 56 percent of young voters. That’s a much more plausible subgroup estimate, and it suggests that there may be something statistically unrepresentative about the types of young voters who are showing up in RDD surveys.

There’s one more type of potential error to consider (for now): “measurement error.” Sometimes, a question pollsters ask does not do a good job of capturing the underlying thing they’re trying to measure. That can manifest in unusual ways. For example, when partisans are unhappy with the news or with something their political party has done, they will occasionally exaggerate to a pollster to make a point. One 2018 study found, for example, that survey respondents who claimed Trump’s inauguration crowd was larger than former President Barack Obama’s knew they were answering incorrectly, but did so anyway to “cheerlead” for their partisan team. Scholars call this phenomenon “expressive responding.” Similarly, if young people and people of color are dissatisfied with the progress Biden has made, or upset that he is running for reelection, they could be withholding their support for him in polls as a way to register their displeasure. Ultimately, however, they may vote for him anyway.

Measurement error also can pop up for certain questions and not others. The NBC News and ABC News/Washington Post polls have similar net approval ratings for Biden despite finding different horse-race numbers. That may suggest that the survey designs of both polls are sound overall, but that something has narrowly gone wrong to push the ABC News/Washington Post poll’s horse-race number off course. Here, expressive responding among Democrats who disapprove of Biden may be the culprit.

In his analysis of the ABC News/Washington Post survey for ABC News, pollster Gary Langer noted that the order of questions in the poll may also have been a factor. “As is customary for ABC/Post polls at this still-early stage of an election cycle,” Langer wrote, “this survey asked first about Biden and Trump’s performance, economic sentiment and a handful of other issues (Ukraine aid, abortion and a government shutdown) before candidate preferences. That’s because these questions are more germane than candidate support in an election so far off. Since many results are negative toward Biden, it follows that he’s lagging in 2024 support.”

What to do with outlier polls

Given that so many different factors can affect the result of a poll, it is crucial that we don’t anchor our understanding of politics to any one survey. But at the same time, we shouldn’t just throw out data we don’t believe. That’s a dangerous precedent to set: Throwing out polls you don’t believe in is a recipe for “herding” — where pollsters release their polls only if they’re close to other surveys (or, in the worst cases, actually adjust their polls so they match others). This can decrease the accuracy of polling averages, which rely on seeing a diverse portrait of public opinion to find the right signal in the data. More philosophically, the entire point of scientific polling is to root your understanding of politics in some sort of objective scientific process. When that process yields uncertain results, you should question them and work through any errors, but it takes time and repeated observations to know when something is truly broken.

That’s why 538’s philosophy has long been to average outliers (warts and all) with all the other surveys we have of the same race. If you have enough data (as we do in high-profile elections like the 2024 presidential race), then random outliers will barely budge the bottom-line average. Furthermore, there are some statistical adjustments we can make if we want to be smarter about aggregating polling.

One is to put less weight on true outlier polls. Here at 538, we are statistical Bayesians — meaning we have adopted the mathematical worldview of Thomas Bayes, a reverend and statistician from the 18th century whose landmark work helps us understand how much to update our current beliefs (or “priors”) given new data. In our case, Bayes’s theorem suggests that a poll should shift our priors more the more plausible that poll’s result is. For any poll in the normal range around our current Biden+2 polling average — currently, the 90 percent confidence interval for our polling average runs from around Biden+8 to Trump+7 — you could give it full weight in your polling average. Beyond that point, you should decrease the weight you give a poll because it’s more likely that that poll is a statistical outlier.*
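As a back-of-the-napkin illustration of that idea (538's actual model handles the weighting differently, as the footnote below notes), one could treat the polling average's 90 percent interval as a normal distribution and weight each new poll by its relative likelihood under it. The function and numbers below are a sketch under that assumption, not 538's formula.

```python
import math

def outlier_weight(poll_margin: float, avg_margin: float, ci90_halfwidth: float) -> float:
    """Relative weight for a poll: 1.0 if it lands exactly on the average,
    shrinking toward 0 the further into the tails it falls."""
    sigma = ci90_halfwidth / 1.645          # convert a 90 percent half-width to a standard deviation
    z = (poll_margin - avg_margin) / sigma
    return math.exp(-0.5 * z * z)           # normal density, rescaled so the peak equals 1

# Average of roughly Biden+2, with a 90 percent interval about 7.5 points wide on each side
print(round(outlier_weight(0, 2, 7.5), 2))    # a tied poll: ~0.91, close to full weight
print(round(outlier_weight(-9, 2, 7.5), 2))   # a Trump+9 poll: ~0.05, heavily discounted
```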

There’s another consideration, however. If a pollster releases multiple outliers in a row, it is less likely to be due to randomness in sampling, nonresponse or measurement error and more likely that something about its methodology is causing results to differ from the average. ABC News/Washington Post’s last survey, conducted from April 28 to May 3, had Trump up by 7 points, and their Jan. 27-Feb. 1 poll had Trump up by 3. Now, three polls is not enough to be confident that methodology, rather than randomness, is causing apparent statistical bias. But those surveys also had disproportionately high support for Trump among young people, with the former president winning voters ages 18-29 by a 19-point margin in the spring poll.

Of course, just because a poll differs from the average does not mean its methodology is wrong. For election surveys, we won’t know what the “best” methodology is until all the votes are counted and we can compare pollsters on their accuracy. But methodological differences are worth accounting for in a model. If we think a pollster’s methodology produces numbers that are too good to be true for Republicans (or Democrats), we can dock its results a few points. At 538, we call this adjusting for a pollster’s “house effects,” and it’s one of the hallmarks of our polling averages. You can read more about that on our methodology page, but one interesting wrinkle is that when a pollster has a consistent house effect, its data actually becomes more useful — since we can count on it running a predictable amount too favorable for one side or the other. In other words, the more “outliers” a pollster releases, the more certain we are about its house effect, and the more weight our polling average will give its (adjusted) polls.
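For a stripped-down sense of how such an adjustment could work, here is a sketch with made-up numbers. 538's real house-effect adjustment is estimated within the averaging model itself, so treat this only as an illustration of the idea.

```python
def house_effect(pollster_margins: list[float], concurrent_averages: list[float]) -> float:
    """Average gap between a pollster's margins and the overall polling average at the
    time each poll was taken (positive = runs more Democratic than the average)."""
    gaps = [p - a for p, a in zip(pollster_margins, concurrent_averages)]
    return sum(gaps) / len(gaps)

def adjust(poll_margin: float, effect: float) -> float:
    """Remove the house effect before the poll enters the average."""
    return poll_margin - effect

# Made-up example: a pollster that consistently runs about 8 points more Republican
margins = [-6.0, -5.0, -7.0]      # its Biden-minus-Trump margins
averages = [2.0, 1.0, 3.0]        # the overall average when each poll was fielded
effect = house_effect(margins, averages)   # -8.0
print(adjust(-7.0, effect))                # 1.0: a Trump+7 poll reads as roughly Biden+1 after adjustment
```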

None of this is to say that 538 will be decreasing the weight we give to ABC News/Washington Post polls going forward. Though any other outliers will be treated with some skepticism by our models (as they are when we see them from any outlet), a few surprising polls do not override the long history of accuracy from the pollster. In previous versions of 538’s pollster ratings,** which grade pollsters based on their empirical records and methodological transparency, the ABC News/Washington Post poll has been one of the top 10 most accurate in America. If, in the future, the pollster comes to have a predictable house effect, our model will automatically adjust its polls accordingly while still giving them their due weight.

Trust the average

In conclusion, it’s likely that the newest ABC News/Washington Post poll is an outlier. From the vantage point of 2023, staring down a deluge of polling data for the primary and general elections, this is a good reminder that even a sound methodology can yield surprising results. And when a certain methodology produces multiple surprising results in a row, that’s when pollsters should review their methods for any quirks or kinks. That is a lot different, however, from just throwing out data you don’t believe in. That would make for bad predictions and dishonest polling. Here at 538, we encourage the use of well-designed polling averages to strike the delicate balance between being skeptical of improbable polls and not rejecting them completely. We believe a more nuanced reading of polling data will leave us all more informed and better off.

*In 538’s actual polling averages, our model handles this sort of weighting for us, so we don’t have to do back-of-the-napkin multiplications by probabilities like we’re discussing here.

**538’s pollster ratings are currently being updated with a new methodology and are not currently used by our polling averages.