How 538's 2024 Senate election forecast works
Here's everything that goes into this year's model.
On Wednesday, 538 released its 2024 election forecast for the U.S. Senate. The model powering this forecast combines our polling averages for each Senate race with other quantitative and qualitative data to figure out which candidates are likely to win in each state. Then we calculate how much uncertainty we have about the outcome and how that uncertainty is shared across different regions and the country as a whole, and finally produce a forecast of overall chamber win probabilities for each party. This article explains all the different steps we take to come up with those final numbers.
1. The polls
Our Senate forecast model starts with our published average of polls in each race. Broadly speaking, we weight the polls according to the pollster rating of the firm responsible, adjust results for the way each poll was conducted (the mode of the interview, whether a partisan actor sponsored or conducted the poll, and the house effects of the firm that ran it), and then calculate the likeliest trendline running through the data. You can read the full explanation of how we calculate Senate polling averages here.
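To make the general shape of this step concrete, here's a minimal sketch of a quality- and recency-weighted average in Python. This is not 538's actual code: the `Poll` fields, quality scores, half-life and adjustment values are all invented for illustration.

```python
# A minimal sketch of a weighted polling average (not 538's actual code).
# Weights combine a pollster-quality score with recency decay; the ratings,
# half-life and adjustments below are illustrative only.
from dataclasses import dataclass

@dataclass
class Poll:
    dem: float        # Democratic share of the two-party vote, in points
    days_old: int     # days before the date we're averaging for
    quality: float    # hypothetical pollster rating, scaled 0-1
    house_adj: float  # adjustment for mode/partisanship, in points

def weighted_average(polls: list[Poll], half_life_days: float = 14.0) -> float:
    """Quality- and recency-weighted average of adjusted poll results."""
    num = den = 0.0
    for p in polls:
        weight = p.quality * 0.5 ** (p.days_old / half_life_days)
        num += weight * (p.dem + p.house_adj)
        den += weight
    return num / den

polls = [Poll(51.2, 3, 0.9, -0.4), Poll(49.8, 10, 0.6, 0.8), Poll(52.5, 21, 0.8, 0.0)]
print(round(weighted_average(polls), 1))  # recent, high-quality polls dominate
```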
For our forecast, we also adjust the polling averages in each race for changes in the national political environment. The idea here is that, in states that aren't polled often, our polling average may not have the most up-to-date information about a race. Adjusting each candidate's vote shares up or down when their party's support in national polls goes up or down can help make these averages more accurate when there is no new state-level data. Since we have found the House vote to be loosely predictive of Senate race outcomes historically, we use our generic congressional ballot polling average (the version we use in our House forecast, not the one we publish on our polls page) as a stand-in for a party's national support.
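As a rough illustration of that adjustment, a stale state average can be shifted by some fraction of the national swing since the state was last polled. The `elasticity` parameter and the mechanics here are assumptions for the sketch, not values 538 has published.

```python
# Illustrative only: nudge a stale state polling average by the change in the
# national generic-ballot margin since the state was last polled.
def adjust_for_environment(state_avg_margin: float,
                           generic_now: float,
                           generic_at_last_poll: float,
                           elasticity: float = 0.5) -> float:
    """Shift a state's margin by a fraction of the national swing.
    Positive margins are Democratic; `elasticity` is a made-up dampener."""
    national_swing = generic_now - generic_at_last_poll
    return state_avg_margin + elasticity * national_swing

# A state last polled when the generic ballot was R+1 that is now D+1:
# a D+2.0 average becomes D+3.0 under this (assumed) half-strength shift.
print(adjust_for_environment(2.0, generic_now=1.0, generic_at_last_poll=-1.0))
```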
One note: This trendline adjustment is generally less helpful for estimating Senate vote shares than the equivalent adjustment for presidential election polls. That's because, in presidential elections, the same candidates are running in every state (with the exception of some minor candidates), so it follows that support for them at the state level should ebb and flow with their support nationally. Senate candidates differ from state to state, so it's easier to imagine their support moving independently of one another.
2. The fundamentals
Importantly, we also infer what polls would say in states without Senate polling (although there are relatively few of these). We do this by running a regression that predicts the polling averages in each Senate race using a variety of non-polling factors we call the "fundamentals." For the most part, the fundamentals for the Senate are modeled the same way we model them for the House; the model just learns different weights for each variable based on historical results for the Senate instead of the House.
The first, and most important, fundamental is simply the partisan lean of the state. 538 calculates a state's partisan lean by taking a weighted blend of how much redder or bluer the state voted than the country as a whole in the most recent and second-most recent presidential election.
The exact weight that the most recent presidential result gets relative to the second-most recent is decided by our model based on what has best predicted states' future partisanship in the past. For the 2024 election, partisan lean is about three parts 2020 results and one part 2016 results. This means a state that voted by 9.5 percentage points for President Joe Biden in 2020 (thus being 5.0 points bluer than the overall nation, which voted for Biden by 4.5 points) and by 3.1 points for former Secretary of State Hillary Clinton in 2016 (thus being 1.0 point bluer than the overall nation, which voted for Clinton by 2.1 points) would have a partisan lean of D+4.0 (the average of 5.0 and 1.0 if 5.0 gets three times as much weight as 1.0). This number represents the expected vote margin in this state in a completely neutral year — if the candidates perform exactly the same as the presidential candidates over the last two cycles.
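That worked example reduces to a simple weighted average, which the snippet below reproduces. The 3:1 split is the one described above for 2024; nothing else is assumed.

```python
# Partisan lean as a weighted blend of the last two presidential elections.
# Positive numbers mean more Democratic than the nation as a whole.
def partisan_lean(state_2020: float, national_2020: float,
                  state_2016: float, national_2016: float,
                  recent_weight: float = 0.75) -> float:
    lean_2020 = state_2020 - national_2020   # how much bluer in 2020
    lean_2016 = state_2016 - national_2016   # how much bluer in 2016
    return recent_weight * lean_2020 + (1 - recent_weight) * lean_2016

# Biden +9.5 in a Biden +4.5 nation; Clinton +3.1 in a Clinton +2.1 nation.
print(partisan_lean(9.5, 4.5, 3.1, 2.1))  # D+4.0
```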
Of course, Senate candidates are not presidential candidates, so there are a few other state-specific factors to take into account:
All these variables enter a regression model trained to predict the two-party vote in each contested Senate election since 1998. That model is fit using a statistical technique called Markov chain Monte Carlo, which we run using the programming language Stan. Similar to the way that calculators for high school algebra work, Markov chain Monte Carlo takes in a mathematical equation from the user and figures out which values of each variable are most consistent with the historical data, along with how uncertain we should be about each of them. More details on this technique are available in the methodology for our presidential election model.
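For readers who want to see the technique rather than the actual model, here is a minimal sketch of a fundamentals-style regression fit with Markov chain Monte Carlo in Stan, driven from Python via cmdstanpy. The toy data, priors and three unnamed predictors are all invented; 538's real model has far more structure.

```python
# A toy fundamentals regression fit with MCMC via Stan (requires cmdstanpy
# and a CmdStan installation). Everything here is illustrative.
import tempfile
import numpy as np
from cmdstanpy import CmdStanModel

STAN_CODE = """
data {
  int<lower=0> N;       // historical contested Senate races
  int<lower=0> K;       // number of fundamentals (partisan lean, etc.)
  matrix[N, K] X;       // predictor matrix
  vector[N] y;          // Democratic two-party vote margin
}
parameters {
  real alpha;
  vector[K] beta;       // learned weight on each fundamental
  real<lower=0> sigma;  // residual error
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);     // made-up priors
  sigma ~ normal(0, 10);
  y ~ normal(alpha + X * beta, sigma);
}
"""

with tempfile.NamedTemporaryFile("w", suffix=".stan", delete=False) as f:
    f.write(STAN_CODE)
    stan_path = f.name

rng = np.random.default_rng(538)
X = rng.normal(size=(200, 3))                              # fake fundamentals
y = X @ np.array([8.0, 2.0, 1.0]) + rng.normal(0, 3, 200)  # fake margins

fit = CmdStanModel(stan_file=stan_path).sample(
    data={"N": 200, "K": 3, "X": X, "y": y}, seed=1)
print(fit.summary())  # posterior means and uncertainty for alpha, beta, sigma
```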
This method works great for predicting support for Republicans and Democrats in races with only two candidates — one from each major party. But what about candidates of other parties? We calculate their support differently depending on whether they are a "major" or "minor" candidate. (We designate a third-party candidate as "major" if they consistently poll at or above 10 percent of the vote.)
Here’s how we calculate support for major third-party candidates. For third-party candidates who have said they will caucus with a specific party, we treat them as a member of that party. For example, our model treats independent Sens. Angus King and Bernie Sanders as Democrats because they currently caucus with Democrats and are widely expected to continue to do so if they win reelection.
For major third-party candidates who have not revealed who they will caucus with, we still treat them as a major-party candidate for the purposes of calculating their vote share. In races where a major third-party candidate is running against candidates of both major parties, we treat them as a member of the party from which they are siphoning off the most support. In races where a major third-party candidate is running against a candidate of only one major party, we treat the third-party candidate as the member of the major party that is not fielding a candidate. For example, for the purposes of predicting the regularly scheduled election for Nebraska Senate, our model treats independent Dan Osborn as a Democrat because, in running against a Republican candidate but no Democratic candidate, he is the de facto Democratic candidate.
However, for the purposes of predicting chamber control, we don’t necessarily assume that these candidates will caucus with the party that they are standing in for in the election. Instead, when a candidate (like Osborn) is noncommittal about which party they will caucus with, we basically randomize it: We assume they will caucus with Democrats one-third of the time, with Republicans one-third of the time, and with neither party one-third of the time.
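In simulation code, that rule is a single random draw per simulation. A sketch (the labels are ours, not 538's):

```python
# Illustrative: an uncommitted independent's caucus choice is re-drawn in
# every simulation, one-third each for Democrats, Republicans or neither.
import random

def simulated_caucus(rng: random.Random) -> str:
    return rng.choice(["D", "R", "neither"])

rng = random.Random(0)
draws = [simulated_caucus(rng) for _ in range(30_000)]
print({c: round(draws.count(c) / len(draws), 3) for c in ("D", "R", "neither")})
# Each outcome lands near 0.333, as intended.
```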
We predict support for minor third-party candidates in a second stage of our Senate model. This stage trains a simpler regression model to predict historical third-party vote shares using the average support for third-party candidates in the polls. Then, to predict the Democratic and Republican vote shares in that seat, we multiply their two-way vote shares from the first stage of the model by the share of the vote left over. An exception to this is races featuring two candidates of the same party (such as California's 2018 Senate race). In these races, we do not make a prediction for the share of the vote going to each candidate and, for purposes of calculating chamber control probabilities, always count the seat for that party.
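The leftover-vote step is simple arithmetic; here is an illustrative version with invented shares:

```python
# Scale the two-way major-party shares by whatever vote remains after the
# minor third-party candidate's predicted share is set aside.
def allocate_with_minor_candidate(dem_two_way: float,
                                  third_party: float) -> tuple[float, float, float]:
    remaining = 1.0 - third_party
    return dem_two_way * remaining, (1.0 - dem_two_way) * remaining, third_party

# A 52-48 two-way race with a minor candidate predicted at 4 percent:
dem, rep, third = allocate_with_minor_candidate(0.52, 0.04)
print(round(dem, 4), round(rep, 4), third)  # 0.4992 0.4608 0.04
```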
In jungle primaries, such as Georgia's Senate special election in 2020, in which all candidates run on the same ballot regardless of party, we run an extra stage of the model that uses polls and fundraising to predict each candidate's vote share. In each model simulation in which no candidate wins a majority of the vote (which is required to avoid a runoff), we simulate the runoff using either the original predicted two-party vote for the seat (when there is both a Democratic and Republican candidate) or the predicted share of the partisan vote going to each candidate divided by the total between them (when all candidates are from the same party). However, this extra stage will not come into play in 2024 because there are no jungle-primary Senate races on Nov. 5.
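The per-simulation logic for such a race might look roughly like the sketch below. The candidate shares are invented, and the real model predicts them from polls and fundraising, so treat this purely as an outline of the branching (it also skips the same-party-runoff case).

```python
# One simulated jungle primary: an outright majority wins the seat; otherwise
# the runoff is decided by the seat's predicted two-party Democratic share.
def jungle_primary_winner(first_round: dict[str, float],
                          runoff_dem_share: float) -> str:
    leader, share = max(first_round.items(), key=lambda kv: kv[1])
    if share > 0.5:              # majority: no runoff needed
        return leader
    return "D" if runoff_dem_share > 0.5 else "R"

# No one clears 50 percent, and the runoff prediction narrowly favors the GOP:
print(jungle_primary_winner({"D": 0.44, "R": 0.41, "R2": 0.15}, 0.49))  # R
```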
3. Qualitative race ratings
Empirically, combining the polls and fundamentals gives us a model with a very accurate historical track record in predicting Senate elections. On average, across the 2010 through 2022 elections, a version of our forecast that uses just polls and fundamentals predicts the wrong winner in contested races just 7 percent of the time, equivalent to about two races per year. That is much lower than the theoretical number of misses we should have seen given the error in our models: In backtesting, we expected to predict the wrong winner in 31 out of 449 Senate races held since 2010, but in fact whiffed on only 18. No seat we gave a greater than 98-in-100 chance to vote a certain way has ever gone to the underdog.
However, there are limits to what we can measure about Senate elections using data and hard math. Sometimes states jump to the left or right due to local, more qualitative factors that are missed in polls. For example, candidates can be inexperienced or a bad fit for a state in ways not captured by our quantitative metrics on experience and incumbency.
In these cases, it makes sense to use qualitative race ratings to fill in the gaps. These ratings come from experts over at Sabato's Crystal Ball, the Cook Political Report and Inside Elections, where analysts spend time interviewing candidates, talking to constituents and crunching their own demographic and electoral data to make predictions about elections. You will usually see these ratings spelled out in a qualitative manner, such as "Lean Republican" or "Likely Democratic." We take those categorical ratings, convert them into numbers and input them into our models as their own predictors in our big regressions.
We convert experts' categorical ratings into numbers by taking their historical ratings and calculating the average margins and standard deviation of those margins for all the candidates within each category. For example, in expert ratings from 1998 through 2022, we find that when a party's candidate was given a rating of "Tilt" or "Lean" in their favor, they usually beat their opponent by 10 points (with a standard deviation of 10 points). In "Likely" seats, the party usually won by 15 points (with a standard deviation of 12 points), and in "Safe" or "Solid" seats, 31 points (with a standard deviation of 15 points). The predicted margin for a candidate in a "Toss-up" seat is +0, with a standard deviation of 10 points.
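Those historical averages amount to a lookup table from rating to an implied margin and spread. Here they are in code, using exactly the numbers quoted above:

```python
# Expert rating buckets -> (mean historical margin, standard deviation),
# both in percentage points, per the 1998-2022 figures cited above.
RATING_MARGINS = {
    "Toss-up":    (0.0, 10.0),
    "Tilt/Lean":  (10.0, 10.0),
    "Likely":     (15.0, 12.0),
    "Safe/Solid": (31.0, 15.0),
}

def rating_as_predictor(rating: str, favors_dem: bool) -> tuple[float, float]:
    """Signed margin (positive = Democratic) plus its standard deviation."""
    mean, sd = RATING_MARGINS[rating]
    return (mean if favors_dem else -mean), sd

print(rating_as_predictor("Likely", favors_dem=False))  # (-15.0, 12.0)
```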
We use this converted categorical rating as a predictor in our overall regression model, letting the forecast decide how much weight to put on experts' conventional wisdom based on how much additional value they have provided historically above and beyond the polls plus fundamentals. We find that, on average from 2000 through 2022, a race rating should have received about 15 percent of the overall weight in a Senate prediction, with the remaining weight going to a combination of the state-level polls (48 percent) and fundamentals plus the generic ballot (37 percent).
The final weights of our model depend on how many polls are available for each seat. In states with no polls, our model puts about 35 percent of the final weight of the cumulative prediction on the race rating, about 62 percent on the fundamentals plus generic ballot polls, and the remaining 3 percent on an imputed polling average based on polls in other states. As the number of polls for a seat increases, the weight on the polls increases and the weight on the ratings and fundamentals decreases. If there are a dozen polls for a seat, we would put about 6 percent of the final weight on the race rating, 36 percent on the fundamentals and 58 percent on the polling average for that seat. In races with few polls, the polling average itself is also reverted toward that imputed average based on polls in other states.
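538 hasn't published the exact functional form of this reweighting, but interpolating between the two reported mixes (zero polls versus about a dozen) gives a feel for how it behaves. The linear form is our assumption.

```python
# Assumed linear interpolation between the two weight mixes reported above.
import numpy as np

POLL_COUNTS = [0, 12]
WEIGHTS = {
    "rating":       [0.35, 0.06],
    "fundamentals": [0.62, 0.36],  # includes the generic ballot
    "polls":        [0.03, 0.58],  # imputed average when there are no polls
}

def blended_weights(n_polls: int) -> dict[str, float]:
    n = min(n_polls, 12)
    return {k: float(np.interp(n, POLL_COUNTS, v)) for k, v in WEIGHTS.items()}

print(blended_weights(6))  # roughly 0.21 / 0.49 / 0.31 under this assumption
```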
4. Simulating uncertainty
Despite all this information and fancy modeling, our forecast will not be perfect in every seat — and especially in years when polls are off, we can have some big misses. The value add of our forecast over polls and ratings is in correctly quantifying that uncertainty so we can give the public a good idea of the true range of outcomes for a given contest. We want, in other words, to help our readers understand uncertainty and avoid being surprised.
Our model looks at prediction error as coming from three overall sources. First, there is national error, which is shared across all seats. This error accounts for things like the generic ballot polls being off or the predictive value of presidential versus Senate results changing dramatically from cycle to cycle. Second, there is regional error, which can affect all the Senate races in a given region. It's possible that there will be a shift toward Democrats just in the Midwest, for example, or toward Republicans in the Southwest. We group states according to the regional definitions established in 538's presidential election forecast.
Third, and finally, there is idiosyncratic error belonging to each state in isolation; this accounts for factors such as bad polls, candidate quality issues, local effects of state ballot laws and the like. As a general rule, contest-specific error in Senate races is higher than for presidential elections (where there are more polls and candidates are better known) and lower than for House races (where the opposite is true).
Added up, we expect there to be about 7 points of error on the margin between candidates in each seat: about 4.8 points of error at the national level, 2.8 at the regional level and another 4.8 at the state level. (These numbers don't sum to 7 because uncorrelated errors combine as the square root of the sum of their squares, not by simple addition.)
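The arithmetic behind that parenthetical is a sum in quadrature, which a quick simulation also confirms (assuming, for illustration, normally distributed errors):

```python
# Uncorrelated errors combine as the square root of the sum of squares,
# which is why 4.8 + 2.8 + 4.8 points of error total about 7, not 12.4.
import math
import numpy as np

NATIONAL, REGIONAL, STATE = 4.8, 2.8, 4.8
print(round(math.sqrt(NATIONAL**2 + REGIONAL**2 + STATE**2), 1))  # 7.3

# The same answer by simulating one seat's three error draws many times:
rng = np.random.default_rng(0)
sims = (rng.normal(0, NATIONAL, 100_000)
        + rng.normal(0, REGIONAL, 100_000)
        + rng.normal(0, STATE, 100_000))
print(round(float(sims.std()), 1))  # also about 7.3
```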
We also allow state-level error to be higher in states further from an even partisan lean, higher in seats with no polls, higher in seats with "Solid" or "Safe" race ratings, and higher in seats where the polls disagree more with the fundamentals. We add this error because it is harder to predict the exact margin in seats with lopsided margins, even though the winner there is not in question; similarly, when predictors disagree, that's a sign that there is no clear signal about who leads the race. Adding this contest-level error on top of the base state-level error also helps the model distinguish between competitive and uncompetitive races, allowing for a better fit overall.
Finally, we add a small amount of error to account for possible low-probability, "black swan" issues in polling, such as partisan nonresponse bias or cases where weighting techniques that have worked in the past break down in ways our historical training data could not anticipate. This additional error amounts to about 1 point on margin and affects all races simultaneously.
That's it! If you have any questions, or see anything missing from our methodology or forecast page, drop us a line and we'll get right on it!