A well-known Yale economist has written a book using the mathematical technique of regression to predict the outcome of presidential elections. Ray C. Fair's Predicting Presidential Elections and Other Things grew out of his 1978 paper that provides quite accurate descriptions of these quadrennial elections dating back to 1916. Before getting to his very surprising prediction for this November (with which I disagree), let me sketch the idea behind the technique.
To see how an adult son's height, for example, relates to his father's height, first find a random collection of father-son pairs. For each pair, measure the height of the father and the son and plot them, respectively, on the horizontal and vertical axes of a graph so that each father-son pair gives rise to a point on the graph. Examine the cloud of points thus generated and determine the closest-fitting straight line through them. Most of the points will probably be clustered near this line, known in statistics as the regression line, and its equation allows us to predict, with a certain margin of error, a son's height from that of his father.
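For readers who want to see the father-son idea in action, here is a minimal sketch in Python. The heights are synthetic numbers invented for illustration (an assumed average father of 69 inches and an assumed true slope of 0.5), not real measurements, and the function names `fit_line` and `residual_sd` are my own labels rather than anything from Fair's book.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_sd(xs, ys, intercept, slope):
    """Typical size of a prediction error (residual standard deviation)."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / (len(xs) - 2)) ** 0.5

# Synthetic father-son heights in inches; invented for illustration only.
rng = random.Random(0)
fathers = [rng.gauss(69.0, 2.5) for _ in range(200)]
sons = [35.0 + 0.5 * f + rng.gauss(0.0, 2.0) for f in fathers]

intercept, slope = fit_line(fathers, sons)
error = residual_sd(fathers, sons, intercept, slope)
print(f"son = {intercept:.1f} + {slope:.2f} * father, "
      f"give or take about {error:.1f} inches")
print(f"predicted son of a 72-inch father: {intercept + slope * 72:.1f} inches")
```

The fitted slope and intercept define the regression line, and the residual standard deviation is one simple way to express the "certain margin of error" attached to each prediction.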
Of course, the details and assumptions required to do this are a bit complicated, especially when the quantity in question depends on more than one factor.
The Model and the Prediction
In Professor Fair's model the quantity being predicted (the share of the presidential popular vote going to the incumbent party) depends on six factors. The first is whether one of the candidates is the incumbent, incumbency being a distinct advantage historically. The second is party, Republicans having a slight historical edge, and the third is "party fatigue," a party that has been in power for no more than two terms getting some benefit. The remaining three factors concern the economy: the growth rate of per capita GDP (higher is better for the incumbent), the number of quarters during the previous 3¾ years in which the growth rate exceeded 3.2 percent (higher is better), and the inflation rate (lower is better).
If we plot these six factors along with the percentage of the two-party vote going to the incumbent political party on a graph (an impossible-to-visualize 7-dimensional graph) for each of the elections since 1916, we get a cloud of points. Examining it and using standard statistical techniques, we can determine the closest-fitting hyperplane (the higher-dimensional analogue of the regression line) through them. The equation describing this hyperplane, with this year's values of the six factors plugged in, gives us the prediction for the upcoming election.
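The same least-squares idea carries over to six factors. The sketch below is only an illustration under loud assumptions: the 22 rows of data, the coefficients in `assumed_coefs`, and the values in `this_year` are all made up, and none of them are Fair's figures or his estimated equation. The helpers `solve`, `fit_hyperplane`, and `predict` are hypothetical names of my own. The fit solves the so-called normal equations, one standard way of finding the closest-fitting plane in several dimensions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_hyperplane(rows, ys):
    """Least squares coefficients [intercept, b1, ..., bk] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Plug one election's factor values into the fitted equation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Invented data: 22 "elections" with six factors each. NOT Fair's data.
rng = random.Random(1)
assumed_coefs = [50.0, 3.0, -1.0, -2.0, 0.7, 1.0, -0.7]  # made-up "truth"
rows = []
for i in range(22):
    incumbent_running = float(i % 2)
    party = 1.0 if (i // 2) % 2 == 0 else -1.0
    fatigue = 1.0 if i % 3 == 0 else 0.0
    growth = rng.gauss(2.0, 3.0)
    good_quarters = float(rng.randint(0, 10))
    inflation = rng.uniform(0.0, 8.0)
    rows.append([incumbent_running, party, fatigue,
                 growth, good_quarters, inflation])
shares = [predict(assumed_coefs, r) + rng.gauss(0.0, 2.0) for r in rows]

coefs = fit_hyperplane(rows, shares)
this_year = [1.0, 1.0, 0.0, 3.0, 3.0, 2.0]  # hypothetical factor values
print("fitted coefficients:", [round(c, 2) for c in coefs])
print(f"predicted incumbent-party share: {predict(coefs, this_year):.1f}%")
```

With only 22 rows and seven numbers to estimate, the fitted coefficients in a run like this can stray noticeably from the assumed ones, which is worth keeping in mind when reading the caveats later in the column.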
The bottom line: the model predicts that the outcome won't even be close. Bush will win somewhere around 58 percent of the vote.
This is quite contrary to the polls and to the election futures markets, both of which indicate a very tight race. Still, before dismissing Fair's prediction out of hand, note that his model has been impressive (in retrospect) over the 22 presidential elections since 1916. Although the model's predictions of the victor were wrong in a couple of very close elections, its predictions of the vote percentages in these years were nevertheless quite accurate.
The one notable failure of the model occurred in 1992, when it predicted that Bush senior would beat Clinton, who won handily. Fair explained this failure by pointing to the disparity between the improving economy of the time and voters' lagging perception of it. In all the other elections, predictions of the victor and of the vote percentages were (within a reasonably tight margin of error) correct.
Quibbles, Caveats, and Data Dredging
Regression models are common in social science (Fair's book presents some very good examples), but they sometimes yield implausible results. (See my column on John Lott's "More Guns, Less Crime.") Fair acknowledges that there are particular caveats we should be aware of in evaluating his presidential voting model. One glaring weakness is that there are only 22 data points (6 after the basic model was devised in 1978) with which to fashion the predictive equation.
More generally, we should recognize that a certain degree of data dredging — after-the-fact torturing of data to reveal accidental relationships and meaningless correlations — can always make a model appear more impressive than it is. Rejiggering it so that it "predicts" the past is not particularly difficult.
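A small simulation suggests how easily this can happen. In the sketch below everything is random noise by construction: 22 invented "election results" and 1,000 invented candidate variables that have nothing to do with them. The counts, the seed, and the helper name `r_squared` are my own assumptions, chosen only for illustration. The point is to compare how well a typical meaningless variable "explains" the results with how well the best of the thousand does.

```python
import random

def r_squared(xs, ys):
    """Share of the variation in ys 'explained' by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

N_ELECTIONS = 22      # same number of data points as in the column
N_CANDIDATES = 1000   # assumed number of variables a dredger might try

rng = random.Random(42)

# Pure noise standing in for vote shares; no real relationship exists.
outcome = [rng.gauss(50.0, 5.0) for _ in range(N_ELECTIONS)]

# A thousand equally meaningless candidate variables.
candidates = [[rng.gauss(0.0, 1.0) for _ in range(N_ELECTIONS)]
              for _ in range(N_CANDIDATES)]

scores = sorted(r_squared(c, outcome) for c in candidates)
typical_r2 = scores[N_CANDIDATES // 2]   # the median candidate
best_r2 = scores[-1]                     # the one a dredger would report

print(f"typical meaningless variable explains {typical_r2:.0%} of the variation")
print(f"best of {N_CANDIDATES} explains {best_r2:.0%} of the variation")
```

The best-scoring variable looks far more impressive than the typical one even though, by construction, none of them has any connection to the outcome. Tested against fresh data, its apparent predictive power would be expected to vanish.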
Furthermore, "voting one's pocketbook" and not "rocking the incumbent's boat" may usually be, but need not always be, the dominant determinants of the vote. I suspect that other variables — the war in Iraq, cultural and environmental issues, and concerns about civil liberties — will play a more important role in this year's election than in past years, and they are not part of Fair's model. Even the economic factors in his model fail to reflect anxieties over job losses, huge deficits, and increasingly disproportionate inequalities of income.
The prediction of Fair's model that Bush will win by a wide margin is likely to be disturbing to Kerry supporters and heartening to Bush supporters. Because of the anomalous nature of this election, the testimony of the polls and the election markets, and the frequent unreliability of regression models, however, I do not believe it.
We'll just have to wait until November (or, horrors, December) to see whether Professor Fair's model will be outfitted with new variables and, a bit more importantly, the country outfitted with a new president.
Professor of mathematics at Temple University, John Allen Paulos is the author of best-selling books, including Innumeracy and A Mathematician Plays the Stock Market. His Who’s Counting? column on ABCNEWS.com appears the first weekend of every month.