Dec. 5, 2010 -- Statistical correlation and transitivity are perhaps off-putting words to some.
They're not hard to understand, however. Two quantities -- say, people's heights and weights -- are positively correlated if whenever one of them goes up or down, the other tends to do so as well. Two quantities -- say, people's longevity and smoking rates -- are negatively correlated if whenever one of them goes up or down, the other tends to do the opposite.
And transitivity? A relation is transitive if whenever it holds between X and Y and between Y and Z, it also holds between X and Z. Most people assume that correlation is transitive. That is, they think that if a quantity X correlates positively with another quantity Y, and Y correlates positively with a third quantity Z, then X correlates positively with Z. But this can be a mistake. In many situations X may correlate negatively with Z.
Example from Baseball
A 2001 article in The American Statistician provides a good counterexample from baseball. Examining the batting records of New York Yankees players with more than 300 at-bats in the previous year, the authors (Langford, Schwertman, and Owens) found that the number of triples hit by a player correlated positively with the number of base hits he had, which in turn correlated positively with the number of home runs he hit. Yet the number of triples a player hit correlated negatively with the number of home runs he hit.
Stated differently, players who got a lot of triples generally got a lot of hits of all kinds, and players who got a lot of hits also tended to hit a lot of home runs.
The reason triples and home runs were nevertheless negatively correlated is that players who hit a lot of triples were usually lithe and fast, traits that do not lend themselves to home run hitting, and players who homered a lot were generally big and slow, traits that do not lend themselves to hitting a lot of triples.
In general, even if a quantity X correlates positively with another quantity Y, and Y correlates positively with a third quantity Z, we can't conclude that X correlates positively with Z. Transitivity may not hold.
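To see the arithmetic behind this, here is a minimal sketch in Python. The numbers are made up for six hypothetical players and are not the Yankees data from the article; X stands in for triples, Z for home runs, and Y for a total that includes both. The helper name pearson is likewise just an illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for six hypothetical players (NOT real batting records).
x = [1, 2, 3, 4, 5, 6]              # stands in for triples
z = [4, 6, 2, 5, 1, 3]              # stands in for home runs
y = [a + b for a, b in zip(x, z)]   # a total that includes both X and Z

r_xy = pearson(x, y)
r_yz = pearson(y, z)
r_xz = pearson(x, z)

print(f"corr(X, Y) = {r_xy:+.3f}")
print(f"corr(Y, Z) = {r_yz:+.3f}")
print(f"corr(X, Z) = {r_xz:+.3f}")
```

In this toy data, X and Z each feed into Y, so each correlates positively with Y, while X and Z move against each other. That is the same pattern the baseball example describes.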
Still, we often uncritically assume the transitivity of correlation, particularly in medicine. If, for example, generally good health is positively correlated with personal income, which in turn is positively correlated with a certain health practice, say the taking of certain expensive vitamin and mineral supplements, we might conclude there is a positive correlation between good health and the ingestion of these supplements. Again, not necessarily so.
Furthermore, the more links there are between two quantities, the more likely transitivity is to fail. Once again, the problem is that many people unconsciously reason that if U and V are positively correlated, and so are V and W, W and X, and X and Y, then U and Y must be as well. It might be helpful to think of correlations as akin to distances. If U and V are less than a mile apart, and V and W are also within a mile of each other, it doesn't follow that U and W are less than a mile apart.
Non-transitivity is common in probability and election theory. One well-known example from Bradley Efron involves four dice, A, B, C and D. Being dice, they all have six faces, but these dice are strangely numbered as follows: A has a 4 on four faces and a 0 on two faces; B has 3's on all six faces; C has four faces with a 2 and two faces with a 6; and D has a 5 on three faces and a 1 on three faces.
If die A is rolled against die B, die A will win -- by showing a higher number -- two thirds of the time. Similarly, if die B is rolled against die C, B will win two thirds of the time; and if die C is rolled against die D, it will win two thirds of the time.
Nevertheless -- and here's the punch line -- if die D is rolled against die A, it will win two thirds of the time. A beats B, B beats C, C beats D, yet D beats A, all two thirds of the time.
(You might even profit from this by challenging someone to choose whichever die they want; you could then choose a die that would beat it two thirds of the time. If they choose die B, you choose A; if they choose A, you choose D, and so on.)
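The two-thirds figures can be checked by listing all 36 equally likely outcomes for each pair of dice. The following Python sketch does that with exact fractions; the face values come from the description above, while the names DICE, win_probability and counter_pick are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Face values of Efron's four dice, as described in the column.
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [2, 2, 2, 2, 6, 6],
    "D": [5, 5, 5, 1, 1, 1],
}


def win_probability(first, second):
    """Exact probability that die `first` shows a higher number than `second`."""
    outcomes = list(product(DICE[first], DICE[second]))
    wins = sum(1 for a, b in outcomes if a > b)
    return Fraction(wins, len(outcomes))


def counter_pick(opponent_die):
    """Return the die that beats `opponent_die` most often."""
    others = [name for name in DICE if name != opponent_die]
    return max(others, key=lambda name: win_probability(name, opponent_die))


for first, second in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"{first} beats {second} with probability {win_probability(first, second)}")

for choice in DICE:
    print(f"If they choose {choice}, you choose {counter_pick(choice)}")
```

No pair of these dice can tie, since no number appears on more than one die, so every roll has a winner.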
Politics, Elections, and Non-Transitivity
Versions of non-transitivity crop up in sociological and political discussions where trait X is correlated with Y and Y with Z and so on.
Election preferences provide another example of non-transitivity. A nice illustration arises if we tweak the recent senatorial election in Alaska, where Lisa Murkowski was recently declared the winner. Let's imagine that, contrary to fact, the voters there ranked the three major candidates (all M's, by the way), Lisa Murkowski, Joe Miller, and Scott McAdams, in order of preference.
Let's further imagine that faction A, roughly one third of the electorate, preferred Murkowski to Miller to McAdams; faction B, also about one third of the electorate, preferred Miller to McAdams to Murkowski; and faction C, the remaining one third of the electorate, favored McAdams to Murkowski to Miller.
If this had been the case, a clear majority of the electorate -- factions A and C -- would have preferred Murkowski to Miller, and a clear majority -- factions A and B -- would have preferred Miller to McAdams. Yet a clear majority -- factions B and C -- would have preferred McAdams to Murkowski.
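The cycle in this hypothetical electorate can be tallied mechanically. Here is a short Python sketch that counts each head-to-head contest. It assumes the three factions are exactly equal in size (the scenario says only "roughly one third" each), and the names RANKINGS, WEIGHTS and majority_winner are illustrative.

```python
from itertools import combinations

# Each faction's ranking, most preferred first (the imagined scenario above).
RANKINGS = {
    "A": ["Murkowski", "Miller", "McAdams"],
    "B": ["Miller", "McAdams", "Murkowski"],
    "C": ["McAdams", "Murkowski", "Miller"],
}

# Assumption: the factions are exactly equal in size.
WEIGHTS = {"A": 1, "B": 1, "C": 1}


def majority_winner(first, second):
    """Return whichever of the two candidates a majority ranks higher."""
    votes_first = sum(
        WEIGHTS[faction]
        for faction, ranking in RANKINGS.items()
        if ranking.index(first) < ranking.index(second)
    )
    votes_second = sum(WEIGHTS.values()) - votes_first
    return first if votes_first > votes_second else second


candidates = RANKINGS["A"]
for first, second in combinations(candidates, 2):
    winner = majority_winner(first, second)
    loser = second if winner == first else first
    print(f"A majority prefers {winner} to {loser}")
```

Each candidate wins one head-to-head contest and loses another, so majority preference gives no consistent ranking of the three.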
A presidential primary season, say one involving Republicans Palin, Romney, Huckabee, and Pawlenty in 2012, might easily produce such non-transitive preference rankings.
Finally, note that understanding why transitivity often fails is correlated with vast personal wealth. Well, maybe not ... but, still, it's important.
John Allen Paulos, a professor of mathematics at Temple University in Philadelphia, is the author of the best-sellers "Innumeracy" and "A Mathematician Reads the Newspaper," as well as, most recently, "Irreligion." He's on Twitter and his "Who's Counting?" column on ABCNews.com usually appears the first weekend of every month.