A claim of prejudicial treatment is the basis for many news stories. The percentage of African-American students at elite universities, the ratio of Hispanic representatives in legislatures, and, just recently, the proportion of women among Wikipedia contributors have all been written about extensively.
Oddly enough, the shape of normal bell-shaped (and other) statistical curves sometimes has unexpected consequences for such situations. This is because even a small divergence between the averages of different population groups is accentuated at the extreme ends of these curves, and these extremes, as a result, often receive a lot of attention.
There are policy inferences, most of them wrong-headed, that have been drawn from this fact, but I certainly don't want to delve into questionable claims. I merely want to clarify a couple of mathematical points.
To illustrate one such point, assume two population groups vary along some dimension - height, for example. Let's also assume that the two groups' heights vary in a normal or bell-shaped manner. Then even if the average height of one group is only slightly greater than the average height of the other, people from the taller group will make up a large majority of the very tall.
Likewise, people from the shorter group will make up a large majority of the very short. This is true even though the bulk of the people from both groups are of roughly average stature. So if group A has a mean height of 5'8" and group B a mean height of 5'7", then (depending on the variability of the heights) perhaps 90% or more of those over 6'3" will be from group A. In general, when two groups vary to a similar degree, even a small difference in their averages is greatly accentuated at the extremes.
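For readers who want to check the arithmetic, here is a minimal Python sketch of the height example. It assumes the two groups are equally numerous and share the same standard deviation; both assumptions, the three trial standard deviations, and the function names are added for illustration only.

```python
import math

def upper_tail(cutoff, mean, sd):
    """P(X > cutoff) for a normal variable with the given mean and sd."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def share_of_group_a_above(cutoff, mean_a, mean_b, sd):
    """Fraction of people above the cutoff who come from group A.

    Assumes equal group sizes and a common standard deviation
    (illustrative assumptions, not figures from the text).
    """
    tail_a = upper_tail(cutoff, mean_a, sd)
    tail_b = upper_tail(cutoff, mean_b, sd)
    return tail_a / (tail_a + tail_b)

# Heights in inches: 5'8" = 68, 5'7" = 67, 6'3" = 75.
for sd in (3.0, 2.0, 1.5):  # hypothetical standard deviations
    share = share_of_group_a_above(75, 68, 67, sd)
    print(f"sd = {sd} in: {share:.0%} of those over 75 in are from group A")
```

The smaller the assumed spread, the more lopsided the far tail becomes, which is why the share quoted above depends on the variability of the heights.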
Let me illustrate with another somewhat idealized case. Many people submit their job applications to a large corporation. Some of these people are Mexican and some are Korean, and the corporation, perhaps unwisely, uses a single test to determine which jobs to offer to whom.
For whatever reasons, let's assume that although the scores of both groups are normally distributed with similar variability, those of the Mexican applicants are only slightly lower on average than those of the Korean applicants. (I chose the direction of the difference at random. The same point holds if the Mexicans' scores are slightly higher.)
The corporation's personnel officer notes the relatively small differences between the groups' means and observes with satisfaction that the many mid-level positions are occupied by both Mexicans and Koreans.
She is puzzled, however, by the preponderance of Koreans assigned to the relatively few top jobs, those requiring an exceedingly high score on the qualifying test. The personnel officer does further research and discovers that most holders of the comparably few bottom jobs, assigned to applicants because of their very low scores on the qualifying test, are Mexican.
She may suspect bias, but the result might just as well be an unforeseen consequence of the way the normal distribution works. In fact, a paradoxical situation would result if she lowered the threshold for entrance to the mid-level jobs: by doing so she would actually end up increasing the percentage of Mexicans in the bottom category.
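The threshold paradox can be checked with a short Python sketch. The score means (98 and 102), the common standard deviation (15), the cutoffs, and the equal group sizes are all hypothetical numbers chosen for illustration; only the direction of the effect matters.

```python
import math

def lower_tail(cutoff, mean, sd):
    """P(X < cutoff) for a normal variable with the given mean and sd."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))

def share_lower_group_below(cutoff, mean_low, mean_high, sd):
    """Fraction of those scoring below the cutoff who belong to the
    group with the lower mean, assuming equal group sizes."""
    low = lower_tail(cutoff, mean_low, sd)
    high = lower_tail(cutoff, mean_high, sd)
    return low / (low + high)

# Hypothetical scores: group means 98 and 102, common sd 15.
for cutoff in (85, 80, 75, 70):  # progressively lower bottom-job thresholds
    share = share_lower_group_below(cutoff, 98, 102, 15)
    print(f"cutoff {cutoff}: {share:.1%} of the bottom category "
          f"comes from the lower-scoring group")
```

Under these assumptions, each lowering of the cutoff shrinks the bottom category but raises the lower-scoring group's share of it.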
Groups differ in history, interests, cultural values, and along a whole host of dimensions that are impossible to disentangle. Confronted with these social and historical dissimilarities, we shouldn't be astonished that members' scores on some standardized test are also likely to differ a bit in the mean and much more at the extremes. (Much of the discussion is valid even if the distribution is not the normal bell-shaped one.) Such statistical disparities are not necessarily evidence of racism or ethnic prejudice although, without doubt, they often are. One can and should debate whether the tests in question are appropriate for the purpose at hand, but one shouldn't be surprised when normal curves behave normally.
To combat these disparities, strict quotas are sometimes promoted, but aside from having a dubious and occasionally illegal rationale, such schemes are impossible to implement. Another thought experiment, albeit unrealistic, illustrates this.
Imagine a company, PC Industries say, operating in a community that is 25% black, 75% white, and 5% homosexual, 95% heterosexual. (Again, I've plucked these numbers out of the air.) Unknown to PCI and the community in general is the fact that only 2% of the blacks are homosexual, whereas 6% of the whites are. Making a concerted attempt to assemble a workforce of 1,000 which "fairly" reflects the community, the company hires 750 whites and 250 blacks. However, just 5 of the blacks (or 2%) would be homosexual, whereas 45 of the whites (or 6%) would be (totaling 50, 5% of all workers).
Despite these efforts, the company could still conceivably be accused by its black employees of being homophobic since only 2% of the black employees (5 of 250) would be homosexual, not the community-wide 5%. The company's homosexual employees could likewise claim that the company was racist since only 10% of their members (5 of 50) would be black, not the community-wide 25%. White heterosexuals would certainly make similar complaints.
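The PC Industries arithmetic can be laid out in a few lines of Python. The sketch uses only the percentages given above; the variable names are illustrative.

```python
WORKFORCE = 1000

# Community percentages from the thought experiment.
race_pct = {"black": 25, "white": 75}
homosexual_pct_within_race = {"black": 2, "white": 6}

# Hire by race in proportion to the community.
hired = {race: WORKFORCE * pct // 100 for race, pct in race_pct.items()}

# Homosexual employees within each racial group.
homosexual = {race: hired[race] * homosexual_pct_within_race[race] // 100
              for race in hired}
total_homosexual = sum(homosexual.values())

overall_share = total_homosexual / WORKFORCE               # community-wide rate
share_among_black = homosexual["black"] / hired["black"]   # rate among black employees
black_share_among_homosexual = homosexual["black"] / total_homosexual

print(hired)        # {'black': 250, 'white': 750}
print(homosexual)   # {'black': 5, 'white': 45}
print(f"{overall_share:.0%} overall, {share_among_black:.0%} among black "
      f"employees, {black_share_among_homosexual:.0%} of homosexual "
      f"employees are black")
```

Matching the two community-wide percentages at once does nothing to fix the percentages inside each subgroup, which is where the complaints arise.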
To complete the reductio ad absurdum, we can factor in other groups - Hispanics, women, surgeons, professors, handicapped people, Norwegians, whoever. Their memberships will also intersect to various unknown degrees, and their backgrounds and training are quite unlikely to be uniform. Once again, statistical disparities will necessarily result.
Sadly, racism and homophobia and all other forms of group hatreds are real enough without making them our unthinking first inference when confronted with such disparities.
John Allen Paulos, a professor of mathematics at Temple University in Philadelphia, is the author of the best-sellers "Innumeracy" and "A Mathematician Reads the Newspaper," as well as, most recently, "Irreligion." He's on Twitter and his "Who's Counting?" column on ABCNews.com usually appears the first weekend of every month.