A look at the numbers hints at why. If, as a first approximation, we assume that people live to age 75, then we have about 27,375 (75 x 365) possible birth dates. Since there are approximately 43,000 5-digit zip codes and 2 genders, the so-called multiplication principle says that there are about 27,000 x 43,000 x 2 possible combinations of birth date, zip code, and gender.
This product equals about 2.3 billion, a number far greater than the 300 million population of the U.S. Since the number is so much greater than the population, it's not surprising that many Americans are uniquely defined by their birth date, zip code and gender.
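For readers who want to check the arithmetic, here is a minimal sketch in Python. The constant names are my own, and the inputs are simply the rough figures quoted above (a 75-year lifespan, 43,000 zip codes, 2 genders, 300 million people), not census data.

```python
# Rough inputs taken from the text; all are approximations.
LIFESPAN_YEARS = 75
DAYS_PER_YEAR = 365
ZIP_CODES = 43_000
GENDERS = 2
US_POPULATION = 300_000_000

# Multiplication principle: independent choices multiply.
birth_dates = LIFESPAN_YEARS * DAYS_PER_YEAR          # 27,375
combinations = birth_dates * ZIP_CODES * GENDERS      # unrounded product
rounded_combinations = 27_000 * ZIP_CODES * GENDERS   # product as rounded in the text

# How many combinations exist per person.
ratio = combinations / US_POPULATION

print(f"possible birth dates:   {birth_dates:,}")
print(f"combinations (exact):   {combinations:,}")
print(f"combinations (rounded): {rounded_combinations:,}")
print(f"combinations per person: {ratio:.1f}")
```

Either way the product comes to roughly 2.3 billion, or nearly eight combinations for every American.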
Think of 2.3 billion baskets, each with a different set of these three numbers printed on the side. Further imagine that each of the 300 million Americans is placed in the appropriate basket. Surely many Americans will find themselves alone in their very own basket and thus uniquely identified.
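The basket picture can be turned into a back-of-the-envelope estimate. The sketch below assumes, unrealistically, that each person is equally likely to land in any basket; under that assumption the chance that a given person is alone is (1 - 1/baskets) raised to the power (people - 1). The function names are hypothetical, and the simulation uses a scaled-down population with the same people-to-baskets ratio so that it runs quickly.

```python
import math
import random
from collections import Counter


def expected_fraction_alone(people: int, baskets: int) -> float:
    """Chance that a given person shares a basket with nobody,
    assuming every person picks a basket uniformly at random."""
    # (1 - 1/baskets) ** (people - 1), computed stably via logarithms.
    return math.exp((people - 1) * math.log1p(-1.0 / baskets))


def simulated_fraction_alone(people: int, baskets: int, seed: int = 0) -> float:
    """Drop people into baskets at random and count those who end up alone."""
    rng = random.Random(seed)
    occupancy = Counter(rng.randrange(baskets) for _ in range(people))
    alone = sum(1 for count in occupancy.values() if count == 1)
    return alone / people


# Figures from the text: 300 million people, 2.3 billion baskets.
estimate = expected_fraction_alone(300_000_000, 2_300_000_000)

# Same ratio at 1/10,000 scale, small enough to simulate directly.
simulated = simulated_fraction_alone(30_000, 230_000)

print(f"uniform-model estimate: {estimate:.3f}")   # about 0.88
print(f"scaled-down simulation: {simulated:.3f}")
```

Under this idealized model, roughly 88 percent of people would sit alone in their basket. Real populations are clumped, so this is only an illustration of the reasoning, not a measurement.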
Of course, this is a great simplification. Birth dates are not evenly distributed throughout the last 75 years. Some zip codes contain a lot of people, others not many. Some contain a disproportionate share of young people, others of old people, and so on. Even if the age rather than the birth date were revealed, many Americans still would be uniquely identified.
Still, one can check empirically using census data or make a priori probability arguments to conclude that a substantial majority of Americans would be uniquely identified by the information the proposed contest would release.
The point is that it's very difficult to release any information about a person or group that won't, to a sufficiently curious and diligent researcher or advertiser, sometimes reveal private aspects of that person. Bits of information are rarely orphans and are becoming increasingly linked in unpredictable ways.
Consider the intricate interconnections of Twitter World or the Hall of Mirrors that is Facebook. Even a couple's unusual pair of names (say Waldo and Gertrude) might be enough for a savvy sleuth to uncover all sorts of information about them.
Of course, the difficulty of preserving our privacy isn't an argument against taking reasonable precautions to do so. Unlike the first contest, the second one, if Netflix goes through with it in the form it's rumored to take, would seem to be an intentional violation of privacy.
In any case, the issue of coming up with better predictions of what customers would like is a general one. Although Amazon has not outsourced its algorithms, for example, it is naturally very interested in suggesting books a particular reader would like based on his or her past choices.
This brings me to an anecdote about a bookstore I visited years ago. I asked the clerk if he knew where Wittgenstein's Tractatus might be located, and he pointed me to the automotive section where, sure enough, there it was. The problem was that Wittgenstein's book is a seminal work of 20th-century philosophy, not an automotive manual.
Conclusions: Recommending books and movies is a risky business, knowing what books or movies a person likes is often quite revealing, and safeguarding people's privacy will become increasingly difficult, even with the best of intentions.