Specifically, it said that the Pennsylvania Breach of Personal Information Act requires us to notify a person if we disclose personal information.
Personal Information is defined as "the first name or initial and last name in combination with one or more of the following nonpublic unencrypted pieces of information: a Social Security number, a driver's license number or state identification card, financial account number, credit card or debit card number accompanied by the applicable passwords or security codes."
This is a laudable policy, but almost immediately after reading this e-mail, I read about a contest that the DVD movie rental company Netflix conducted over the last few years.
The contest was intended to elicit from the general public algorithms that would enable the company to improve its suggestions for future selections. To do this, Netflix released a huge trove of data about users' picks of past movies and their ratings of these movies.
Since it wanted a better way of determining other movies these users might like or dislike, the company announced a $1 million prize. The prize would be awarded to the group of researchers whose predictions about a different trove of movie ratings data involving these same users were most accurate.
The users were anonymous, identified only by number. No names, Social Security numbers, driver's license numbers or financial account numbers were released, so the contest complied with Pennsylvania's policy on privacy, undoubtedly a common one across the country.
Netflix also took other measures to anonymize the information, but this did not prevent a subscriber from recently suing the company, as part of a class action suit, for violation of her privacy. An unnamed, in-the-closet lesbian mother has alleged that, by not adequately anonymizing the data set, Netflix outed her and thereby caused her economic and psychological harm.
The explanation: It turns out that people were able to identify specific users by matching their Netflix reviews and ratings with some signed ones the users had posted on the Internet Movie Database.
Nevertheless, the lawsuit maintains that in releasing the large data set Netflix violated the very strict Video Privacy Protection Act, which was passed after the movie choices of Supreme Court nominee Robert Bork were obtained from a video store.
The $1 million winner was picked in the first contest, and now Netflix is said to be planning a second contest to further improve its prediction algorithms.
In this contest, it will provide users' individual ratings of movies but anonymize the users by "only" providing their birth dates, zip codes, and genders. "Only" is in quotes since this information is even more revealing than that released in the earlier contest.
A look at the numbers hints at why. If, as a first approximation, we assume that people live to age 75, then we have about 27,375 (75 x 365) possible birth dates. Since there are approximately 43,000 5-digit zip codes, and 2 genders, the so-called multiplication principle says that there are about 27,000 x 43,000 x 2 possible sets of birth dates, zip codes, and genders.
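As a rough check on the arithmetic above, the multiplication principle can be worked through in a few lines of Python. This is only a sketch: the variable names are illustrative, and the inputs are the approximations quoted in the text (a 75-year lifespan, about 43,000 zip codes, 2 genders), not exact census figures.

```python
# Back-of-the-envelope count using the approximations quoted in the text.
DAYS_PER_YEAR = 365
LIFESPAN_YEARS = 75    # first-approximation lifespan assumed in the text
ZIP_CODES = 43_000     # approximate number of 5-digit zip codes
GENDERS = 2

# Number of possible birth dates: 75 x 365
birth_dates = LIFESPAN_YEARS * DAYS_PER_YEAR

# Multiplication principle: independent choices multiply.
combinations = birth_dates * ZIP_CODES * GENDERS

# Same product with birth dates rounded down to 27,000, as in the text.
rounded = 27_000 * ZIP_CODES * GENDERS

print(f"{birth_dates:,} possible birth dates")
print(f"{combinations:,} combinations (unrounded)")
print(f"{rounded:,} combinations (birth dates rounded to 27,000)")
```

Rounded or not, the product comes to a little over 2.3 billion possible sets of birth date, zip code, and gender.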