'Big Data' disguises digital doubts

Buzzwords don't come any bigger than "Big Data," which promises to reveal the secrets hidden within big blocks of data held by companies, governments and musty old archives.

But maybe Big Data has an Achilles' heel, some experts warn, despite its Big promises.

"The initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education and national security," said presidential science adviser John Holdren, announcing a $200 million effort in March by six federal agencies to uncork the power of Big Data.

Holdren compared Big Data's advent to the invention of supercomputers and the Internet. But what is Big Data really? Starting from scientists struggling to analyze massive amounts of genetic data, as they did in the Human Genome Project a decade ago, or astronomical data, such as the survey of more than 930,000 galaxies undertaken by the Sloan Digital Sky Survey, Big Data has blossomed into a constellation of computer science approaches to handling, visualizing and blending together "big" sets of data.

For example, police forces from Honolulu to New York have looked at combinations of crime tips submitted via Facebook, Twitter and text messages to identify "hotspots" for muggings and other felonies. Amazon famously tracks masses of book purchases to suggest new buys to like-minded readers. The Defense Department hopes to weave together information from a new generation of battlefield sensors at speeds 100 times faster than today using Big Data techniques.

Such efforts have blossomed in the Facebook era, where poking through troves of customer data is seen as the key to unlocking sales. In medicine, a two-day "Health Datapalooza" held this month in the nation's capital drew together federal officials, former Senate majority leader Bill Frist, R-Tenn., and Wired Magazine executive editor Thomas Goetz, to talk health data. If your gene map can be compared instantly to the genomes of millions of other folks in coming decades, for example, the hope is that medicine finely tuned to your medical needs will result.

A Science journal study last year introduced the notion of "culturomics," using Big Data (Google's millions of searchable digitized books, in its case) to reveal "linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000."

Data. Data. Data. So much of it is out there, tracked from the moment you look up a dentist on a website, take a trip through a highway tollbooth to sit in the chair, pay your bill at the reception desk and post your toothache experience afterward on Facebook. Can an ad for a toothbrush be far from your in-box?

"The only problem is that a lot of the Big Data isn't really data," says anthropologist Robert Albro of American University in Washington D.C., who studies how culture affects public policy. "It's a mash-up of all kinds of numbers that started out as data, but they don't necessarily mean anything once they have been removed from where they started out." In the social sciences, he says, researchers have learned over the last century that half the battle in any study is carefully explaining your data's origins. "Once you leave that behind, there is a risk you'll be wrong, and a risk that the decisions you make based on being wrong will affect people in negative ways."

In anthropology, one historical example of a problem comes from turn-of-the-century attempts to pigeonhole people in far-off nations into tribes or countries, using data in categories now understood as far too simplistic. That effort contributed to European countries inventing imaginary borders or peoples in nations such as Rwanda. Public health researchers in the 20th century pigeonholed poor people into "defective" categories, based on bogus data, during the "eugenics" movement aimed at breeding better human beings, which led to the involuntary sterilization of perhaps 60,000 people nationwide by the 1960s.

More recently, University of Wisconsin-Madison communications scholars have warned that Google's search recommendations (the list of suggested searches that pops up when you start typing a word into the popular search engine) actually bend people's perception. Looking at nanotechnology, for example, the study showed that top search suggestions over a few years turned away from business to health concerns. The search recommendations were actually steering more people to look into less-reliable nanotechnology health-issue websites, they found. "Google is shaping the reality we experience in the suggestions it makes, pointing us away from the most accurate information and towards the most popular," study lead author Dietram Scheufele told USA TODAY in 2010.

Still, what's so wrong about using Big Data to find crime hotspots or books you might like? "Nothing. There is obviously immense promise there, as long as the data is kept to uses for which its limits are understood," Albro suggests. However, he worries that because so much of the data out there starts out as "market research" (likes or dislikes when it comes to buying things), removing data from advertising-focused troves and translating it into health care or planning for new roads or "culture" will essentially turn everyone into consumers, rather than citizens, in the minds of planners.

"We can't even agree on what 'culture' is, and now we're going to have ' culturomics.' Isn't that a little ambitious?" Albro asks. "Now we have claims that tweets predicted the 'Arab Spring,' which turns out to be questionable, or can detect 'sentiment' or 'mood,' which are even fuzzier or lazier words. We need to be a little cautious here." (In their defense, the "culturomics" study authors do urge caution on folks using their approach.)

But Albro worries that the hype over Big Data is warming up for Big Disappointment down the road. Much of the criticism of Big Data heard now focuses on privacy issues: who is using your data, or whether it will really help sell stuff. "Those are useful discussions, but we really need to talk a little more deeply about data," Albro says. "It's more than 'Garbage In, Garbage Out,' it's about how we shape the digital world."