Living by the Numbers: Big Data Knows What Your Future Holds
May 18, 2013 -- Forget Big Brother. Companies and countries are discovering that algorithms programmed to scour vast quantities of data can be much more powerful. They can predict your next purchase, forecast car thefts and maybe even help cure cancer. But there is a downside.
On balmy spring evenings, Hamburg's Köhlbrand Bridge offers an idyllic postcard view of the city's harbor. The Elbe River shimmers in the reddish glow of sunset, forklifts, cranes and trucks seem to move in slow motion, and occasionally a container ship glides by. But from the standpoint of Sebastian Saxe, the area is primarily an equation with many variables. For the past four-and-a-half years, the 57-year-old mathematician has been working on his trickiest computing task yet at the behest of the company that manages the Hamburg port.
The port covers an area of 7,200 hectares (about 28 square miles). Roughly 200 trains a day traverse its 300-kilometer (186-mile) network of rails and its 130 bridges to transport goods that have arrived by ship. Saxe, as chief information officer of the Hamburg Port Authority (HPA), faces the enormous task of optimizing this logistical nightmare.
The amount of land is finite, and further expansion is not possible. Nevertheless, the Hamburg Senate has announced its goal of almost tripling container transshipment volumes in the city by 2025. This will only work if Saxe and his 60-member IT team manage to optimally exploit another resource: data. He certainly has plenty of it.
The port is already filled with sensors today. Trucks and freight trains are constantly transmitting their positions while incoming container ships report their location and speed. Sensors that constantly monitor port traffic are built into the Köhlbrand Bridge.
"Our goal is a totally interconnected, intelligent port, a Smart-port," says Saxe. He envisions a port in which, for example, a railroad drawbridge would no longer open at specific times, but rather just before a ship actually reaches it. This eliminates unnecessary delays for the railroad and at the terminal. Even the Köhlbrand Bridge would become "intelligent," in that it would report its current condition and predict future maintenance needs, all through the use of sensors. The frequency of scheduled maintenance dates was recently increased because significantly larger numbers of heavy trucks were crossing the bridge than had been planned for. This was of interest to Saxe and the HPA, but also to police and the customs agency, because some of the trucks were carrying illegal loads.
Rediscovering Data
In the end, the complex harbor logistics will create a machine that controls itself. Saxe's vision of the future is a sort of port exchange, allowing shipping companies to predict, down to the minute, how quickly their containers will be moved from the water to the road.
Many other companies worldwide are in the same position as the HPA. They are rediscovering a raw material that they, their facilities and their customers produce in excess every day: data.
The expression "Big Brother" has become dated. Experts would seem to have reached consensus on the term "Big Data" to describe the new favorite topic of discussion in boardrooms, at conventions like Berlin's re:publica last week, and in a number of new books. Big Data promises both total control and the logical management of our future in all aspects of life. Authors like Oxford Professor Viktor Mayer-Schönberger are calling it a "revolution." According to Mayer-Schönberger, Big Data, which is also the title of his current book on the subject, will change our working environment and even the way we think.
The most important factor is not the sheer volume of data, even though it is currently growing faster than ever. An estimated 2.8 zettabytes of data were created in 2012. One zettabyte is 1,000,000,000,000,000,000 kilobytes. Experts predict that the volume of new data could increase to 40 zettabytes by 2020. It would take about 250 million DVDs to store the amount of data being transmitted on the Internet in a single day. This volume doubles about once every two years.
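The figures above can be checked with a little arithmetic. The following is only a rough sketch: it assumes, which the article does not state outright, that the two-year doubling applies to the yearly volume of newly created data, and the function name is invented for illustration.

```python
# Back-of-the-envelope check of the growth figures quoted above.
# Assumption (not stated in the article): the two-year doubling applies
# to the yearly volume of newly created data.

# One zettabyte is 10^21 bytes; one kilobyte is 10^3 bytes.
KILOBYTES_PER_ZETTABYTE = 10**21 // 10**3


def projected_volume(start_zb, start_year, end_year, doubling_years=2.0):
    """Extrapolate a data volume that doubles every `doubling_years` years."""
    doublings = (end_year - start_year) / doubling_years
    return start_zb * 2 ** doublings


if __name__ == "__main__":
    print(f"1 ZB = {KILOBYTES_PER_ZETTABYTE:,} kB")
    print(f"2020 estimate: {projected_volume(2.8, 2012, 2020):.1f} ZB")
```

Starting from 2.8 zettabytes in 2012, four doublings by 2020 give roughly 45 zettabytes, which is in the same range as the 40 zettabytes the experts predict.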
New is the way companies, government agencies and scientists are now beginning to interpret and analyze their data resources. Because storage space costs almost nothing nowadays, computers, which are getting faster and faster, can link and correlate a wide variety of data around the clock. Algorithms are what create order from this chaos. They dig through, discovering previously unknown patterns and promptly revealing new relationships, insights and business models.
Though the term Big Data means very little to most people, the power of algorithms is already everywhere. Credit card companies can quickly recognize unusual usage patterns, and hence automatically warn cardholders when large sums are suddenly being charged to their cards in places where they have never been. Energy companies use weather data analyses to pinpoint the ideal locations for wind turbines down to the last meter. According to official figures, since the Swedish capital Stockholm began using algorithms to manage traffic, drive times through the city's downtown area have been cut in half and emissions reduced by 10 percent. Online merchants have recently started using the analyses to optimize their selling strategies. The widespread phrase "Customers who bought this item also bought …" is only one example of the approach.
Turning Data into Dollars
Google and Facebook are pure, unadulterated Big Data. Their business models are based on collecting, analyzing and marketing information about their users, through advertising tailored as closely as possible to the individual. This gigantic database and the notion of what can be done with more than a billion individual profiles in the age of Big Data was worth at least $100 billion (€78 billion) to Facebook investors.
The prospect of turning their treasure troves of data into dollars is now fueling the fantasies of businesses in many industries, from supermarkets to the automobile industry, and from aviation to banks and insurance companies. According to figures published by industry association Bitkom, global sales related to Big Data applications amounted to €4.6 billion in 2012. That number is expected to increase to about €16 billion by 2016.
Countless Big Data applications are also being tested in medicine and science. Even the public sector, especially police departments and security agencies, not always the most progressive when it comes to IT, have recognized the potential benefits in their fields.
What captivates so many people is the promise of gazing into the future, thanks to the lightning speed at which massive amounts of data can be analyzed. In fact, algorithms allow for astonishingly precise predictions of human behavior, be it in front of supermarket shelves, in traffic or when it comes to credit-card payment patterns.
In 2010, Google predicted a wave of flu outbreaks on the basis of user searches. American data specialist Nate Silver predicted the outcome of the last US presidential election well in advance and more precisely than all demographers.
'The End of Chance'
Some cities even predict the probability of crimes in certain neighborhoods. The method, known as "predictive policing," seems like something straight out of a Hollywood film, and in fact it is. In Steven Spielberg's "Minority Report," perpetrators were arrested for crimes they hadn't even committed yet.
Finding the presumed delinquents also doesn't seem to present a problem. Scientists have figured out that, with the help of our mobile phone geolocation and address book data, they can predict with some certainty where we will be tomorrow or at a certain time a year from now.
The increasing accuracy of such forecasts has led American tech guru Chris Anderson to proclaim that we are arriving at the "end of theory." Austrian media executive Rudi Klausnitzer, who has just written a book on the subject called "Das Ende des Zufalls" ("The End of Chance"), has reached a similar conclusion.
It is a prospect that is not altogether appealing to some. But many already rely on the prognostic ability of soulless algorithms in the most intimate spheres of life. The extensive questionnaires used by online dating agencies are fed into algorithms designed to increase the probability of finding a compatible partner.
A gold rush of sorts is taking shape in companies, research laboratories and some government agencies. In many places, the mantra of data is extolled as the new "oil" or "gold" of the 21st century. Some people are already benefiting financially: statisticians, physicists and so-called data scientists or data miners, who advise companies on Big Data applications. As with the classic American gold rush in the 19th century, most of the money is being made by those who sell equipment, tools and expertise, Big Data specialists like Blue Yonder, a company with 85 employees.
How Data Revolutionizes the Economy
The man doesn't look much like a fortune-teller, and yet he repeatedly makes the same odd remark: "Our job is to provide predictions of all kinds." Uwe Weiss is the managing director of Blue Yonder, a company founded five years ago. Weiss doesn't consult tarot cards or scrutinize the entrails of innocent farm animals, but instead analyzes the columns of figures generated by supermarket cash registers, weather services, vacation schedules and traffic reports. All of this data flows into data analysis software developed by Blue Yonder, which, according to the company's advertising, can be used to provide customers like the Otto Group, one of the largest mail-order companies in the world, and the dm drugstore chain with "precise prognoses" -- on, for example, the expected sales of a specific item.
This is extremely important to the retail business because it enables companies to avoid delivery problems and minimize storage costs. Blue Yonder software is programmed to learn more with each piece of new information it acquires, as well as to independently recognize relationships.
In this manner, Weiss and his employees discovered that in a specific branch of a supermarket chain, sales of milk, chocolate bars and apples shot up on certain days -- coinciding with the arrival of new school groups at the nearby youth hostel. The software now calculates, using data that includes school vacation schedules in the surrounding states, the probability of new busloads of students arriving at a given time.
Blue Yonder employees had a similar realization with sliced bread. "Children don't go to school on days preceding or following midweek holidays, and the demand for sliced bread goes down as a result," says Weiss. Inventory ordering systems are now adjusted automatically, he explains. "It's a relatively straightforward process."
Increasing Accuracy of Sales Forecasts
The constant in-stream of new data has enabled Blue Yonder to develop something of an ad hoc market research system on buying behavior, which can also be used for other purposes. The drugstore chain dm has Weiss's team calculate optimal staffing levels for its stores, and it also provides sales forecasts for each store.
The data analyses are of similar interest to insurance companies. As a "future scenario," Weiss describes a car equipped with more than 1,000 sensors, which permanently monitors driving behavior. Drivers who provide their insurance companies with the data, which can easily be used to develop a risk profile, could in the future be enticed with especially low premiums, says Weiss. "Big Data is currently revamping our entire economy, and we are only at the beginning," says the head of Blue Yonder, for whom, of course, this is a positive development.
Important Big Data customers are also reporting the first measurable successes. A study IBM has compiled on a few "success stories" among its own customers reports increases in efficiency of about 20 percent. According to Blue Yonder customer Otto, the data experts' work has improved "the quality of sales forecasts for individual items by 20 to 40 percent." The mail order company is so enthusiastic that it is now using the method with corporate brands like German sporting goods retailer SportScheck, as well as acquiring a 50 percent stake in Blue Yonder.
Netflix, an American company that began its business with DVD rentals and now provides its 36 million customers with movies primarily through streaming video feed, is yet another example of the far-reaching possibilities of Big Data. Netflix recently achieved record viewership levels with the series "House of Cards," a political thriller starring Kevin Spacey. The show's success, though, was hardly by chance: Netflix used data analysis in its decision to buy the series. Netflix has the ideal qualifications to take such an approach. The company knows, on a day-by-day basis, which genres are doing well, when viewers are losing interest or which actors are especially popular. Based on such information, "House of Cards" corresponded precisely to the predicted tastes of Netflix viewers, and proved a success.
The music-streaming portal Spotify is popular for similar reasons. It provides participating record companies with current information about music tastes and usage behavior, and it allows bands to plan upcoming tours by choosing locations where Spotify users listen to their songs most frequently.
An Electronic Brain to Defeat Cancer
But Big Data also promises to benefit society in other ways. Hope for millions of cancer patients can be found on the second floor of the Hasso Plattner Institute (HPI), in Babelsberg, a district of Potsdam outside of Berlin. It consists of a rack with 25 slots, each with blinking diodes. Each of the computers has 40 processors instead of only one. The room is kept at low temperatures to prevent the €1.5 million brain, with its 1,000 processor cores, from overheating.
Plattner, founder of the SAP software group and sponsor of the institute, was personally involved in the development of the idea, which seems relatively straightforward. The Babelsberg computing machine sucks all data directly into its main memory, which enables it to compute at 1,000 times the speed of conventional computers, and sometimes even faster.
The process, which began at HPI as a project by eight undergraduate students with the working title "Sanssouci DB," is known internationally by the name "in-memory." It has won prizes for innovation and has become part of SAP's portfolio. The company's current Hana database technology is based on the in-memory process. The head of HPI, mathematician Christoph Meinel, sees the technology as a foundation for both commercial applications and opportunities for cancer therapy. "Thanks to the in-memory process, we are on the threshold of personalized medicine," says Meinel.
It currently takes months to decode a person's genome in order to come up with a treatment tailored to an individual patient, Meinel explains. This isn't surprising, given the roughly three billion genetic building blocks in a person's DNA. But now scientists know that every tumor is different, which means that the same treatment can affect people in different ways.
Triggering the Alarm
With the help of his new "super brain," the decoding of an individual genome can be reduced to a few seconds, says Meinel. In addition, the Babelsberg computer spends its nights extracting all information from publicly accessible genome databases, searching the data for comparable cases to find treatments that resulted in high survival rates and the best possible quality of life. "Until recently, this matching process would have taken months," says Meinel.
Meanwhile, researchers at the University of Manchester are working on another Big Data project, a "magic carpet" that could help senior citizens who live alone. The device is installed on the floor like an ordinary carpet, with built-in sensors recording the person's steps. The data enables a computer to determine whether the person has gotten out of bed, for example, and can analyze activities to see how they compare with the person's normal movement pattern. Deviations could indicate a medical emergency, and an alarm is triggered.
The carpet sensors can detect the tiniest of differences in a person's step, and thus indicate that something isn't right with the patient before a possible fall. Scientists are also experimenting with other logical extensions of the concept. For instance, chips and sensors with these kinds of alarm functions can also be incorporated into artificial hips.
The Algorithm Builder
Stefan Henss, a student, had initially hoped to earn a little extra cash by betting on sporting events. Hoping to outsmart the bookies, he wrote a program that was supposed to precisely predict football scores. But it didn't work very well.
Henss was 10 when he got his first computer, and he started programming at 13. Three years ago, when he was in his early 20s and studying at the Technical University of Darmstadt, he happened upon Kaggle, a platform where companies tender data projects. The companies are interested in obtaining the most precise predictions possible, as well as solving difficult problems for which they are unable to find solutions on their own.
Henss chose a task that had been posted by a car dealership platform, which was searching for a way to predict the resale prospects of used cars. He built an algorithm, in which he inserted a large number of details about the cars "into a context that made sense," as he describes it. The information included data such as original registration dates, mileage and annual distance driven.
He submitted his solution and came in sixth among the 571 teams from around the world competing for the $10,000 award. The challenge had awakened his ambition. "Competing with others was incredibly motivating. I knew that I was onto something," says Henss, who has since become one of the most successful algorithm builders on Kaggle. Nowadays he chooses his challenges more strategically, partly based on the amount of the award.
Computer Grading
The approach led him to his biggest triumph to date. The challenge was to write a program that could automatically and reliably evaluate student essays -- essentially a grading machine. Using various standard algorithms, Henss built a program that takes the wealth of language into account, determines the number of spelling errors per word and recognizes grammatical errors. The program can draw conclusions on the content of essays. His algorithm can even detect how levelheaded the writer was or whether emotions were at play.
He spent a month and a half working exclusively on the program, which eventually consisted of 12,000 lines of code. A week before the end of the contest, he joined forces with two other competitors to increase his chances of winning. His new partners, an Englishman and an American who only knew each other through the platform, combined their solutions. They won the contest and split the prize money of $60,000.
"Tests have shown that our evaluations did not differ significantly from the teacher evaluations," says Henss. The trio has since sold the software to Pacific Metrics, a US company. Henss, who is now writing his master's thesis, can look forward to a bright future.
But there are also people whose lives are made more difficult by Big Data applications.
The End of Inspector Chance
Things couldn't have gone more wrong for the car thief. As he was trying to break into a vehicle in an underground parking garage in Santa Cruz, California, a policeman happened to be sitting in an unmarked car just a few meters away, eating his lunch. The thief was arrested before he could complete his nefarious deed.
But the officer wasn't in the right place at the right time purely by chance. He was spending his lunch hour in the parking garage on that particular day, based on the recommendation of a computer program.
For the last two years, the city's roughly 100 police officers have prepared for their shifts every day with instructions from both their supervisors and an algorithm. The program, which is constantly being fed all relevant data the police apparatus has to offer, calculates the probabilities of crimes, like burglary, robbery and car theft, being committed at certain times and in certain neighborhoods. Homicide has been excluded from the program so far.
The 15 most dangerous neighborhoods appear as rectangles on electronic maps. In two-thirds of cases, the predicted incidents actually occurred. "I would have been happy with only 10 percent," says Santa Cruz Deputy Police Chief Steve Clark.
Two professors, computer scientist George Mohler and anthropologist Jeffrey Brantingham, who specializes in crime scenarios, were instrumental in developing the predictive method of fighting crime. Their program is based on models for predicting the aftershocks of earthquakes.
Clark had accidentally heard about the two professors' idea in early 2011. Together the three men set up a pilot project. They fed eight years of crime statistics into the program, as well as countless other pieces of potentially relevant data, like weather statistics and proximity to parks and bus routes. In addition, the program places each crime in relation to every other crime.
High-Tech Cops
"There were many skeptics at first, including me," says Clark. "But the numbers speak for themselves: It works." According to Clark, burglaries declined by 11 percent and car theft was down by 8 percent in Santa Cruz after the new crime prediction system had been in use for one year. In addition, the arrest rate in Santa Cruz went up considerably -- by 56 percent.
The entire police force now uses high-tech equipment. All cops carry smartphones and tablet computers to access the web-based prediction program while on patrol. They are encouraged to spend time in the marked zones whenever possible. Clark can tell many stories about how his officers have caught burglars and thieves red-handed in the predicted zones.
The two data experts, Mohler and Brantingham, have since started a company and are marketing their product, Predictive Policing, worldwide. In the United States alone, more than a dozen police departments have already introduced the software, including Los Angeles, Boston and Chicago. Clark was recently in England to help the police in Kent launch the program.
The military and intelligence communities also employ the power of data analysis. For instance, Big Data played a key role in the hunt for Osama bin Laden, as American author Mark Bowden describes in his book "The Finish." According to Bowden, database analyses were partly what ultimately led investigators to Abbottabad in Pakistan.
Improving Security
One of the suppliers to the US Defense Department is Palantir, a company founded in 2004, partly with backing from Peter Thiel, a German-American investor. Another hot commodity among intelligence agents and military officials is the California software company Splunk, which is headquartered in a former sausage factory in San Francisco. A few weeks ago, technology journalists named Splunk one of the five most innovative companies in the world. Google only made it to 11th place in the ranking.
Governments, agencies and businesses in more than 90 countries use Splunk's applications. Its customers in the US include the Pentagon and the Department of Homeland Security. The nine-year-old company's software analyzes and interprets data supplied by all kinds of machines, including cell phone towers, air-conditioners, web servers and airplanes.
"The turbines of an Airbus A380 produce as much data during a single flight as a medium-sized computer center," says Guido Schröder, senior vice president of products at Splunk.
Schröder, who is originally from Paderborn in northwestern Germany and spent many years at SAP, now supervises 160 developers and engineers, whose goal is to make the flood of data produced by machines usable. In the case of the A380, for example, Splunk's work can help airlines minimize kerosene consumption or optimize maintenance intervals.
"Security is one of the biggest growth areas for Big Data applications," says Schröder. In addition to crime and terrorism, Splunk focuses on the growing number of attacks in, and by means of, the Internet, and its software can detect hacker attacks or other cyber attacks. "We are positioning ourselves for an expanding cyber war," Schröder says. But the data hunters' new war also has many civilian aspects.
The Database
The brick building in Hamburg's Winterhude neighborhood doesn't look like a bank branch, and yet loans are made there. A plastic banner identifies the company as "Kreditech," and the atmosphere inside the building is a mixture of garage startup and shared apartment. Signs are posted on the walls exhorting employees to "Take off your shoes."
Sebastian Diemer and Alexander Graubner-Müller bear no resemblance to typical German lenders, whose business model they consider to be outdated. "At least the classic branch bank model is on its way out," says the young and dynamic Diemer, dressed in the classic start-up look: jeans and a hoodie.
The self-confident founders of Kreditech lend money through the Internet: short-term mini-loans of up to €500, with the average customer receiving €109. Instead of requiring credit information from their customers, they determine the probability of default on their own, using a social scoring method that consists of high-speed data analysis. "Ideally, the money should be in customers' accounts within 15 minutes of approval. This is already working in Poland," says Diemer. In return, Kreditech wants as much data as possible from its users. The more information the company gets, the more precise are its predictions and the higher a customer's potential credit line.
The evaluation profiles of EBay accounts are already publicly accessible. Kreditech also requires access to Facebook profiles, so that it can verify whether a user's photo and location match information on other social networking sites, like Xing and LinkedIn -- and whether his or her friends include many with similar education levels or many colleagues working in the same company.
All of this increases the likelihood that Kreditech is dealing with a real person. Even the question of whether the loan request was submitted from an expensive iPad or a cheap Aldi computer goes into the evaluation. The applicants' behavior also plays a role, such as how much time it takes them to fill out the questionnaire. Kreditech also records the frequency of errors and use of the delete key.
Loans on Faith
In this manner, the Kreditech algorithm can process up to 8,000 different pieces of information, say its creators, who founded the company in March 2012 and are expanding rapidly. Kreditech is currently online in Poland, Spain and the Czech Republic, and the Hamburg-based lender also plans to launch its business in Russia in the coming days.
The lending operation had only been available in Germany for three weeks when Kreditech terminated its service here, "for preventive reasons," as Diemer says. In fact, the Federal Financial Supervisory Authority (BaFin) had contacted the company and announced its intention to examine its business model. That's because Kreditech doesn't just charge high interest rates, but also requires its customers to pay up to €49 for a "certificate of creditworthiness."
The mandatory purchase of certificates has since been eliminated in its other markets, say Kreditech's creators. The amount of interest they charge ranges from 5 to 28 percent a month, depending on the loan amount, a customer's score and the terms of the loan.
Kreditech's founders don't just expect to generate substantial revenues with microcredit requests and interest income. Their real goal is to develop an international, self-updating creditworthiness database for other companies, such as online retailers. Current scoring methods are based on fewer parameters, which also only reflect a person's credit past, says Diemer. Even that doesn't exist in many countries. A credit bureau with market penetration similar to the German system exists only in a few markets. "For almost three-quarters of the global population, there is still no reliable information on creditworthiness," says Graubner-Müller.
Investor Interest
In addition to Kreditech, lenders like Zestfinance and Britain's Wonga are pursuing similar goals as they dabble in this precarious market, raising both legal and moral issues. Wonga was maligned in the British press when it tried to lure students away from government-backed student loans into its own, vastly more expensive loans.
The Kreditech founders, who have known each other since adolescence in the western city of Wiesbaden, insist that they are above reproach when it comes to privacy issues and the amount of interest they charge. "SCHUFA (the German credit agency) keeps data in storage, while we only use data for a specific request." Besides, they explain, almost all of the data relating to rejected applicants is deleted after 90 days. The company only retains the information it needs to recognize applicants who were previously turned down.
Despite the limitations, investors find social scoring to be extremely attractive. Kreditech secured $4 million in capital in December, and in April a German fund invested at a similar level. Wonga, for its part, has already raised $141 million in investor capital.
A Tyranny of Data?
Business models like Kreditech's illustrate the sensitivity of the issues that many new Big Data applications raise. Users, of course, "voluntarily" relinquish their data step by step, just as we voluntarily and sometimes revealingly post private photos on Facebook or air our political views through Twitter. Everyone is ultimately a supplier of this large, new data resource, even in the analog world, where we use loyalty cards, earn miles and rent cars.
Perhaps many people do so with so little inhibition because what happens to our data often remains ambiguous. To whom and how often is our data sold? Are these buyers of data also subject to rules for deleting the data and preserving anonymity? And what will happen, for example, with Kreditech's credit profiles if the small business is ever acquired by a larger company or goes bankrupt?
An attempt by the established credit reviewers at SCHUFA to launch a pilot project on social scoring, together with the Hasso Plattner Institute, revealed just how sensitively the public reacts to such issues.
As with Kreditech, the project sought to analyze data from Facebook, Twitter and other social networks and examine its role in determining creditworthiness. Merely the announcement of the project triggered angry protests, and the effort was promptly abandoned.
There was an even greater storm of indignation when many drivers realized that their navigation devices don't just help them find the best route to their destinations, but can also be used against them. TomTom, a Dutch manufacturer of GPS navigation equipment, had sold its data to the Dutch government. It then passed on the data to the police, which used the information to set up speed traps in places where they were most likely to generate revenue -- that is, locations where especially large numbers of TomTom users were speeding.
Pre-programmed Conflicts
TomTom's CEO issued a public apology, saying that the company had believed that the government wanted the data to improve traffic safety and reduce road congestion. TomTom had not anticipated the use of the data for speed traps, he said.
Similar conflicts are practically pre-programmed into the technology, especially as a central conflict is inherent in its development. Big Data applications are especially valuable when they are personalized, as in the case of credit checks and individual medicine.
Personalized profiles, which bring together a wealth of information, from expressions of opinion on Facebook to movement profiles, provide companies with tempting possibilities. For instance, if someone "likes" a particular pair of jeans on Facebook, the storeowner could send a discount coupon for precisely the same brand of jeans to that person's mobile phone the next time he or she enters the store.
This may be appealing to retailers and some consumers, but data privacy advocates see many Big Data concepts as Big Brother scenarios of a completely new dimension.
So far, many companies have tried to dispel such fears by noting that the data they gather, store and analyze remains "anonymous." But that, as it turns out, is not entirely accurate, in that it sells the power of data analysis radically short. Take the analysis of anonymous movement profiles, for example. According to a current study in the online journal Scientific Reports, our mobility patterns are so different that they can be used to "uniquely identify 95 percent of the individuals." The more data is in circulation and available for analysis, the more likely it is that anonymity becomes "algorithmically impossible," says Princeton computer scientist Arvind Narayanan.
In his blog, Narayanan writes that only 33 bits of information are sufficient to identify a person.
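The article does not spell out where the figure of 33 bits comes from. One plausible reading is that it is the number of yes-or-no facts needed to tell apart every person on Earth. The sketch below works through that arithmetic. The population figure is an assumed round number, and the function name is invented for illustration.

```python
# Sketch of the arithmetic behind "33 bits": how many binary digits are
# needed to give every person in a population a distinct label.
# Assumption: a world population of roughly 7 billion (round figure).

WORLD_POPULATION = 7_000_000_000


def bits_to_identify(population):
    """Return the smallest b such that 2**b >= population."""
    if population < 1:
        raise ValueError("population must be at least 1")
    # (n - 1).bit_length() is the number of bits needed to count 0 .. n-1.
    return (population - 1).bit_length()


if __name__ == "__main__":
    bits = bits_to_identify(WORLD_POPULATION)
    print(f"{bits} bits distinguish up to {2**bits:,} people")
```

Under that assumption, 32 bits would cover only about 4.3 billion people, while 33 bits cover about 8.6 billion. Each independent detail, such as a postal code or a birth date, contributes a few of those bits.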
Heated Controversy
From the standpoint of businesses, the slightly schizophrenic attitude of consumers is the real crux of the issue. On the one hand, we have become shockingly forthcoming -- and apparently accessible -- online. Yet we ascribe the most sinister of motives to those who would analyze that data and collect more.
A study by New York advertising agency Ogilvy One concludes that 75 percent of respondents don't want companies to store their personal data, while almost 90 percent were opposed to companies tracking their surfing behavior on the Internet.
This conflict explains the heated nature of the current controversy over the proposed new European data protection directive. If the European Commission's plans, which also include a "right to be forgotten" on the web, become a reality, many providers could see their Big Data growth fantasies in jeopardy. This is one of the reasons Brussels currently faces a barrage of lobbying from the likes of Amazon, Google and Facebook.
But for a modern society, an even more pressing question is whether it wishes to accept everything that becomes possible in a data-driven economy. Do we want to live in a world in which algorithms predict how well a child will do in school, how suitable he or she is for a specific job -- or whether that person is at risk of becoming a criminal or developing cancer?
Data Tyranny
Is it truly desirable for cultural assets like TV series or music albums to be tailored to our predicted tastes by means of data-driven analyses? What happens to creativity, intuition and the element of surprise in this totally calculated world?
Internet philosopher Evgeny Morozov warns of an impending "tyranny of algorithms" and is fundamentally critical of the ideology behind many current Big Data applications. Morozov argues that because formulas are increasingly being used in finance and, as in the case of Predictive Policing, in police work, they should be regularly reviewed by independent, qualified auditors -- if only to prevent discrimination and abuses of power.
A dominant Big Data giant once inadvertently revealed how overdue a broad social and political debate on the subject is. Google Executive Chairman Eric Schmidt says that in 2010, the company toyed with the idea of predicting stock prices by means of incoming search requests. But, he said, the idea was discarded when Google executives concluded that it was probably illegal.
He didn't, however, say that it was impossible.
Translated from the German by Christopher Sultan