July 8, 2008 -- When the next salmonella or avian flu outbreak hits, the internet will have the news first.
But good luck finding that news amid the chatter about Angelina Jolie, Tom Cruise or the newly touted benefits of watermelon.
A new website, HealthMap, addresses that challenge by siphoning up text from Google News, the World Health Organization and online discussion groups, then filtering it and boiling it down into mapped data that researchers -- and the public -- can use to track new disease outbreaks, region by region.
"There is so much information on the web about disease outbreaks but it's obscured by garbage and noise," said John Brownstein, a professor at Harvard Medical School, and co-founder of HealthMap.org. "The idea of HealthMap is to get filtered, valuable information to the public and public health community in one freely available resource."
The site's free accessibility could be particularly important in the developing world, where poor public health infrastructure and lack of money have handicapped epidemiological efforts. That's a problem because those regions are exactly where scientists predict new and dangerous diseases are likely to emerge.
HealthMap goes beyond the standard mashup and is more like a small-scale implementation of the long-awaited semantic web. The site, which the researchers describe in the latest issue of the open-access journal PLoS Medicine, creates machine-readable public health information from the text indexed by Google News, World Health Organization updates and online listserv discussions.
While aimed at public health workers, HealthMap is also usable by the general public. It locates the outbreaks on a world map and creates a color-coding system that indicates the severity of an outbreak on the basis of news reportage about it. Users of the site can then analyze and visualize the data, gaining unprecedented views of disease outbreaks.
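As a rough illustration of color-coding by the volume of news coverage, the sketch below counts stories per region and disease and maps each count to a color. The thresholds, the color names and the function names are assumptions made for this example; the article does not describe how HealthMap actually computes its color scale.

```python
from collections import Counter

def severity_color(report_count):
    """Map a story count to a marker color (thresholds are hypothetical)."""
    if report_count >= 10:
        return "red"
    if report_count >= 3:
        return "orange"
    return "yellow"

def color_regions(reports):
    """Color each (region, disease) pair by how many stories mention it.

    `reports` is an iterable of (region, disease) tuples, one per news story.
    """
    counts = Counter(reports)
    return {key: severity_color(n) for key, n in counts.items()}

# Made-up input: four stories about one outbreak, one story about another.
reports = [("Ohio", "E. coli")] * 4 + [("Punjab", "avian flu")]
colors = color_regions(reports)
```

Here the pair with four stories is colored "orange" and the pair with a single story "yellow". A real system would presumably also weigh how recent and how reliable each report is.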
Because it relies on publicly available news sources and keeps operating costs low, the service itself remains free. After a small-scale launch in 2006, the site's model and potential attracted a $450,000 grant last year from Google.org's Predict and Prevent Initiative, which is focused on emerging infectious diseases.
"We really like their approach in that they are trying … a really open platform," said Mark Smolinski, director of the Predict and Prevent Initiative at Google.org. "Anybody can go in and see what kind of health threats are showing up around the world."
Back in 2006, Google.org head Larry Brilliant told Wired.com about his vision for a service that looks a lot like HealthMap.
"I envision a kid (in Africa) getting online and finding that there is an outbreak of cholera down the street. I envision someone in Cambodia finding out that there is leprosy across the street," Brilliant said.
HealthMap doesn't have quite that level of resolution just yet -- outbreaks are only mapped to the state/province level -- but it's no standard Google Maps mashup. The back end of the system does far more than marry data points to locations.
Clark Freifeld, a software developer at Children's Hospital Boston and the technical lead on the project, said that a host of complex algorithms underpin the simple interface that the site's users see.
"It's not only what you see, but what you don't see," Freifeld said.
The site cuts out duplicate stories and other sorts of "noise" from the "signal" -- news of a disease outbreak.
For example, HealthMap can differentiate between a news story about an outbreak and a story about government vaccination or public health. By identifying the language reporters use in known outbreak stories, it can find similar new ones based on the same verbiage. For instance, key words like "mysterious" tend to pop up in outbreak stories, but not, say, in coverage of vaccine programs. Another common feature of outbreak stories is a small number in the headline, usually to denote a number of people infected or killed.
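The filtering described above can be sketched in a few lines. This is a hand-written heuristic under stated assumptions, not HealthMap's algorithm, which the article says is derived from the language of known outbreak stories. The word lists, the numeric limit of 1,000 and all function names are hypothetical; only the cue word "mysterious" and the small-number-in-headline signal come from the article.

```python
import re

# Hypothetical cue words; the article gives only "mysterious" as an example.
OUTBREAK_WORDS = {"mysterious", "outbreak"}
# Hypothetical words suggesting routine public health coverage.
ROUTINE_WORDS = {"vaccination", "vaccine", "campaign"}

def normalize(headline):
    """Lowercase and drop punctuation so near-identical headlines match."""
    return " ".join(re.findall(r"[a-z0-9]+", headline.lower()))

def dedupe(headlines):
    """Keep the first copy of each headline, comparing normalized text."""
    seen = set()
    unique = []
    for headline in headlines:
        key = normalize(headline)
        if key not in seen:
            seen.add(key)
            unique.append(headline)
    return unique

def has_small_number(headline, limit=1000):
    """True if the headline has a standalone number below `limit`.

    Digits inside tokens such as "H5N1" are ignored; spelled-out numbers
    ("twelve") are not handled in this sketch.
    """
    tokens = re.findall(r"(?<!\w)\d[\d,]*(?!\w)", headline)
    return any(int(t.replace(",", "")) < limit for t in tokens)

def outbreak_score(headline):
    """Crude score: higher means more outbreak-like."""
    words = set(normalize(headline).split())
    score = 0
    if words & OUTBREAK_WORDS:
        score += 1
    if has_small_number(headline):
        score += 1
    if words & ROUTINE_WORDS:
        score -= 1
    return score

# Made-up headlines: the second is a duplicate of the first.
headlines = [
    "Mysterious illness kills 12 in village",
    "Mysterious Illness Kills 12 in Village!",
    "Government launches vaccination campaign",
]
unique = dedupe(headlines)
scores = [outbreak_score(h) for h in unique]
```

After deduplication two headlines remain; the first picks up both outbreak signals and the vaccination story scores below zero. A production classifier would more plausibly learn its weights from labeled stories than rely on fixed word lists.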
"We're looking at data visualization rather than just producing a stream of e-mails," Freifeld said, referencing how some fast-breaking public health data has been distributed.
The main HealthMap interface is a map of the globe that allows any user to see various outbreaks around the world, but the power of the site is really revealed when a user drills down on an individual outbreak. The researchers' algorithms sort stories into one of six categories, so that the breaking news is presented on a dashboard with background context and related outbreaks, as shown at right.
In a study published this March in the Journal of the American Medical Informatics Association, the researchers found that their automated classification system was accurate 84 percent of the time. Algorithm improvements have pushed accuracy close to 90 percent now, according to the researchers.
While the site has only about 20,000 unique users, many of them work in the public health and research fields and are directly involved in preventing and controlling disease.
Hannah Gould, an epidemiologist with the Centers for Disease Control and Prevention, used the site to quickly trace reports of a recent E. coli outbreak at a major supermarket chain.
"It's a timely synthesis from many different sources into one place," Gould said. "It allows for a quick visualization of health events that you wouldn't otherwise have in other formats."
Right now, the researchers are focused on adding more sources, particularly in other languages, as well as improving their methodologies.
Freifeld and Brownstein are looking into using more social media sources, but they've encountered a problem that most internet users are already familiar with: There's too much noise.
"We have certainly explored looking at more free and noisier sources like blogs and things like Twitter," Freifeld said. "But they pose the problem of capturing a good quality signature from all that stuff."