Just How 'Transparent' Is the Obama White House? Scientists Try to Help
Computer scientists sort through reams of government data.
Nov. 16, 2010 -- Quick -- who has been the most frequent visitor to the White House since the beginning of the Obama administration?
You don't know? Most of us would have no idea, despite the mountains of information the administration has posted at Data.gov, a website created in 2009 "to increase public access to high value, machine readable datasets generated by the executive branch of the federal government."
The Obama administration has promised to be, in its word, "transparent" about the workings of government. But it concedes that there is so much data -- most of it unsorted, without any context -- that even if you had heard of Data.gov, you would have terrible trouble making sense of it. It does have "machine readable datasets" on it, more than 270,000 of them.
Which brings us to James Hendler, a computer scientist at Rensselaer Polytechnic Institute in Troy, N.Y. No, he's not a frequent visitor to the White House -- more on that in a second -- but he is at a conference in Washington today, talking about how he and his team have been figuring out how to find information in all that digital noise. The administration says it is happy to have help from Hendler and his colleagues; it agrees the amount of material it posts can be daunting.
"The government collects data for very specific purposes," Hendler said in a telephone interview with ABC News. "You, as a citizen, could use that data for something. Maybe you want to track budgets."
But the raw stuff collected by government agencies -- from ozone monitors to the White House visitors' log -- is often in a form most users would not regard as user-friendly. The Environmental Protection Agency, for instance, listed readings from ozone meters nationwide, without their locations. And the White House listed every person cleared through security, just by name, date and home state, without including their affiliations or reasons for visiting.
Hendler and his team solved the problem by doing "mashups," a little like what musicians do when they combine parts of different songs, except they combined the mass of material from Data.gov with other lists. (The EPA does keep lists online of the locations of ozone sensors, just not in one place.)
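For readers who want to see what a mashup looks like in practice, here is a minimal sketch in Python. It is not the Rensselaer team's code; the sensor IDs, readings, cities and column names below are all invented for illustration. It only shows the general idea: match each reading to a location through a shared key, and keep the reading even when no location is on file.

```python
import csv
import io

# Hypothetical sample data: every ID, value, and place name here is invented.
READINGS_CSV = """sensor_id,date,ozone_ppm
S1,2010-05-01,0.041
S2,2010-05-01,0.067
S3,2010-05-01,0.052
"""

LOCATIONS_CSV = """sensor_id,city,state
S1,Troy,NY
S2,Newark,NJ
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def mashup(readings, locations, key="sensor_id"):
    """Attach location fields to each reading that shares the same key.

    Readings without a matching location are kept, just without the
    extra fields, so nothing from the raw list is silently dropped.
    """
    by_key = {row[key]: row for row in locations}
    merged = []
    for reading in readings:
        location = by_key.get(reading[key], {})
        merged.append({**location, **reading})
    return merged


combined = mashup(load_rows(READINGS_CSV), load_rows(LOCATIONS_CSV))
for row in combined:
    print(row.get("city", "unknown"), row.get("state", "??"), row["ozone_ppm"])
```

The same pattern would apply to a visitors' log: the shared key would be a name rather than a sensor ID, and the second list would carry affiliations instead of cities.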
So, it turns out that as of May, when the Rensselaer team last went through the list, the most frequent visitor to the White House was (drum roll, please) Anna Burger.
Anna Burger? She was just ahead of Robert Wolf and Andrew Stern. Other names on the top-15 list were more familiar: Sens. Arlen Specter, D-Pa., Dick Durbin, D-Ill., and Christopher Dodd, D-Conn., as well as Richard Trumka of the United Mine Workers Union.
Hendler's team combined the raw list from Data.gov with other databases -- newspaper records, for instance -- and now the list of visitors makes more sense. Burger turns out to be a labor leader -- the former secretary-treasurer of the 2.2 million-member Service Employees International Union, probably there to lobby a sympathetic Democratic administration. Stern is her former boss.
Wolf, the president of UBS Investment Bank, was appointed in 2009 to be a member of the president's Economic Recovery Advisory Board.
The data can be useful in other, perhaps more substantial ways; for instance, to determine whether high cigarette taxes actually discourage smoking. Hendler's answer: "It's sort of true, but there are a lot of other things at work there."
The data, using his mashup technique, show that New Jersey has the highest cigarette taxes and the highest prices per pack, but Utah has the lowest smoking rate. RPI researchers decided that taxes have an effect, but other factors loom larger, such as local regulations mandating smoke-free environments. Utah may be an unusual case, with a large number of non-smoking Mormons.
More of RPI's mashups, and the results, can be found on a website it has set up.
What does all this tell us about the administration's efforts to be open? Perhaps that's a question for you.
Hendler says the Obama White House is certainly putting a lot of data out there, but "data is most useful when you have something to compare it to."
That's what he and his colleagues say they are trying to add.
"The longer-term goal," he said, "is to turn data into a discussion between government and its citizens."