June 15, 2007 -- Imagine this: A traveler uses a camera phone to snap a photo of a strange-looking statue in an old hotel, and the image is uploaded to an online database. The phone's screen instantly presents him with a realistic 3-D model of the entire hotel with the statue in question highlighted and text boxes filled with information about the art.
As he virtually pans the room, the phone seamlessly displays even more information about the building, from its original architect to the price of the spicy, lime almond-crusted halibut available in the dining room.
This is the not-too-distant future as seen by the creators of Photosynth, a technology that can create photo-realistic 3-D models of any subject simply by scanning Internet image databases like Flickr and, in the future, Google Images. The user-uploaded images are linked to one another based on common attributes and then accurately placed on the surface of a 3-D model.
"What the point here is, is that we can do things with the social environment," said Blaise Aguera y Arcas of Microsoft Live Labs during a presentation of the technology at the annual TED convention this spring. "This is now taking data from everybody from the entire collective memory, visually, of what the world looks like and link[ing] all of that together. … And they make something emergent that's greater than the sum of the parts."
In other words, with programs like this one, images could help create a vibrant, visual and social network, a virtual world from the ground up, photo by photo.
In 2005, Noah Snavely, a computer-science graduate student at the University of Washington, and associate professor Steven Seitz began a collaboration with Microsoft researcher Richard Szeliski to create a more intuitive way to view photos in relation to one another.
"Computer vision has been very recently getting to the point where it's working really well," Seitz said. "At the same time, there's been an explosion of photos on the Internet. Photo sharing had begun to blossom. We wanted to see if you could reconstruct 3-D models from people's pictures on the Internet."
The result was "Photo Tourism," one of the projects that gave rise to Photosynth. It uses "vision algorithms" to identify distinctive features that photos have in common, such as a doorknob or a statue, and to link related images together.
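To make the linking step concrete, here is a minimal sketch in Python. It is not the researchers' code, and the article does not name the algorithms involved. It assumes each photo has already been reduced to a handful of numeric feature descriptors (the two-number tuples, the file names, and the `max_dist`, `ratio` and `min_matches` thresholds are all made up for illustration), and it links two photos when enough of their descriptors are near-identical.

```python
import math
from itertools import combinations


def count_matches(desc_a, desc_b, max_dist=1.0, ratio=0.8):
    """Count descriptors in desc_a that have one clear counterpart in desc_b.

    A descriptor counts as matched when its nearest neighbor is close
    (within max_dist) and clearly closer than the second-nearest one.
    Both thresholds are illustrative assumptions.
    """
    matches = 0
    for d in desc_a:
        dists = sorted(math.dist(d, e) for e in desc_b)
        if len(dists) >= 2 and dists[0] <= max_dist and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def link_images(images, min_matches=2):
    """Return pairs of image names that share at least min_matches features."""
    links = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(images.items(), 2):
        if count_matches(desc_a, desc_b) >= min_matches:
            links.append((name_a, name_b))
    return links


# Hypothetical toy data: two photos of the same lobby and one unrelated photo.
images = {
    "lobby_1.jpg": [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)],
    "lobby_2.jpg": [(0.1, 0.0), (5.0, 5.1), (2.0, 8.0)],
    "garden.jpg": [(20.0, 20.0), (30.0, 10.0), (25.0, 25.0)],
}

print(link_images(images))
```

In this toy data the two lobby photos share two near-identical descriptors and end up linked, while the garden photo shares none and stays unconnected. Real descriptors would have many more dimensions, and a real collection would need a faster search than comparing every pair.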
By maintaining the connections between images and comparing the size and angle at which shared features appear in each one, the Photo Tourism technology can determine the angle from which each picture was taken.
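The intuition that apparent size and position reveal the camera's viewpoint can be sketched with the textbook pinhole-camera model. This is a simplified illustration, not the method the researchers describe: it assumes the focal length (in pixels) and the statue's real height are known, and every number below is hypothetical.

```python
import math


def estimate_distance_m(focal_px, real_height_m, pixel_height):
    """Pinhole model: an object looks smaller in proportion to its distance."""
    return focal_px * real_height_m / pixel_height


def estimate_bearing_deg(focal_px, x_offset_px):
    """Angle between the camera's optical axis and the object, in degrees.

    x_offset_px is how far the object sits from the image center.
    """
    return math.degrees(math.atan2(x_offset_px, focal_px))


# Hypothetical example: a 2-meter statue, camera focal length of 1000 pixels.
near = estimate_distance_m(1000, 2.0, 400)  # statue is 400 px tall in photo A
far = estimate_distance_m(1000, 2.0, 200)   # statue is 200 px tall in photo B
bearing = estimate_bearing_deg(1000, 1000)  # statue is 1000 px right of center

print(near, far, bearing)
```

Halving the statue's apparent height doubles the estimated distance, which is the kind of cue a comparison of image sizes can exploit. A system working from uncalibrated Internet photos would have to estimate the focal length and the scene geometry together rather than take them as given.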
Finally, the images are projected into a 3-D environment at the correct distance and angle to accurately reconstruct popular tourist attractions such as Notre Dame Cathedral.
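As a rough picture of that last step, the sketch below places photos around a subject once a distance and viewing angle are known for each. It is a two-dimensional simplification under stated assumptions: the subject sits at the origin, angles are measured around it from the +z axis, and the `PlacedPhoto` type, the `place_photo` helper and the file names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PlacedPhoto:
    name: str
    x: float        # camera position on the ground plane, in meters
    z: float
    yaw_deg: float  # direction the camera faces, measured from the +z axis


def place_photo(name, distance_m, angle_deg):
    """Position a camera around a subject located at the origin."""
    theta = math.radians(angle_deg)
    x = distance_m * math.sin(theta)
    z = distance_m * math.cos(theta)
    # The camera looks back toward the subject, so it faces the opposite way.
    yaw = (angle_deg + 180.0) % 360.0
    return PlacedPhoto(name, x, z, yaw)


front = place_photo("front.jpg", 10.0, 0.0)
side = place_photo("side.jpg", 5.0, 90.0)

print(front)
print(side)
```

Each photo ends up on a circle around the subject, facing inward, so a viewer can step from one vantage point to the next. A full reconstruction would also track camera height and tilt and would position the scene's 3-D points as well as the cameras.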