June 15, 2007 -- Imagine this: A traveler uses a camera phone to snap a photo of a strange-looking statue in an old hotel, and the image is uploaded to an online database. The phone's screen instantly presents him with a realistic 3-D model of the entire hotel with the statue in question highlighted and text boxes filled with information about the art.
Virtually panning the room, the phone then seamlessly displays even more information about the building, from its original architect to the price of the spicy, lime almond-crusted halibut available in the dining room.
This is the not-too-distant future as seen by the creators of Photosynth, a technology that can create photo-realistic 3-D models of any subject simply by scanning Internet image databases like Flickr, and, in the future, Google Images. The user-uploaded images are linked to one another based on common attributes and then accurately posted on the surface of a 3-D model.
"What the point here is, is that we can do things with the social environment," said Blaise Aguera y Arcas of Microsoft Live Labs during a presentation of the technology at the annual TED convention this spring. "This is now taking data from everybody from the entire collective memory, visually, of what the world looks like and link[ing] all of that together. … And they make something emergent that's greater than the sum of the parts."
In other words, with programs like this one, images could help create a vibrant, visual and social network, a virtual world from the ground up, photo by photo.
In 2005, Noah Snavely, a computer-science graduate student at the University of Washington, and associate professor Steven Seitz began a collaboration with Microsoft researcher Richard Szeliski to create a more intuitive way to view photos in relation to one another.
"Computer vision has been very recently getting to the point where it's working really well," Seitz said. "At the same time, there's been an explosion of photos on the Internet. Photo sharing had begun to blossom. We wanted to see if you could reconstruct 3-D models from people's pictures on the Internet."
The result was "Photo Tourism," a part of the genesis of Photosynth that uses "vision algorithms" to identify key common attributes, such as a doorknob or a statue, and link related images together.
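The linking step can be pictured with a toy example. The sketch below is only an illustration, not Photo Tourism's code: it assumes each photo has already been reduced to a set of named features (the photo names, the feature labels and the `min_shared` threshold are all made up for this example), links any two photos that share enough features, and then groups the linked photos together.

```python
from itertools import combinations


def link_images(image_features, min_shared=2):
    """Link every pair of photos that share at least `min_shared` features."""
    links = {name: set() for name in image_features}
    for a, b in combinations(image_features, 2):
        shared = image_features[a] & image_features[b]
        if len(shared) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def connected_groups(links):
    """Collect photos that are linked directly or through other photos."""
    seen = set()
    groups = []
    for start in links:
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            name = stack.pop()
            if name in group:
                continue
            group.add(name)
            stack.extend(links[name] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical photos, each described by the features a matcher found in it.
photos = {
    "p1": {"doorknob", "statue", "arch"},
    "p2": {"statue", "arch", "window"},
    "p3": {"window", "arch", "tower"},
    "p4": {"fountain", "bench"},
}

groups = connected_groups(link_images(photos))
```

In this made-up data, p1 and p3 share only one feature, yet they land in the same group because both overlap with p2. That chaining is what would let a large pile of unrelated tourists' photos hang together as one scene.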
By keeping track of the connections between images and comparing the size and angle at which shared features appear, the Photo Tourism technology can determine the angle from which each picture was taken.
Finally, the images are projected into a 3-D environment at the correct distance and angle to accurately reconstruct popular tourist attractions such as Notre Dame Cathedral.
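The article does not spell out the geometry, but the underlying intuition can be shown with a much simpler problem than the one the researchers solve. The sketch below assumes a flat, 2-D world in which the two camera positions and the compass-style bearings toward a shared feature are already known (all of these inputs, and the function name `triangulate`, are hypothetical). It finds where the two lines of sight cross, which is where the feature must sit. The real system has to work out the camera positions as well, so treat this as an illustration of the idea rather than the method itself.

```python
import math


def triangulate(cam1, bearing1_deg, cam2, bearing2_deg):
    """Intersect two 2-D sight lines to locate the feature both cameras see.

    Bearings are measured in degrees counterclockwise from the x axis.
    """
    d1 = (math.cos(math.radians(bearing1_deg)), math.sin(math.radians(bearing1_deg)))
    d2 = (math.cos(math.radians(bearing2_deg)), math.sin(math.radians(bearing2_deg)))

    # 2-D cross product of the two directions; zero means parallel sight lines.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel; the feature cannot be located")

    # Solve cam1 + t * d1 == cam2 + s * d2 for t.
    dx, dy = cam2[0] - cam1[0], cam2[1] - cam1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (cam1[0] + t * d1[0], cam1[1] + t * d1[1])


# Two hypothetical cameras two units apart, both looking at the same statue.
statue = triangulate((0.0, 0.0), 45.0, (2.0, 0.0), 135.0)
```

With these inputs the sight lines cross at roughly (1, 1), halfway between the cameras and one unit in front of them.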
Impressed with Photo Tourism's capabilities, Microsoft Live Labs took up the project in 2006.
Earlier that year, Microsoft had acquired another image-rendering technology, Seadragon.
Founded in 2003 by Aguera y Arcas, Seadragon Software was part of a technological movement to radically change the way high-resolution images could be viewed and stored.
Rather than fully rendering every image in view, as most image display software now does, Seadragon uses a render-as-you-go strategy, meaning a picture becomes sharper only as the user zooms in on it. As a result, the size or quality of an image no longer strains the computer's processing power. Theoretically, Seadragon's ability to store information in the form of images is boundless.
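One common way to get this render-as-you-go behavior is a tiled image pyramid: the picture is stored at full size and at successively halved sizes, each cut into small tiles, and the viewer fetches only the tiles that cover what is on screen. The sketch below assumes that scheme, with a 256-pixel tile size and function names invented for this example. The article does not say that Seadragon works exactly this way.

```python
import math

TILE = 256  # assumed tile edge length, in pixels


def pyramid_levels(width, height):
    """Count the stored versions: full size, then halved until one tile fits."""
    levels = 1
    while width > TILE or height > TILE:
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        levels += 1
    return levels


def level_for_view(view_width, screen_width, levels):
    """Pick the coarsest level that still fills the screen (0 = full size)."""
    level = 0
    while level + 1 < levels and view_width / 2 ** (level + 1) >= screen_width:
        level += 1
    return level


def tiles_needed(view, level):
    """Count the tiles covering a view given as (x0, y0, x1, y1) in full-size pixels."""
    x0, y0, x1, y1 = (v / 2 ** level for v in view)
    cols = math.ceil(x1 / TILE) - math.floor(x0 / TILE)
    rows = math.ceil(y1 / TILE) - math.floor(y0 / TILE)
    return cols * rows


# A hypothetical 65,536 x 65,536 pixel image shown in a 1,024-pixel-wide window.
levels = pyramid_levels(65536, 65536)
zoomed_out = (0, 0, 65536, 65536)   # the whole image
zoomed_in = (0, 0, 1024, 1024)      # one small corner at full detail
out_level = level_for_view(65536, 1024, levels)
in_level = level_for_view(1024, 1024, levels)
```

In this example the viewer needs 16 tiles whether it shows the whole image or a small corner at full detail. The work tracks the size of the screen, not the size of the picture, which is the property the paragraph above describes.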
"It's really cool, really exciting," said Frédo Durand, an associate computer-science professor at MIT. "It's the culmination of evolution and revolution."
The two separate technologies, Photo Tourism and Seadragon, were then combined at Microsoft Live Labs in 2006 to create Photosynth.
Calling on Seadragon's rendering capabilities, each Photosynth can maintain a collection of many thousands of images of a certain building or object and can seamlessly zoom in and out and transition between the images to create, as the original Photo Tourism paper proposed, a 3-D "visceral sense of presence."
"It's like a hybrid of a slideshow and a gaming experience that lets the viewer zoom in to see greater detail or zoom out for a more expansive view," said Richard Szeliski, Photo Tourism co-creator and manager of Microsoft's Interactive Media Group. "This is a revolutionary way for people to interact with photos in a 3-D context that more closely resembles the places where the images were captured."
Currently, the Photosynth Web site features a downloadable tech preview in which the user can explore a handful of 3-D environments from Piazza San Marco in Venice to the Grassi Lakes of the Canadian Rockies. But measured against its potential applications, the technology is still in its infancy.
"We thought it more important to get it out there early, though, because our road map is still wide open," says the Photosynth Web site. "We know the best ideas for how this technology might be useful may not come from us."
One idea, which is already a part of the Photo Tourism technology, but has not yet made its way to Photosynth, is the inclusion of updatable, Wikipedia-like annotations.
The 2006 paper presenting Photo Tourism explained how such a function worked: "A great deal of annotated image content of this form already exists in guidebooks, maps and Internet resources such as Wikipedia and Flickr. … A key feature of our system is the ability to transfer annotations automatically between images, so that information about an object in one image is linked to all other images that contain the same object."
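The idea in that passage can be sketched in a few lines. The example below is a simplified illustration, not the paper's implementation: it assumes every photo has already been reduced to the set of objects recognized in it, and that a note is attached to an object rather than to a single photo. The file names, object labels and note text are invented.

```python
def transfer_annotations(image_objects, notes):
    """Attach each note to every photo that contains the annotated object.

    image_objects: dict mapping a photo name to the set of objects seen in it.
    notes: dict mapping an object to the annotation text written about it.
    Returns a dict mapping each photo name to a sorted list of note texts.
    """
    result = {}
    for photo, objects in image_objects.items():
        result[photo] = sorted(notes[obj] for obj in objects if obj in notes)
    return result


# Hypothetical photos; a visitor annotated the statue while viewing a.jpg.
image_objects = {
    "a.jpg": {"statue", "door"},
    "b.jpg": {"statue"},
    "c.jpg": {"window"},
}
notes = {"statue": "Bronze statue in the hotel lobby"}

annotated = transfer_annotations(image_objects, notes)
```

Here the note written on one photo of the statue also appears on b.jpg, which shows the same statue, while c.jpg stays unannotated.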
While Microsoft's Live Labs has not officially announced any upcoming versions of Photosynth that will include annotations, as developer Aguera y Arcas puts it, adding annotations to photos is "not exactly a secret sauce."
Unlike the relatively sparse annotations provided by mass-mapping programs like Google Earth and Microsoft's Virtual Earth, user-provided annotations in Photosynth could carry as much information as users choose to add.
Snavely, the co-creator of Photo Tourism, sees the most practical applications in the commercial arena.
"It could create a showcase for a new kind of medium. It could include things like showing information on business, hotels and restaurants," said Snavely. "It would also be a very fun project to work on. People can work on building a virtual model of the world."
"This is something that grows in complexity as people use it and whose benefits become greater and greater," said Aguera y Arcas during the spring TED convention demonstration.
Building a virtual model of the world is far from an original idea, but Photosynth's approach is very different from the strategy taken by recent world-creators Google Maps, Google Earth and Microsoft's Virtual Earth.
Google Maps' newest feature, Street View, used camera vans and photo crews to systematically tour several major cities so that users can view a photo-realistic panorama from any point on the streets. While that approach is far superior to Photosynth's in efficiency and sheer volume, it is also quite limited.
"The kinds of experiences they give you are very different," said Photo Tourism co-creator Seitz. "Street Views is used as an extension of maps. … Right now you're restricted. You can't go onto the grounds of the Vatican or around Notre Dame."
In contrast to mass mapping's use of satellite imaging and mobile camera crews, Aguera y Arcas described Photosynth's expansion as a more subjective, "trickle up" effect as users' virtual world is built through their own experiences from the ground up.
"A byproduct of all of that is immensely rich virtual models of every interesting part of the earth collected not just from overhead flights and from satellite images and so on, but from the collective memory," said Aguera y Arcas during the TED demonstration.
"There's something empowering, something very democratic about being in control of that," said Seitz.
While the Photosynth team is being quiet about any other projects, suggestions are coming from all angles.
"There have been some creative ideas," said Snavely. "We've been contacted by everyone from biologists, who have photos of living creatures, to archaeologists with pictures of ancient sites."
MIT's Durand imagines the technology to be of great educational value.
"Geography teachers and history teachers would love to have access to those images," he said, "especially as you add the time dimension and see how things evolve."
Already thinking along those lines, for its latest project the Photosynth team collaborated with the BBC to create 3-D models of some of the most historic locations in Britain as part of the new BBC series "How We Built Britain." So far the collection includes six landmarks, among them Ely Cathedral and Trafalgar Square.
The Photosynth team is tight-lipped about when the technology will be ready for commercial use, but claims that when it is, users will be able to upload their own photos and automatically recreate 3-D versions of their vacation experiences or even perfectly recreate their own home, inside and out.
Whether it will be used for education, science or a vast visual and social network, the only thing that is certain is that as the technology develops, the virtual world will grow as the real world seems to shrink.