Silicon Insider: Visual Search Engine

Oct. 16, 2003 -- The Internet may be a landscape of words today, but its destiny is to become an ocean of images.

The trick will be in the navigation.

You don't have to be a dedicated Web surfer to understand why words (and numbers) dominate the Net. A picture may be worth a thousand words, but those thousand words take a heckuva lot less data to send than the picture does.

Thus, not only can you download most of a book before you can receive a single photographic-quality image, but you can also begin reading the text right away, while the image is still just a few thin stripes.

A decade ago, when transmission speeds were still measured in thousands of bits per second, the Web was all words. Those of you who were online in those days remember how thrilling it was even to see a picture forming on the screen, and then how frustrating it was to wait while the damn thing froze up everything else for 10 minutes as it downloaded one pixel at a time.
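To put rough numbers on that wait, here is a back-of-the-envelope sketch. It assumes a 14.4 kbit/s modem, about six bytes per English word and a 1 MB photograph. All three figures are illustrative assumptions rather than measurements, and the function name is made up for the example.

```python
# Rough transfer-time arithmetic for text versus a photo on a dial-up link.
# Every constant below is an assumption chosen for illustration.

MODEM_BPS = 14_400        # assumed modem speed, bits per second
BYTES_PER_WORD = 6        # assumed average word length, including the space
IMAGE_BYTES = 1_000_000   # assumed size of one photographic-quality image


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Seconds needed to move size_bytes over a link, ignoring overhead."""
    return size_bytes * 8 / bits_per_second


text_bytes = 1_000 * BYTES_PER_WORD                  # "a thousand words"
text_seconds = transfer_seconds(text_bytes, MODEM_BPS)
image_seconds = transfer_seconds(IMAGE_BYTES, MODEM_BPS)

# How many words of plain text fit in the same number of bytes as the image.
words_per_image = IMAGE_BYTES // BYTES_PER_WORD

print(f"1,000 words: {text_seconds:.1f} s")
print(f"one image:   {image_seconds / 60:.1f} min")
print(f"words that fit in one image's bytes: {words_per_image:,}")
```

Under these assumptions the thousand words arrive in a few seconds, the picture takes a little over nine minutes, and the bytes spent on that one picture would carry well over 100,000 words of text.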

Of course, that is no longer the case. As bandwidth increased, so did the graphic complexity of the content. These days, the Web is a panorama — some would say a wasteland — of images, from movie stills to porn to pictures of auction items and book covers to spam. As broadband has reached America's homes in the last two years, it has brought in its train simple video, from brief movie trailers to extended cartoons.

Looking ahead, you can already predict the future: full-length downloadable movies available in near real-time (killing the video and DVD industries), online television, and do-it-yourself FX, simulations and animation.

That's the predictable future, the usual extrapolation of technology from the present, paced by Moore's Law. More bandwidth equals prettier pictures and longer movies.

Transforming the Language of the Web

But there is a second track in the history of tech. This is the long, and usually unpredictable, chain of technological discontinuities; the new inventions and products that seemingly come out of the blue and turn everything upside down.

Think of the transistor, the integrated circuit, the microprocessor, disk memory, the calculator, the PC, desktop publishing, the router, the Web, the search engine and others. Each, in some measure, radically transformed the world around us while creating vast new industries. In retrospect these breakout technologies may seem inevitable, but almost no one (except for their inventors, sometimes) sees them coming.

I can't help thinking that the Web itself is due for just such a discontinuity. Sure, on-demand movies and online TV are incredibly exciting; but they are nevertheless logical extensions of the present at the application level. Real transformations take place not on deck with the tourists but down in the engine room. Real tech breakthroughs take off the panel and mess with the wiring inside.

What that says to me is that the next great shift for the Internet will be in the language of the Web. By that, I mean the words, because for all the pictures and videos, the Web remains a word-driven medium.

A Digital History, Frame by Frame

For the last couple of years, I've been gathering clues that this transformation to a true Visual Web is already under way. Just this week I ran across a glimpse of this future.

On Monday, Reuters carried a story announcing that Britain's Independent Television News (ITN) had put on the Web all 3,500 hours, and 75 years, of British Pathe newsreels, covering every major news event from the Boer War to Swinging Sixties London.

Better yet, using digital technology, ITN had scanned and copied every frame of these 35mm films — thus producing a database of more than 12 million historic photographs. (You can visit it, even order images, at www.britishpathe.com — but wait a few days, the crush of the curious appears to have slowed it to a crawl.)

ITN bills the site as "the world's first digital news archive." I think it's more than that: it's a glimpse of what the Visual Web can be, and how far we still have to go to get there.

The power of the Pathe site is self-evident the moment you see it: just type in a term, say, "The Beatles" and up pops a score of newsreel films; the Fab Four receiving an award, in concert, returning by plane from their U.S. tour, etc. You can watch the newsreels, or step through them frame by frame (or every fifth frame, or whatever you like), pick out the one you like, then blow it up or buy it.

It's amazingly cool. What do you want to see? Titanic survivors, train wrecks, the British fleet, Marilyn Monroe? Here you go: hundreds of images of each for your inspection. They'll even sell them to you in PowerPoint format.

The Race Is On

Yet, for all of its appeal, the Pathe site is also desperately and frustratingly clunky. Why? Because even though we now have billions, and soon trillions, of images on the Web (think: every picture and movie ever made) we still have no visual grammar to find them.

Sure, on the Pathe site you can call up a newsreel showing the Rolling Stones in concert, but to get that perfect image of Brian Jones, you are going to have to search through all the frames.

Until we find that new methodology, until we develop a visual grammar for the Web, the next great transformation of the Internet will have to wait. This is the great Longitude problem of Cyberspace. Want to make a billion dollars? Then come up with the visual equivalent of Netscape Navigator, Yahoo! Search or, best of all, Google.

What the world needs right now for the next stage of digital culture is a technique to rocket across the Net in search of a particular image or video clip. Perhaps the search will be prompted by a drawing or a digital photograph — and the search methodology might range from simply looking at color or form, to some incredibly sophisticated heuristic. Whatever it is, this visual search engine must be fast, intuitive and affordable by the general public.
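For a sense of what the "simply looking at color" end of that range might involve, here is a minimal sketch. It is not how any existing product works. It assumes each image is already available as a list of (red, green, blue) pixel values, reduces each image to a coarse color histogram, and ranks a small library by histogram overlap with the query picture. The function names, the four-levels-per-channel bucketing and the tiny sample "images" are all hypothetical.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Share of pixels falling in each coarse (r, g, b) bucket."""
    if not pixels:
        return {}
    step = 256 // levels  # width of one bucket per color channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def similarity(hist_a, hist_b):
    """Histogram intersection: 0.0 means no shared color, 1.0 means identical mix."""
    return sum(min(share, hist_b.get(bucket, 0.0))
               for bucket, share in hist_a.items())


def search(query_pixels, library, levels=4):
    """Return (score, name) pairs for the library, best match first."""
    query_hist = color_histogram(query_pixels, levels)
    scored = [(similarity(query_hist, color_histogram(pixels, levels)), name)
              for name, pixels in library.items()]
    return sorted(scored, reverse=True)


# Made-up ten-pixel "images", standing in for real photographs.
library = {
    "sunset": [(250, 120, 30)] * 8 + [(40, 20, 60)] * 2,
    "ocean": [(10, 60, 200)] * 9 + [(240, 240, 240)] * 1,
    "forest": [(20, 130, 40)] * 10,
}
query = [(240, 100, 20)] * 6 + [(30, 30, 50)] * 4  # an orange-and-dark sketch

results = search(query, library)
for score, name in results:
    print(f"{name}: {score:.2f}")
```

A comparison this crude knows nothing about shape or subject, so it could not tell Brian Jones from any other figure under the same stage lights. That gap between matching colors and recognizing content is the hard part of the problem.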

A few years ago I came across a start-up team that had developed a rudimentary visual Google, but the team fell apart. I suspect some of the big companies are already working on this problem, as are some universities. Still, my gut tells me that the solution will come from some lone inventor with a wholly new approach to the problem.

So, the race is on. Create a visual search engine, copyright it (or better yet, patent it if you can), license it and then sit back and rake in your billion bucks — and immortality.

Oh, and if you got the idea here, be sure to send me my 10 percent commission.

Michael S. Malone, once called “the Boswell of Silicon Valley,” most recently was editor-at-large of Forbes ASAP magazine. His work as the nation’s first daily high-tech reporter at the San Jose Mercury-News sparked the writing of his critically acclaimed The Big Score: The Billion Dollar Story of Silicon Valley, which went on to become a public TV series. He has written several other highly praised business books and a novel about Silicon Valley, where he was raised.