The World Wide Web from way back when

Computer entrepreneur Brewster Kahle is building a huge archive of old pages.

Science & Technology

November 04, 2001|By NEW YORK TIMES NEWS SERVICE

Brewster Kahle never has thought small. So it's no surprise that his latest venture is as vast as the entire Internet: an archive of the World Wide Web that's at once enormous and personal and an instructive parable about what we used to call the new economy.

Having become wealthy through a variety of Internet ventures over the years, Kahle in 1996 took up collecting: Web pages -- at last count, more than 10 billion of them. His Internet Archive project takes regular snapshots of millions of pages and, until recently, stored them like photos in the attic. Last month, the 41-year-old computer scientist celebrated the archive's fifth anniversary by unveiling the Wayback Machine, a free service that makes those old pages available to anyone who can get to the Web. It is at http://web.archive.org, although it has been straining under the traffic.

Accessing the collection's pages is as simple as typing a Web address into a search box and selecting a date. And it's big. The archive computers currently hold some 100 terabytes of data -- compared with an estimated 20 terabytes of information in the entire Library of Congress. The archive grows by 10 terabytes a month.

But what good is it? After all, if the science fiction author Theodore Sturgeon was right when he said that "90 percent of everything is crud," what is the percentage for the blather-prone pages of the Web?

"On the Web, it's probably more," Kahle admits.

But even if you up Sturgeon's estimate to 99 percent, that leaves 1,000 gigabytes of solid gold -- and every user's definition of gold will be different.

People searching the Internet, Kahle said, "have very specific interests," and when they find the topic that interests them, "they want it in extreme detail, like steam locomotives in Georgia in the 1840s." The wonder of the Web, he said, is that it has information about steam locomotives in 1840.

Thinking big

Kahle introduced his brainchild (named, yes, for the time machine used by the pedantic dog, Peabody, and his boy, Sherman, from the "Rocky and Bullwinkle" cartoons) with a flourish at the Bancroft Library at the University of California, Berkeley, in late October.

He demonstrated it with a certified stunner: He pulled up a Web page from the White House Web site from Sept. 10, 1996, with a press release about President Clinton proclaiming the prevention of hijacking and terrorist attacks in the air a priority. Kahle said he also had used the system to read Web pages created by the Heaven's Gate suicide cult and to find a manual for a computer part that had been taken off a company's Web site in 1998.

He doesn't want to stop with Web pages. Kahle (pronounced "Kale") is inviting copyright holders for books, movies, music and more to add their creations to the mix. Ultimately, he hopes to finally deliver the kind of library that the ancients tried to create in Alexandria.

"We have the technology to make that [...] that, we have the technology to give people access from anywhere in the world."

Kahle has talked this way before -- specifically, when he kicked off one of his Internet companies, Alexa, now a subsidiary of Amazon.com. That company also tried to archive the Net. In fact, its technology was used to build the Internet Archive. But attempts to profit from the venture were unsuccessful, Kahle admitted; even Amazon has stopped putting money into it. "This year, Amazon doesn't have any spare money to do services like Alexa," he said.

But there is a big difference between having a good idea and being able to make money on it, and the fact that you can't make something pay does not necessarily mean it is dumb.

Building free libraries is a noble effort, Kahle says, citing the largesse of Andrew Carnegie in improving the nation's literacy through a system of libraries. "People ask, 'How are you going to profit from this?' " he said. "We're not. It's a library. It's worth it to spend millions of dollars to build a library that doesn't cost users a penny."

New excitement

People who make libraries and archives their lives are excited about the new collection.

"Isn't it the coolest thing around?" said Christopher A. Lee, chairman of the Electronic Records Section of the Society of American Archivists. He suggested that social historians of the future might use the archive to focus on things that today seem mundane or even inane. "A lot of social historians would say a Web site that says, 'Here's a picture of me, here's a little about my cat' tells us so many important things about how people were using the Internet at a particular point in time."

The project has spurred a kind of enthusiasm that hasn't been seen in a while in the down-hearted technology world. Lawrence Lessig, a Stanford University law professor who seeks to explain the interplay of technology and society, was uncharacteristically ebullient. "My brand is pessimism," he said. "This is not something to be pessimistic about. Brewster is my hero."

Baltimore Sun Articles