Forbes.com – Ideas & Opinions
Quentin Hardy, 10.29.09, 08:40 AM EDT
Forbes Magazine dated November 16, 2009
Brewster Kahle is a thorn in Google’s side.
The Internet and Brewster Kahle have been good to each other. He made millions from his networked computing inventions and plows millions back into expanding, documenting and providing access to the World Wide Web’s digital trove–in particular, books. He sees his mission as saving the Internet from bad business motives.
“We have to have universal access to everything, just like a library,” he says. “Do we want that under a single corporation’s control? It is openness, not corporate control, that propels capitalism.”
Single corporation? That would be the octopus known as Google. Google has scanned 10 million books and makes them available for free searching. But there are those who suspect that Google’s intentions are not entirely altruistic. At a minimum, says Kahle, a 49-year-old whose motto is “universal access to all knowledge,” the world needs a diversity of book digitizing sources. Google, a onetime ally, is “a company run by lawyers, always out to see what they can get away with. We need more choice and competition than they want.”
Digital libraries will shape education, creativity and our shared intellectual heritage, Kahle declares. As founder and director of the Internet Archive, Kahle has posted online digital copies of 1.7 million books, 100,000 hours of television, 200,000 video clips, 70,000 concerts and 415,000 audio recordings. All that material can be downloaded for free from the Archive’s Web site.
Kahle has been compiling the library since 1996, two years before Google was incorporated. While many philosophers talk about the promise of free universal access to knowledge, perhaps no one person has done more than Kahle to make it real.
About half the scans from Kahle’s Archive come from Google. People download the Google volumes, then upload them to Kahle’s outfit. Google has not sued him to stop that. Now Kahle seems ready to undermine the search giant himself, for the sake of free content. The cost of running the Archive’s 1,000 servers, most of them near San Francisco, is largely covered by libraries and foundations, some of which pay the Archive to scan their books.
On Oct. 19 Kahle released a technology, called Bookserver, that makes it possible for any author, publisher or library to offer a scanned book for free, for sale or on loan. The publisher uses Bookserver software to convert a photo of the original page into a text file, which can then be indexed or fed into a speechifier (for, say, a blind book consumer). The texts can be read on e-book devices like Amazon’s Kindle, Sony’s Reader, Apple’s iPhone and certain laptops. Access to more devices is coming. The files, almost entirely text, are distributed directly from the source controlling the volume, not the Archive itself.
Bookserver uses a range of open source and proprietary electronic book standards, search algorithms, editing tools and libraries. The architecture, as Kahle calls it, potentially separates manufacturers of devices from control over much of the content inside them. It also preserves the idea of the lending library–if you “check out” a volume, others cannot access it in the time allowed to you. Publishers sell their books in the system using credit cards.
The lending angle may be a way to foil Google’s claim on millions of so-called “orphan” books, or texts published since 1923 that are no longer in print. These books are not out of copyright but for the most part are abandoned by their owners, giving some justification to Google’s finders-keepers approach (in which a copyright owner has to opt out to keep a book from the Google library). Kahle’s Archive doesn’t have the post-1923 orphans. But Kahle hopes libraries will use the new Bookserver technology to scan and electronically lend orphans. Kahle reasons that libraries can scan and electronically lend their orphans without violating any laws, just as they lend those volumes today.
Google scanned its books over the past several years, initially claiming rights to reproduce brief snippets of orphans and other texts under the same “fair use” rule that allows book reviewers to quote from a book without permission. At first most authors and publishers were happy that someone was taking the printed past into the digital future. Then they started fretting about who would get rich off this trove. Google now plans to offer its whole library of scanned texts on a rental basis to libraries and in some cases sell individual volumes.
Under threat of lawsuits that could shut down the business, Google’s solution was to work with a handful of publishers and authors in creation of a Book Rights Registry, where authors and publishers could lay claim to their orphaned works. Kahle and others note that Google did no creative work on the books, just photographed them. In the initial registry agreement, now undergoing revisions after objections by both the government and other businesses, Google was the only listed vendor of the texts, raising questions about what status future entrants might have. “They want to monopolize books, particularly out-of-print books,” Kahle says.
If Google’s registry settlement does acquire the force of law, future digitization efforts could be curbed, or simply ignored as one provider becomes the source of choice–the way iTunes dominates digital music and Google gets eight-tenths of all search queries. “Digital stuff is really prone to monopoly,” says Kahle. “The low cost of distribution means you can dominate something very quickly.”
Google, not surprisingly, says it is only here to help. “I am surprised at the amount of confusion and misinformation there is out there,” says Dan J. Clancy, head of Google’s digitization project. “We strongly hope others will enter the market–but we haven’t seen commercial scanning on a large scale.” Indeed, Microsoft has abandoned its effort.
Kahle graduated from MIT in 1982 with a bachelor’s degree in computer science and engineering and a specialty in artificial intelligence. He helped start a company called Thinking Machines, which built supercomputers that used parallel processing (arrays of calculators working shoulder to shoulder). Among other things, the machines were very good at searching the contents of other computers. In 1988 Kahle started Wais as a research project and in 1991 created Wais Inc., a Menlo Park, Calif. company that scanned and listed the content of computer servers on the Internet for better understanding and retrieval. Customers included Dow Jones, the New York Times and the U.S. government, but the project was bypassed as the freedom and superior design of the Web made content accessible to all.
“Back then, we thought you could find and publish things you found for money,” he says. “You couldn’t, until Jeff Bezos and Steve Jobs found ways for individuals to pay for digital content.” In 1995 he sold Wais to America Online for $15 million.
Kahle and his Wais partner, Bruce Gilliat, started both the Internet Archive and Alexa Internet, a company that created software that logged traffic patterns and recommended sites on the Internet. They sold it to Amazon in 1999 for $250 million in stock. The Archive also hosts the Wayback Machine, which preserves Web pages that might otherwise be altered or destroyed. It has 150 billion of those–everything from yesterday’s FORBES online to the 1996 Yahoo home page–available for free. Owners of the pages can opt out and retroactively remove their pages from the Wayback Machine.
Kahle and his wife have put $45 million into a foundation, which should keep Bookserver going for a long time. So far only one big library, the University of Toronto, has signed up to use the Bookserver lending function, but more are expected to join soon.
“This is like those old movies of airplanes trying to get off the ground in 1910,” Kahle says. “We don’t have a 747 yet–but we will, if we open things up enough.”
The BookServer is a growing open architecture for vending and lending digital books over the Internet. Built on open catalog and open book formats, the BookServer model allows a wide network of publishers, booksellers, libraries, and even authors to make their catalogs of books available directly to readers through their laptops, phones, netbooks, or dedicated reading devices. BookServer facilitates pay transactions, borrowing books from libraries, and downloading free, publicly accessible books.
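The lending-library rule described in the article (while one reader has a volume checked out, nobody else can borrow it until the loan period ends) can be illustrated with a small Python sketch. This is not BookServer's actual code or API: the class and method names, the 14-day loan period and the book identifier are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Loan:
    borrower: str
    due: datetime  # moment the loan expires and the copy frees up


@dataclass
class LendingDesk:
    """Hypothetical one-copy-per-title lending desk (illustration only)."""
    loan_period: timedelta = timedelta(days=14)  # assumed loan length
    loans: Dict[str, Loan] = field(default_factory=dict)

    def is_available(self, book_id: str, now: datetime) -> bool:
        # A title is free if it was never lent or its loan has expired.
        loan = self.loans.get(book_id)
        return loan is None or now >= loan.due

    def check_out(self, book_id: str, borrower: str,
                  now: datetime) -> Optional[datetime]:
        # Refuse the loan while another reader still holds the copy.
        if not self.is_available(book_id, now):
            return None
        due = now + self.loan_period
        self.loans[book_id] = Loan(borrower, due)
        return due

    def check_in(self, book_id: str, borrower: str) -> bool:
        # Only the current borrower can return the copy early.
        loan = self.loans.get(book_id)
        if loan is None or loan.borrower != borrower:
            return False
        del self.loans[book_id]
        return True


if __name__ == "__main__":
    desk = LendingDesk()
    start = datetime(2009, 10, 19, 9, 0)
    print(desk.check_out("orphan-001", "alice", start))
    print(desk.check_out("orphan-001", "bob", start + timedelta(days=1)))
```

In this sketch the second request is refused because the first loan is still running. Once the due date passes, or the first reader checks the book in, the title becomes available again, which mirrors how a single physical copy circulates.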