The World's Digital Curator

Brewster Kahle wants to build a universal library for all of humankind, banking on the growing trend of technological non-profits. His non-profit, the Internet Archive, has spent the past decade crawling the web with algorithms that use a ranking system to take snapshots of popular websites. This (spyware-like) search technology is impressive given its age. In 1983, while working at the startup Thinking Machines, Brewster created a system called Wide Area Information Servers (WAIS). It was the world’s first incarnation of tag-and-rank, web-browser search.

 

Today, the Internet Archive is strengthened by the new Open Content Alliance with partners like Adobe, HP, Microsoft and Yahoo! Brewster admits these companies came on board primarily to stunt the ambitious plans of Google, but their leverage is vital to fill the universal library with petabytes of archived movies, websites, and any type of recordable information. Currently, the universal library is without books, as new worldwide scanning efforts like those from the Internet Archive are trying to break through decades of corporate-minded copyright laws.

 

Jon Robinson: How did you find your way into the computing world in the 1970s?

Kahle: Usually people ask how did I get into libraries and it was always from computing. With computers, if you do things right, you can move mountains and in this case mountains are bits. The idea was to use computers to build the digital library that we have been promised for so long.

 

How did you connect computing with libraries?

If you are a technologist, you have to use your tools for something, because it is a tool. So there were two great projects. One was cryptography to protect people’s privacy and the other was to build the library. The encryption one, I couldn’t figure out how to do that in a way to help the common man – back in the late ’70s. It was too expensive. It would help basically government institutions or big corporations or illegal worlds and none of those really needed my help.

 

How did Thinking Machines start?

It was a spin-off of MIT. We took the project that we were doing and built a company around it. We tried to build a computer. I helped design the chips and boards and operating system for it and then tried to use it for searching the library. And then the next step was to use this internet as a distribution system to get publishing going and that is what WAIS was. It was the first internet publishing system. 

 

With a technological breakthrough like WAIS, why did Thinking Machines go bankrupt?

It was sold for parts. It was an astonishing group to work with. The biggest problem was that we built a parallel computer that not very many people knew how to use. And now we are… gosh, what, 20 years later, and the Internet Archive, Google, Hotmail, have all built parallel computers. 

 

Is WAIS the parent of web-based search?

Oh yeah, absolutely. After we got the hardware working, we put a search engine on it and that was used by Dow Jones to search through 400 newspapers and articles. It was part of Dow Jones’ DowQuest and it was a pretext search that really helped find patterns and rank things. We found that we really needed the internet publishing system to go, so that people would feed it lots of materials. That was what WAIS was about.

 

Why did AOL buy WAIS in 1995 and how important was that transaction in shaping today’s world of communications?

AOL had a royalty model at the time, which I loved. I like the royalty model more than the advertising model. So about 15 percent of their gross revenue would go to the information service that kept the person online. And they bought up a bunch of internet companies at that time to build AOL 2.0. But then they decided to not make internet 2.0 because their existing business was going just fine and they all started to adopt this cable model.  

 

Will we ever get back to a royalty system?

I sure hope so. Otherwise we will end up with fairly predictable results. Advertising systems build things like the radio-, television-, magazine-type publishing systems and they tend to conglomerate and so you end up with very few of them at the end of the day.

 

The sale of WAIS netted $15 million. How did you stick to the ideals of the not-for-profit Archive instead of building tech startups worth millions?

Well, I have never had more than one idea, so I just keep at it, which is to build a library.  Here in Silicon Valley, many people think that corporations are the answer to all problems and I think that the library really belongs in a non-profit. Corporations are very good at exploiting ideas or assets. That is what they do. But libraries are fundamentally different and we want it to be open. We want it to be under the rule of law not under the rule of a corporate structure.

 

Google is trying to build a corporate library. What is your position on their copyright concerns?

Copyright is just an incarnation of a set of rules of how businesses work and it has changed over time. Ben Franklin’s copyright was 14 years, renewable once. And derivatives were not copyrighted. That was how this country was founded, but it has gone bizarrely wrong in the United States. I think in large part because of the influence that corporations have had in the making of law. It has caused a fantastic explosion of government regulation. What copyright has seen in the last 30 years is a tragic mistake. 

 

You want to craft laws to support lots of creativity, lots of innovation, lots of economic expansion and our current copyright laws are disastrous in these regards.

 

When did copyright start going wrong?

The first incredible mistake was in 1976. When I was growing up, you had to put a little “c” in a circle to get copyright protection. In fact, you also had to send a copy to the Library of Congress, otherwise you did not get protection. This seems to make sense. In ’76 they made it so that everything was copyrighted. My second grader’s scribblings are copyrighted for 170 years. This is nuts. In 1998, the United States passed this wonderful piece of legislation called the Digital Millennium Copyright Act, which, some people say, has the effect of redefining reading as copying. So that the act of looking at something in the digital world is suddenly a copy. It is Orwellian. I have had lawyers look at me in the face and say, “Reading is copying.”

 

What is the best fix for copyright?

I think Ben Franklin was smart. Maybe these corporate lobbyists are smarter than Ben Franklin and Thomas Jefferson, but I have my doubts. He was a printer and he was out to make sure that people could publish and print. If we were to go back to founders’ copyright, I think, we would have more business and innovation.

 

What is the Open Content Alliance relative to the Internet Archive?

The Open Content Alliance is a group of institutions that are working together to build joint collections, out in the open. It is a project of the Internet Archive. The Internet Archive facilitates the alliance. So Microsoft, Yahoo!, HP, Adobe, plus about 30 libraries [six Canadian] at this point are all working to build joint collections.

Were companies like Yahoo! and Microsoft eager to join to thwart Google?

Absolutely. The timing and the discussions have been galvanized by Google’s bold stance to digitize – and keep proprietary – several great libraries of the world.

 

What do you mean: keep proprietary?

Most of this is shrouded in secrecy and lawsuits, but there is a contract that is issued out of the University of Michigan and put up on the Web, so you can see what the restrictions are. As I understand it, it is on-campus use for Michigan but you cannot download the materials for off-campus use. 

 

Can you possibly scan all of the world’s books into digital format?

We certainly can if we work together. Take the library system in the United States, [which is funded with] $12 billion a year. About one-third of that money goes to buying books, so $3 or $4 billion goes to publishers. If we were to go and take that $12 billion and spend it differently, some of that money would still go to publishers and there would be new electronic services. Even if you wanted to go out and scan a million books, it costs about $30 million so it is not that big of a number [much cheaper in developing worlds].
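The arithmetic behind these figures can be worked through in a few lines. This is only a rough sketch using the estimates quoted in the answer above; the function and variable names are illustrative, and the "about one-third" share is treated as exactly one-third.

```python
# Rough figures as quoted in the interview (US dollars); all are estimates.
LIBRARY_BUDGET_PER_YEAR = 12_000_000_000  # "$12 billion a year"
SCAN_COST_TOTAL = 30_000_000              # "about $30 million"
BOOKS_SCANNED = 1_000_000                 # "a million books"


def scanning_summary():
    """Derive per-book and budget-share figures from the quoted estimates."""
    return {
        # "About one-third" of the budget goes to buying books
        # (assumed here to be exactly one-third).
        "spent_on_books": LIBRARY_BUDGET_PER_YEAR // 3,
        # Implied cost to scan a single book.
        "cost_per_book": SCAN_COST_TOTAL / BOOKS_SCANNED,
        # Scanning cost as a fraction of one year's library budget.
        "share_of_budget": SCAN_COST_TOTAL / LIBRARY_BUDGET_PER_YEAR,
    }


if __name__ == "__main__":
    summary = scanning_summary()
    print(f"Spent on books per year: ${summary['spent_on_books']:,}")
    print(f"Implied scanning cost per book: ${summary['cost_per_book']:.2f}")
    print(f"Share of one year's budget: {summary['share_of_budget']:.2%}")
```

On these numbers, scanning a million books works out to about $30 a book, or a quarter of one percent of a single year's library budget.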


How does the Internet Archive deal with cross-linking and tags, assuming it is built largely on static PDF pages? Do you need to change the infrastructure?

It is evolving rapidly in terms of how to make books useful on the internet. I would say that we do not have very many examples yet. Amazon, which is one of the largest book scanning organizations, has shown that it helps promote the sale of books. The idea is to let 100 flowers bloom.

 

Why is it important to digitally archive books?

It is really how people for the last 6,000 years, I guess since Sumerian tablets, passed knowledge on from one generation to the next. In this digital world, we are finding more and more people just use what is on the net. If it is not on the net, it is as if it does not exist. And a lot of the treasures that humans have to offer their next generations are not on the net yet. As libraries, we are really duty bound to bring the best that we have to offer within reach of our children.

 

How can you keep a sufficient collection pace with the explosion of RSS feeds and XML?

As more and more material [is published] on the net it is also less expensive to store it. In 1996, we stored materials on tape kept off-line and now we are able to keep them online and spinning and the costs just keep dropping. You also need a rising budget. We found that, by being frugal, we can operate the Internet Archive at between $5 and $10 million per year.

 

The real threat to libraries over time is that they are burned. Sometimes they are actually burned with matches and sometimes they are made irrelevant based on changes in law or policy or in how people live. I will do everything that I can to show why these laws make no sense to the long-term intellectual history of our species.

 

How will you do this?

Over the next 10 or 20 years, I hope there will be a handful of libraries centred in the great cultures of the world that would actively collect the materials from their areas. [They would] provide access to it and also make copies in the other archives of the world, therefore, providing long-term preservation. Then you can contribute and add your new ideas back into the library. That, I believe, is the opportunity of our generation and we have the political will by being in an open society where universal education is cherished. I want to spread universal access to all human knowledge. 

 

 
