“After 25 years, Brewster Kahle and the Internet Archive are still working to democratize knowledge”
NiemanLab, March 24, 2022
By Joshua Benton
“Corporations continue to control access to materials that are in the library, which is controlling preservation, and it’s killing us.”
Brewster Kahle has been at this a long time.
Consider the photo above evidence. (And yes, children, computer monitors were once the size of a mini-fridge.) It was taken by internet legend (and open records hero) Carl Malamud in December 1991, when he was reporting out what would become Exploring the Internet: A Technical Travelogue, which aimed to put some faces to the burbling sense that something exciting was happening with connected computers.
These were still early days online — only four months after Tim Berners-Lee mentioned his “WorldWideWeb” project for the first time in the newsgroup alt.hypertext. The first version of Netscape was still three years away. And there was Kahle, just 31, but already with a stuffed resume: researcher at MIT’s AI Lab, lead engineer at supercomputer maker Thinking Machines, lead developer of WAIS (Wide Area Information Server), something like an alpha version of what the web would become.
“After delving into the arcana of message-passing protocols for massively parallel processors,” Malamud wrote, “Brewster turned his attention to the much more difficult problem of finding and using information on networks.”
Brewster ushered me into his office, where he sat down on a beat-up old easy chair and balanced a keyboard on his lap. The screen and rollerball mouse were conveniently nearby, making this a highly comfortable work or play station. There was no need to start up his WAIS client since it was already up and running. Deployed for only a few months on the Internet, WAIS was quickly becoming a part of people’s routines, and had certainly been integrated into Brewster’s daily work.
Brewster typed in a query: “Is there any information about Biology?” The query was sent, in its entirety, to the server of servers that Brewster maintained, quake.think.com. Servers of servers were no different than document servers; they simply kept a list of other servers and a description of the information they maintained.
We got back a list of servers throughout the world that had information on biology, such as a database of 981 metabolic intermediate compounds maintained in the Netherlands. At this point, we refined our query and sent it out to many servers simply by pointing to them on the screen. Servers returned lists of document descriptions; pointing to those documents retrieved the full text.
Brewster’s goal was to enable anybody with a computer, even a lowly PC, to become a publisher. The first PC-based WAIS server had recently gone online, running in somebody’s basement, and Brewster was quite excited by the prospect.
Brewster’s interest in publishing was personal as well as professional. His fiancée ran a printing museum and in the basement was an old printing press.
That’s how someone who started out in AI and microchip design ended up being the internet’s librarian.
In 1996, Kahle founded the Internet Archive, which stands alongside Wikipedia as one of the great not-for-profit knowledge-enhancing creations of modern digital technology. You may know it best for the Wayback Machine, its now quarter-century-old tool for deriving some sort of permanent record from the inherently transient medium of the web. (It’s collected 668 billion web pages so far.) But its ambitions extend far beyond that, creating a free-to-all library of 38 million books and documents, 14 million audio recordings, 7 million videos, and more. (Malamud’s book is, of course, among them.)
That work has not been without controversy, but it’s an enormous public service — not least to journalists, who rely on it for reporting every day. (Not to mention the Wayback Machine is often the only place to find the first two decades of web-based journalism, most of which has been wiped away from its original URLs.)
A little while back, the Internet Archive celebrated its 25th birthday, and I used that as an excuse to chat with Kahle about how his vision for it had changed along with the internet it tries to preserve in amber — and about why there is still so much human knowledge locked away on microfilm. Here are some bits of our conversation, lightly edited to make me sound more coherent on Zoom calls.
Joshua Benton: I’m 46, so I arrived at college right in the earliest days of the web. I have an enormous fondness for the optimism and the idealism people had about technology back then. The Internet Archive feels like a project from that era — free, open to all, assembled from millions of different parts and sources. How close is the archive today to what you were imagining 25 years ago? Is it recognizable compared to what you were planning, or hoping for?
Brewster Kahle: I think so, roughly, yes. I think the way other organizations participate with the Archive is different than what I would have imagined.
I would have thought that libraries would have just digitized all their books, and that they would have followed the same course as with the digitization of the card catalog. People went and copied their physical card catalogs into software that was running on their machines.
But what really happened was, you know, not as much. We had the Million Books Project. We were digitizing away. But then Google Books came along and said, “We’ll take it all.” And that was a complete surprise. And then some people said, “We’ll get the books scanned, but we’ll only share it among ourselves.” That was HathiTrust. That I found not that encouraging, in terms of public-spiritedness and the opportunity of the internet to make it available to anybody, anywhere. You know, let’s break open the walls of academia!
There was this guy, Binkley — I really loved Binkley. I really wanted to learn more about him. In the 1930s, he was a thinker and a promoter of microfilm — but microfilm as a mechanism of distributing knowledge, specifically to rural populations, to break the city elite. He thought that this was a way of democratizing knowledge.
It turned out that instead, you know, they microfilmed things and mostly kept it just for themselves.
Benton: You know, the Nieman Foundation at Harvard, where I work, was initially, back in the 1930s, supposed to be centered around this giant collection of journalism on microfilm. The head of Nieman is still titled the “curator” all these years later, because the original job was supposed to be to curate this collection. Microfilm was really having a moment in the ’30s, I guess.
Kahle: It was a thing. I was really clued into this by — I don’t remember her name, she’s retired now from the MIT library. But when I gave this talk about the Internet Archive — you know, my rousing “universal access to all knowledge” blah blah blah — at the Boston Public Library, she came up to me afterward and said, in that quiet librarian way: “Brewster, I’ve heard this speech before. It was all about microfilm.”
Benton: I really don’t understand why there’s anything left in the world that’s still only available on microfilm. Digitizing all the world’s books — okay, that’s a giant challenge. That’s a huge, unknowable data set. But why hasn’t every academic library digitized all its out-of-copyright microfilmed manuscripts, which I would think is much, much easier?
Kahle: It’s all about licensing, the licensing plague. It’s the shift from libraries owning things to corporations licensing and controlling access to materials that are in libraries. Corporations continue to control access to materials that are in the library, which is controlling preservation, and it’s killing us.
About the Author:
Joshua Benton founded Nieman Lab in 2008 and served as its director until 2020; he is now the Lab’s senior writer. Before spending a year at Harvard as a 2008 Nieman Fellow, he spent a decade in newspapers, mostly at The Dallas Morning News. His reports on cheating on standardized tests in the Texas public schools led to the permanent shutdown of a school district and won the Philip Meyer Journalism Award from Investigative Reporters and Editors. He has reported from a dozen foreign countries, been a Pew Fellow in International Journalism, and three times been a finalist for the Livingston Award for International Reporting. Before Dallas, he was a reporter and occasional rock critic for The Toledo Blade. He wrote his first HTML in January 1994.