How AI is helping historians better understand our past

Print Title: “AI is transforming humanities research”
MIT Technology Review, April 11, 2023
Artificial Intelligence
by Moira Donovan

“The historians of tomorrow are using computer science to analyze how people lived centuries ago.”

It’s an evening in 1531, in the city of Venice. In a printer’s workshop, an apprentice labors over the layout of a page that’s destined for an astronomy textbook—a dense line of type and a woodblock illustration of a cherubic head observing shapes moving through the cosmos, representing a lunar eclipse.

Like all aspects of book production in the 16th century, it’s a time-consuming process, but one that allows knowledge to spread with unprecedented speed.

Five hundred years later, the production of information is a different beast entirely: terabytes of images, video, and text in torrents of digital data that circulate almost instantly and have to be analyzed nearly as quickly, allowing—and requiring—the training of machine-learning models to sort through the flow. This shift in the production of information has implications for the future of everything from art creation to drug development.

But those advances are also making it possible to look differently at data from the past. Historians have started using machine learning—deep neural networks in particular—to examine historical documents, including astronomical tables like those produced in Venice and other early modern cities, smudged by centuries spent in mildewed archives or distorted by the slip of a printer’s hand.

Historians say the application of modern computer science to the distant past helps draw connections across a broader swath of the historical record than would otherwise be possible, correcting distortions that come from analyzing history one document at a time. But it introduces distortions of its own, including the risk that machine learning will slip bias or outright falsifications into the historical record. All this adds up to a question for historians and others who, it’s often argued, understand the present by examining history: With machines set to play a greater role in the future, how much should we cede to them of the past?

Parsing complexity

Big data has come to the humanities through initiatives to digitize increasing numbers of historical documents, like the Library of Congress’s collection of millions of newspaper pages and the Finnish Archives’ court records dating back to the 19th century. For researchers, this is at once a problem and an opportunity: there is much more information, and often there has been no existing way to sift through it.

That challenge has been met with the development of computational tools that help scholars parse complexity. In 2009, Johannes Preiser-Kapeller, a professor at the Austrian Academy of Sciences, was examining a registry of decisions from the 14th-century Byzantine Church. Realizing that making sense of hundreds of documents would require a systematic digital survey of bishops’ relationships, Preiser-Kapeller built a database of individuals and used network analysis software to reconstruct their connections.

This reconstruction revealed hidden patterns of influence, leading Preiser-Kapeller to argue that the bishops who spoke the most in meetings weren’t the most influential; he’s since applied the technique to other networks, including the 14th-century Byzantine elite, uncovering ways in which its social fabric was sustained through the hidden contributions of women. “We were able to identify, to a certain extent, what was going on outside the official narrative,” he says.

Preiser-Kapeller’s work is but one example of this trend in scholarship. But until recently, machine learning has often been unable to draw conclusions from ever larger collections of text—not least because certain aspects of historical documents (in Preiser-Kapeller’s case, poorly handwritten Greek) made them indecipherable to machines. Now advances in deep learning have begun to address these limitations, using networks that mimic the human brain to pick out patterns in large and complicated data sets.

…

Sidebar: Days of future past: Three key projects underway in the digital humanities

CorDeep
Who: Max Planck Institute for the History of Science

What: A web-based application for classifying content from historical documents that include numerical and alphanumerical tables. The software can locate, extract, and classify visual elements designated “content illustrations,” “initials,” “decorations,” and “printer’s marks.”

ITHACA
Who: DeepMind

What: A deep neural network trained to simultaneously perform the tasks of textual restoration, geographic attribution, and chronological attribution, previously performed by epigraphers.

Venice Time Machine Project
Who: École Polytechnique Fédérale de Lausanne, Ca’ Foscari, and the State Archives of Venice

What: A digitized collection of the Venetian state archives, which cover 1,000 years of history. Once it’s completed, researchers will use deep learning to reconstruct historical social networks.