“Achieving Digital Permanence”
Communications of the ACM, May 2019, Vol. 62 No. 5, Pages 36-42
By Raymond Blum, Betsy Beyer
Digital permanence has become a prevalent issue in society. This article focuses on the forces behind it and some of the techniques to achieve a desired state in which “what you read is what was written.” While techniques that can be imposed as layers above basic data stores—blockchains, for example—are valid approaches to achieving a system’s information assurance guarantees, this article will not discuss them.
First, let’s define digital permanence and the more basic concept of data integrity.
Data integrity is the maintenance of the accuracy and consistency of stored information. Accuracy means the data is stored as the set of values that were intended. Consistency means these stored values remain the same over time—they do not unintentionally waver or morph as time passes.
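A common way to detect a violation of consistency is to record a cryptographic digest when data is written and to recompute it on every read. The Python sketch below is illustrative only. The `digest` and `verify` helpers and the sample record are hypothetical, and SHA-256 is just one reasonable choice of hash. The technique has a limit. A digest shows that the bytes have not changed since they were written. It cannot show that the values written were the intended ones, so it addresses consistency more than accuracy.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hypothetical helper: SHA-256 hex digest stored alongside the data at write time."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """Hypothetical helper: on read, check that the bytes still match what was written."""
    return digest(data) == expected_digest


# Write path: keep the record together with its digest.
record = b"ledger entry 2019-05-01: 42.00 USD"  # made-up sample data
stored_digest = digest(record)

# Read path: the unchanged record verifies; flipping a single bit breaks verification.
corrupted = bytes([record[0] ^ 0x01]) + record[1:]
print(verify(record, stored_digest))     # True
print(verify(corrupted, stored_digest))  # False
```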
Digital permanence refers to the techniques used to anticipate and then meet the expected lifetime of data stored in digital media. Digital permanence not only considers data integrity, but also targets guarantees of relevance and accessibility: the ability to recall stored data, and to do so with the predicted latency and at a rate acceptable to the applications that require that information.
To illustrate the aspects of relevance and accessibility, consider two counterexamples: journals that were safely stored redundantly on Zip drives or punch cards may as well not exist if the hardware required to read the media into a current computing system isn’t available. Nor is it very useful to have receipts and ledgers stored on a tape medium that will take eight days to read in when you need the information for an audit on Thursday.
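Both counterexamples reduce to checks that can be made before the data is actually needed. Does a working reader exist for the medium? Can the data be read back before the deadline? The sketch below assumes, purely for illustration, a hypothetical `StoredCopy` record with a single sustained read rate. It ignores mount, seek, and retry time. The media names and figures are made up. At 100 MB/s, about 69 TB takes eight days to read, which misses an audit three days away.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class StoredCopy:
    """Hypothetical description of one stored copy of a data set."""
    medium: str                 # illustrative media name, e.g. "tape" or "zip_disk"
    size_bytes: int             # total bytes that must be read back
    read_rate_bytes_per_s: int  # assumed sustained read throughput


def read_time_days(copy: StoredCopy) -> float:
    """Days needed to read the whole copy at its sustained rate."""
    return copy.size_bytes / copy.read_rate_bytes_per_s / SECONDS_PER_DAY


def is_accessible(copy: StoredCopy, available_readers: set[str], deadline_days: float) -> bool:
    """Usable only if a reader exists AND the full read finishes by the deadline."""
    if copy.medium not in available_readers:
        return False  # the media survive, but nothing can read them
    return read_time_days(copy) <= deadline_days


readers = {"tape"}  # hypothetical: the only drive type still in service
ledgers = StoredCopy("tape", 69_120_000_000_000, 100_000_000)  # ~69 TB at 100 MB/s
journals = StoredCopy("zip_disk", 100_000_000, 1_000_000)

print(read_time_days(ledgers))                # 8.0
print(is_accessible(ledgers, readers, 3))     # False: the audit is in 3 days
print(is_accessible(ledgers, readers, 10))    # True
print(is_accessible(journals, readers, 365))  # False: no Zip drive available
```

Treating "no reader" and "too slow" as the same outcome reflects the point above: in either case, the data may as well not exist when it is needed.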
The Multiple Facets of Digital Permanence…
Information Permanence in the Digital Age…
Categorizing Failure Modes…
Mitigating Risks to Digital Permanence…
Making It Last and Keeping It True
Every era has introduced new societal challenges as it developed and dealt with technological advances. In the Industrial Age, machining methods evolved to produce more and better machines and tools, some of them previously undreamt of. Today’s Information Age is creating new uses for, and new ways to steward, the data that the world depends on. The world is moving away from familiar, physical artifacts toward new means of representation that are closer to information in its essence.
Since we can no longer rely on the nature of a medium to bestow permanence, we must devise mechanisms that are as fluid and agile as the media to which we are entrusting our information and ever-increasing aspects of our lives. We need processes that ensure both the integrity and the accessibility of knowledge to guarantee that history will be known and true.
About the Authors:
Raymond Blum leads an engineering team in Google’s Developer Infrastructure that is charged with keeping thousands of Google engineers productive. He was previously a site reliability engineer at Google.
Betsy Beyer is a technical writer for Google Site Reliability Engineering in New York, NY, and the editor of Site Reliability Engineering: How Google Runs Production Systems. She has written documentation for Google’s datacenter and hardware operations teams.