“Human Detection of Machine-Manipulated Media”
Communications of the ACM, October 2021, Vol. 64 No. 10, Pages 40-47
By Matthew Groh, Ziv Epstein, Nick Obradovich, Manuel Cebrian, Iyad Rahwan
“This new capacity for scalable manipulation raises the question of how prepared people are to detect manipulated media.”
The recent emergence of artificial intelligence (AI)-powered media manipulations has widespread societal implications for journalism and democracy, national security, and art. AI models have the potential to scale misinformation to unprecedented levels by creating various forms of synthetic media. For example, AI systems can synthesize realistic video portraits of an individual with full control of facial expressions, including eye and lip movement; clone a speaker’s voice with a few training samples and generate new natural-sounding audio of something the speaker never said; synthesize visually indicated sound effects; generate high-quality, relevant text based on an initial prompt; produce photorealistic images of a variety of objects from text inputs; and generate photorealistic videos of people expressing emotions from only a single image. The technologies for producing machine-generated, fake media online may outpace the ability to manually detect and respond to such media.
We developed a neural network architecture that combines instance segmentation with image inpainting to automatically remove people and other objects from images. Figure 1 presents four examples of participant-submitted images and their transformations. The AI, which we call a “target object removal architecture,” detects an object, removes it, and replaces its pixels with pixels that approximate what the background should look like without the object. This architecture operationalizes one of the oldest forms of media manipulation, known by its Latin name, damnatio memoriae (“condemnation of memory”): the practice of erasing a person from official records and images.
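The article does not provide implementation details of the target object removal architecture, but its two-stage design (segment, then inpaint) can be sketched in outline. In the sketch below, both stages are illustrative placeholders: `segment_object` stands in for a learned instance-segmentation model, and `inpaint` fills the masked region with the mean background color, where a real system would use a generative inpainting network to synthesize plausible background texture. All function names are hypothetical.

```python
import numpy as np

def segment_object(image: np.ndarray) -> np.ndarray:
    """Placeholder for an instance-segmentation model.
    Returns a boolean mask over the target object's pixels; here we
    simply pretend the object occupies a central square."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    h, w = mask.shape
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = True
    return mask

def inpaint(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Placeholder for a learned inpainting network: fills masked
    pixels with the mean color of the unmasked background. A real
    inpainter would hallucinate coherent background detail instead."""
    result = image.astype(float).copy()
    background_mean = result[~mask].mean(axis=0)
    result[mask] = background_mean
    return result

def remove_target_object(image: np.ndarray) -> np.ndarray:
    """Two-stage pipeline: detect the object, then replace its
    pixels with an approximation of the occluded background."""
    mask = segment_object(image)
    return inpaint(image, mask)
```

The key design point the sketch illustrates is the decoupling of the two stages: any segmentation model that emits a per-pixel mask can be paired with any inpainting model that accepts one.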
The earliest known instances of damnatio memoriae were discovered in ancient Egyptian artifacts, and similar patterns of removal have appeared since. Historically, visual and audio manipulations required both skilled experts and a significant investment of time and resources. Our architecture can produce photorealistic manipulations nearly instantaneously, which magnifies the potential scale of misinformation. This new capacity for scalable manipulation raises the question of how prepared people are to detect manipulated media.
To publicly expose the realism of AI-media manipulations, we hosted a website called Deep Angel, where anyone in the world could examine our neural network architecture and its resulting manipulations. Between August 2018 and May 2019, 110,000 people visited the website. We integrated a randomized experiment based on a two-alternative forced-choice design within the Deep Angel website to examine how repeated exposure to machine-manipulated images affects an individual’s ability to accurately identify manipulated imagery.
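In a two-alternative forced-choice design, each trial pairs one manipulated image with one unaltered image and asks the participant to pick the manipulated one; accuracy can then be tracked against how many trials the participant has already seen. The sketch below shows that trial logic under stated assumptions: the class and function names are illustrative, not taken from the Deep Angel implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class TwoAltForcedChoiceTrial:
    """One trial: a manipulated image paired with an unaltered one.
    (Hypothetical structure, for illustration only.)"""
    manipulated: str
    original: str

    def present(self, rng: random.Random) -> tuple:
        """Return the two images in randomized left/right order so the
        position of the manipulated image carries no signal."""
        pair = [self.manipulated, self.original]
        rng.shuffle(pair)
        return tuple(pair)

def score_session(trials, choices):
    """Score a participant's session: choices[i] is the image the
    participant judged manipulated on trial i. The per-trial correctness
    list, indexed by exposure order, is what an analysis of learning
    from repeated exposure would aggregate over participants."""
    return [choice == t.manipulated for t, choice in zip(trials, choices)]
```

Randomizing presentation order within each trial is what makes the design "forced choice" rather than a yes/no judgment: chance performance is pinned at 50%, so any accuracy above that reflects genuine detection ability.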
About the Authors:
Matthew Groh is a Ph.D. candidate in the Massachusetts Institute of Technology (MIT) Media Lab, Cambridge, MA, USA.
Ziv Epstein is a Ph.D. candidate in the Massachusetts Institute of Technology (MIT) Media Lab, Cambridge, MA, USA.
Nick Obradovich is a senior research scientist and principal investigator in the Center for Humans & Machines at Max Planck Institute for Human Development, Berlin, Germany.
Manuel Cebrian is the Max Planck Research Group Leader of the Digital Mobilization Research Group in the Center for Humans & Machines at Max Planck Institute for Human Development, Berlin, Germany.
Iyad Rahwan is director in the Center for Humans & Machines at Max Planck Institute for Human Development, Berlin, Germany.