A Decade of Social Bot Detection

“A Decade of Social Bot Detection”
Communications of the ACM, October 2020, Vol. 63 No. 10, Pages 72-83
Review Articles
By Stefano Cresci

“Devising techniques for spotting suspiciously coordinated and synchronized behaviors is likely to yield better results than analyzing individual accounts.”

On the morning of November 9, 2016, the world woke up to the shocking outcome of the U.S. Presidential election: Donald Trump was the 45th President of the United States of America. An unexpected event that still has tremendous consequences all over the world. Today, we know that a minority of social bots—automated social media accounts mimicking humans—played a central role in spreading divisive messages and disinformation, possibly contributing to Trump’s victory.

In the aftermath of the 2016 U.S. elections, the world started to realize the gravity of widespread deception in social media. Following Trump’s exploit, we witnessed to the emergence of a strident dissonance between the multitude of efforts for detecting and removing bots, and the increasing effects these malicious actors seem to have on our societies.27,29 This paradox opens a burning question: What strategies should we enforce in order to stop this social bot pandemic?

In these times—during the run-up to the 2020 U.S. elections—the question appears as more crucial than ever. Particularly so, also in light of the recent reported tampering of the electoral debate by thousands of AI-powered accounts.

What struck social, political, and economic analysts after 2016—deception and automation—has been a matter of study for computer scientists since at least 2010. In this work, we briefly survey the first decade of research in social bot detection. Via a longitudinal analysis, we discuss the main trends of research in the fight against bots, the major results that were achieved, and the factors that make this never-ending battle so challenging. Capitalizing on lessons learned from our extensive analysis, we suggest possible innovations that could give us the upper hand against deception and manipulation. Studying a decade of endeavors in social bot detection can also inform strategies for detecting and mitigating the effects of other—more recent—forms of online deception, such as strategic information operations and political trolls.

The Social Bot Pandemic

Social bots coexist with humans since the early days of online social networks. Yet, we still lack a precise and well-agreed definition of what a social bot is. This is partly due to the multiple communities studying them and to the multifaceted and dynamic behavior of these entities, resulting in diverse definitions each focusing on different characteristics. Computer scientists and engineers tend to define bots from a technical perspective, focusing on features such as activity levels, complete or partial automation, use of algorithms and AI. The existence of accounts that are simultaneously driven by algorithms and by human intervention led to even more fine-grained definitions and cyborgs were introduced as either bot-assisted humans or human-assisted bots. Instead, social scientists are typically more interested in the social or political implications of the use of bots and define them accordingly.

Social bots are actively used for both beneficial and nefarious purposes. Regarding the detection of benign or malicious social bots, the majority of existing works focused on detecting the latter. The reason is straightforward if we take into account the categorization proposed by Stieglitz et al. Bots were categorized according to their intent and to their capacity of imitating humans, with the majority of existing specimen being either benign bots that do not aim to imitate humans (for example, news and recruitment bots, bots used in emergencies) or malicious ones relentlessly trying to appear as human-operated. The detection of the former category of bots does not represent a challenge, and scholars devoted the majority of efforts to spot the latter, also because of their tampering with our online ecosystems. Indeed, the wide array of actions that social bots perform and the negligible cost for creating and managing them en masse, open up the possibility to deploy armies of bots for information warfare, for artificially inflating the popularity of public characters and for manipulating opinions.

On the onset of the sudden surge of interest around automation and deception, several studies measured the extent of the social bot pandemic. Results are nothing less than worrying. The average presence of bots was estimated to be in the region of 15% of all active Twitter accounts in 2017, and 11% of all Facebook accounts in 201938—a considerable share indeed. Even more worrisome, when strong political or economic interests are at stake, the presence of bots dramatically increases. A 2019 study reported that 71% of Twitter users mentioning trending U.S. stocks, are likely to be bots. Similar results were obtained about the presence of bots in online cryptocurrency discussions and as part of the “infodemics” about the COVID-19 pandemic. Other studies specifically focused on political activity, concluding that bots played a role in strategic information operations orchestrated ahead of numerous worldwide events, as shown in Figure 1. Despite taking part in political discussions about all countries highlighted in figure, bots did not always have a real impact. In fact, scholars still lack a widespread consensus on the impact of social bots, with some studies reporting on their pivotal role for increasing disinformation’s spread, polarization, and hateful speech, and competing results claiming that bots do not play a significant role in these processes. The ubiquity of social bots is also partly fueled by the availability of open source code, for which Bence Kollanyi reported an exponential growth that led in 2016 to more than 4,000 GitHub repositories containing code for deploying Twitter bots. Other investigations demonstrated this trend has not halted yet. In fact, by 2018, scholars found more than 40,000 public bot repositories. The looming picture is one where social bots are among the weapons of choice for deceiving and manipulating crowds. These results are backed by the same platforms where information operations took place—namely, Facebook, Twitter and Reddit—that banned tens of thousands accounts involved in coordinated activities since 2016.

Given the reported role of bots in several of the ailments that affect our online ecosystems, many techniques were proposed for their detection and removal—adding to the great coverage from news outlets—contributing to the formation of a steeply rising publication trend. Today, new studies on the characterization, detection, and impact estimation of bots are published at an impressive rate, as shown in Figure 2. Should this skyrocketing trend continue, by 2021 there will be more than one new paper published per day, which poses a heavy burden on those trying to keep pace with the evolution of this thriving field. Perhaps even more importantly, the rate at which new papers are published implies that a huge worldwide effort is taking place in order to stop the spread of the social bot pandemic. But where is all this effort leading? To answer this question, we first take a step back at the early days of social bot detection.