“Cyber Reconnaissance Techniques”
Communications of the ACM, March 2021, Vol. 64 No. 3, Pages 86-95
By Wojciech Mazurczyk, Luca Caviglione
“The evolution of and countermeasures for existing reconnaissance techniques.”
Almost every day, security firms and mass media report news about successful cyber attacks, which are growing in complexity and volume. According to Industry Week, in 2018 spear-phishing and spoofing attempts on business emails increased by 70% and 250%, respectively, and ransomware campaigns targeting enterprises grew by an impressive 350%. The economic damage is substantial, since each attack must be detected and investigated, and the compromised hardware and software must be restored. To give an idea of the impact of the problem, the average cost of a data breach rose from $4.9 million in 2017 to $7.5 million in 2018. To make things worse, attackers can now use a wide range of tools for compromising hosts, network appliances, and Internet of Things (IoT) devices in a simple and effective manner, for example, via a Crime-as-a-Service business model.
Usually, each cyber threat has its own degree of sophistication, and not every attack has the same goal, impact, or extent. However, the literature agrees that an attack can be decomposed into some general phases, as depicted in Figure 1. As shown, the Tao of Network Security Monitoring subdivides attacks into five stages and the Cyber Kill Chain into seven stages, whereas the Unified Kill Chain proposes a more fine-grained partitioning. Regardless of the reference model, the first step always requires gathering information on the target and is commonly defined as “reconnaissance.” Its ultimate goals are to identify the weak points of the targeted system and to set up an effective attack plan.
In general, reconnaissance relies upon a composite set of techniques and processes and should not be considered limited to information characterizing the target at a technological level, such as the hardware in use or the versions of software components. Attackers also aim at collecting details related to the physical location of the victim, phone numbers, and the names and email addresses of the people working in the targeted organization. In fact, any bit of knowledge may be used to develop a software exploit or to reveal weaknesses in the defensive systems.
Unfortunately, the evolution of the Internet, the diffusion of online social networks, and the rise of services for scanning smart appliances and IoT nodes have led to an explosion of sources that can make the reconnaissance phase quicker, easier, and more effective. These sources can also remove the need for contact with the victim or limit its duration, making it more difficult to detect and block reconnaissance attempts early. Therefore, investigating the evolution of techniques used for cyber reconnaissance is of paramount importance for deploying or engineering effective countermeasures. Even though the literature provides surveys on specific aspects of reconnaissance (for example, network scanning and techniques exploiting social engineering), the knowledge is highly fragmented and a comprehensive review is missing. To fill this gap, this article provides a “horizontal” review of the existing reconnaissance techniques and countermeasures, while highlighting emerging trends.
In this article, we introduce the classification and the evolution of the most popular reconnaissance methods. Then, we discuss possible countermeasures and present some future directions.
Classification and Evolution
To illustrate the most important cyber reconnaissance techniques and portray their evolution, we introduce the following taxonomy composed of four classes:
- Social Engineering: It groups methods for collecting information to deceive a person or to convince him/her to behave in a desired manner.
- Internet Intelligence: It groups methods taking advantage of information publicly available on the Internet, including databases accessible via the Web.
- Network Information Gathering: It groups methods for mapping the network (or computing) infrastructure of the victim.
- Side-Channels: It groups methods exploiting unintended information leaked by the victim.
Each class accounts for a given “degree of interaction” with the victim, understood broadly as how tight the coupling with the source of information must be for the purpose of the reconnaissance. For instance, reading a computer screen requires the attacker to be near the victim, thus potentially involving a physical interaction, whereas scanning his/her network can be done remotely. In addition, some side-channels exploit a measurement that requires physical proximity to the target (for example, to measure the intensity of an electromagnetic field or the temperature of a heat source), while retrieving data from a social network does not require interacting with an asset run or owned by the victim.
Sidebar: Examples of Reconnaissance Techniques and Sources
Social Engineering
- Shoulder surfing: techniques where the attacker tries to determine confidential data by looking over the shoulders of the victim.
- Dumpster diving: the practice of obtaining information from discarded material, such as documents or components of computing devices (for example, hard drives and memory cards).
- Phishing/Vishing/Smishing: the attacker tries to mislead the victim by impersonating a trustworthy entity via email (phishing), VoIP calls (vishing), or Short Message Service (smishing).
- Social Networks: the attacker utilizes social networks (for example, Facebook, LinkedIn, and Twitter) for gathering personal data or persuading the victim to reveal sensitive information or accomplish certain actions.
Internet Intelligence
- whois/rwhois: databases providing information about the IP address ranges and Autonomous Systems used by the victim.
- Website: HTML pages can contain a very large and composite set of data. In the case of corporate websites, the available information includes employee names, contact details, and positions within the organization, to name a few. Comments left in HTML are another valuable source of information.
- Google Hacking (Google Dorking): techniques utilizing advanced operators of Google to reveal potential security vulnerabilities and/or configuration errors of hardware and devices managed by the victim.
- Social Media: a source of reconnaissance data where an attacker can collect personal information about the victim in order to learn, for instance, his/her habits, hobbies, likes and dislikes, with the aim of creating a more complete profile of the targeted person.
- Shodan/Censys/ZoomEye: specific search engines indexing detailed technical data about different types of devices and network appliances.
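The Website item above notes that comments left in HTML can leak information. As a minimal sketch of how a site owner might audit their own pages for such leftovers, the following Python snippet collects every HTML comment using the standard library's `html.parser`. The names `CommentCollector`, `extract_comments`, and `SAMPLE_PAGE`, as well as the sample page content, are hypothetical and invented for this illustration; they do not come from the article.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment seen while parsing."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # HTMLParser calls this once per <!-- ... --> block,
        # passing the text between the delimiters.
        self.comments.append(data.strip())


def extract_comments(html_text):
    """Return the list of comment strings found in html_text."""
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.comments


# Hypothetical page content, invented for this example.
SAMPLE_PAGE = """
<html>
  <body>
    <!-- TODO: remove test account before launch -->
    <h1>Contact</h1>
    <!-- staging server: staging.example.invalid -->
    <p>Write to us.</p>
  </body>
</html>
"""

if __name__ == "__main__":
    for comment in extract_comments(SAMPLE_PAGE):
        print(comment)
```

Running a check of this kind before publishing a page is one way to see what a visitor reading the page source would learn.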
Network Information Gathering
- (Port) Scanning: methods for probing devices to establish whether on the targeted host there are open ports and exploitable services.
- (OS/application) Fingerprinting: techniques for recognizing the operating system and/or applications utilized on the targeted device. A host can be stimulated with certain network traffic and replies are analyzed to guess the OS and/or installed applications.
- (Network/Device) Enumeration: the systematic process for discovering hosts/servers/devices within the targeted network that are publicly exposed by the victim.
- Traffic Sniffing: an attacker infers information about the victim network by collecting (sniffing) traffic or via monitoring tools.
- Honeypot Detection: a set of techniques allowing the attacker to recognize whether the compromised machine is real or virtual. Typically, such methods rely on the detailed analysis of the behavior of the breached host (execution delays) or network configurations (MAC address, ARP and RARP entries, and so on).
Side-Channels
- EM Emissions/Power Consumption: side-channels can be used to capture signals leaked from screens, printers, or keyboards in order to retrieve sensitive information. The physical quantities most commonly observed to build such side-channels are the electromagnetic emissions and the power consumption of the targeted devices.
- Mapping Virtual Resources: side-channels are used to map a cloud infrastructure in order to establish if services are virtualized/containerized or to perform other types of attacks like co-resident threats. Typically, this class of side-channels operates in a completely remote manner.
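The (Port) Scanning item above describes probing a host to establish whether a port is open. A minimal sketch of the simplest variant, a full TCP connect probe, is shown below in Python. It is an illustration under stated assumptions rather than a method taken from the article: the helper names `is_port_open` and `demo` are hypothetical, and the demo only probes a listener that it opens itself on the loopback address.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # instead of raising an exception for a refused connection.
        return probe.connect_ex((host, port)) == 0


def demo():
    """Probe a loopback listener while it is open and after it is closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    result_while_listening = is_port_open("127.0.0.1", port)
    listener.close()
    result_after_close = is_port_open("127.0.0.1", port)
    return result_while_listening, result_after_close


if __name__ == "__main__":
    print(demo())
```

A connect probe completes the full TCP handshake, so it is easy for the probed host to log. This is one reason why defenders can often detect this form of reconnaissance.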
About the Authors:
Wojciech Mazurczyk is University Professor at Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland.
Luca Caviglione is a senior research scientist at National Research Council of Italy, Institute for Applied Mathematics and Information Technologies, Genova, Italy.