The AI Honeypot System in the SPHINX Toolkit

Honeypots are computer security components used to deflect cyber-attacks and deter certain types of attacks. They emulate information system resources as closely as possible, with data that seems legitimate, in order to lure attackers in and keep them away from the real systems of a network. A honeypot acts as bait in a network: it is intentionally vulnerable and has no legitimate production value beyond that role.

Honeypots come in many forms and serve different functions, which is why several categorizations exist.

First of all, honeypots are categorised by design into Pure Honeypots, High Interaction Honeypots and Low Interaction Honeypots. Pure Honeypots are real systems with real data that are used to lure attackers. High Interaction Honeypots are emulation systems that expose many fake services in order to waste the attacker's time. Low Interaction Honeypots emulate only selected services that an attacker may target. Because they cover only those services, they require a minuscule amount of resources and can be deployed and managed easily on any system.

In addition, honeypots are classified into two categories based on their deployment: Production Honeypots and Research Honeypots. Production Honeypots are usually deployed in production environments, alongside real services and hardware. Research Honeypots are used to learn from attackers and gather information. Most of the time they are not used for protection, but as a scientific sandbox for detecting possible vulnerabilities.

Design and Principles

Following the above general concepts regarding Honeypots, the SPHINX AI Honeypot module is developed as part of the cybersecurity toolkit proposed by the project. The AI Honeypot of SPHINX is a component that consists of six sub-components:

HP Core: This is the main component of the Honeypot. It is responsible for simulating the following services:

  • SSH
  • FTP
  • SMTP
  • HTTP

HP Message Queue: This component retrieves the attack data recorded by HP Core and forwards it to HP Data Consumer.

HP Data Consumer: This component receives attack data from HP Message Queue and, depending on the service that was attacked, stores it in the corresponding table in the HP Storage DB.
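
The per-service routing described for the HP Data Consumer can be sketched as follows. The message shape (`AttackEvent`), its JSON field names and the table names are hypothetical, since the post does not describe the actual payload or schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AttackEvent is a hypothetical message shape for one recorded attack.
type AttackEvent struct {
	Service  string `json:"service"`
	SourceIP string `json:"source_ip"`
	Payload  string `json:"payload"`
}

// tableByService maps each emulated service to an assumed table name.
var tableByService = map[string]string{
	"ssh":  "ssh_attacks",
	"ftp":  "ftp_attacks",
	"smtp": "smtp_attacks",
	"http": "http_attacks",
}

// routeEvent decodes one queued message and returns the table it belongs in.
func routeEvent(raw []byte) (string, AttackEvent, error) {
	var ev AttackEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", ev, fmt.Errorf("decode event: %w", err)
	}
	table, ok := tableByService[strings.ToLower(ev.Service)]
	if !ok {
		return "", ev, fmt.Errorf("unknown service %q", ev.Service)
	}
	return table, ev, nil
}

func main() {
	msg := []byte(`{"service":"SSH","source_ip":"203.0.113.7","payload":"root:123456"}`)
	table, ev, err := routeEvent(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("store event from %s in %s\n", ev.SourceIP, table)
}
```

Rejecting unknown services explicitly, rather than writing them to a default table, keeps malformed messages out of the per-service logs.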

HP Storage DB: This component stores the attack log data from each HP Core service.

HP API: It provides a REST interface to components that want to consume attack data logs for the HP Core services. It provides endpoints for the following services:

  • SSH
  • FTP
  • SMTP
  • HTTP

It also includes a dedicated endpoint for retrieval of MLID data. Furthermore, upon demand, it can request MLID data from HP Data Processor.

HP Data Processor: This component gathers attack log data from the HP Storage DB. By processing the attack logs, it produces the data needed by the MLID component of SPHINX.

The following UML component diagram depicts the organization and wiring of the physical components in the SPHINX Honeypot (HP) system and illustrates the basic interactions between them. The HP Core sends data to the HP Message Queue via an interface. The HP Storage DB receives data from the HP Data Consumer and, in turn, sends data to both the HP Data Processor and the HP API. Finally, the HP API sends data to the HP Data Processor.

To facilitate the deployment and maintenance of the SPHINX Honeypots (i.e. their software components), Docker is used. This also enables the SPHINX AI Honeypots to work as farms and to be dynamically (re)configured, for example by the SPHINX situation awareness system.

All Honeypot components are implemented in Go. Components that provide REST API endpoints use the Beego web framework. Furthermore, the dockerclient Go package is used for handling the dockerised HP components, and the go-sqlite3 Go package for implementing the data model in an SQLite3 database. Communication between the HP components is performed using RabbitMQ.

Honeypot Artificial Intelligence

One of the most challenging and intensive tasks in the SPHINX HP development process relates to the ability of the honeypots to support AI algorithms designed to detect anomalies (e.g. an attempt to install malware in the authority's IT infrastructure). For this purpose, particular attention and time were devoted to implementing the HP Data Processor presented in the previous section. This module applies sophisticated algorithms to process the attack information gathered by the HPs and to generate data in a format that AI algorithms (such as the SPHINX MLID component) can understand, manage and use to detect attacks, so that prompt and effective action can be taken as appropriate.

The HP-generated dataset is a variant of NSL-KDD [11], currently the most widespread IDS benchmark dataset, extended with a few additional features required by the needs expressed in SPHINX so far. Currently, each record lists 44 features, but new ones may be added if needed. The majority of the 44 features describe the traffic input itself, while the last two are labels indicating the type of the attack and the severity of the traffic input. The labels are used only during the training phase of the AI-based systems.
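
The split between traffic features and training-only labels can be illustrated with a small sketch. The `Record` type below carries only six traffic fields, named after well-known NSL-KDD features, plus the two labels. It is not the actual SPHINX schema: the field selection, the label values and the CSV layout are assumptions.

```go
package main

import (
	"encoding/csv"
	"os"
	"strconv"
)

// Record is a reduced, hypothetical view of one dataset row.
type Record struct {
	// Traffic features (a small subset, named after NSL-KDD features).
	Duration     int
	ProtocolType string
	Service      string
	Flag         string
	SrcBytes     int
	DstBytes     int
	// Labels, used only during training.
	AttackType string
	Severity   int
}

// Row renders the record as CSV fields; the two labels are appended
// only when training is true.
func (r Record) Row(training bool) []string {
	row := []string{
		strconv.Itoa(r.Duration),
		r.ProtocolType,
		r.Service,
		r.Flag,
		strconv.Itoa(r.SrcBytes),
		strconv.Itoa(r.DstBytes),
	}
	if training {
		row = append(row, r.AttackType, strconv.Itoa(r.Severity))
	}
	return row
}

func main() {
	rec := Record{
		Duration: 0, ProtocolType: "tcp", Service: "ssh", Flag: "SF",
		SrcBytes: 312, DstBytes: 1024,
		AttackType: "bruteforce", Severity: 3, // illustrative label values
	}
	w := csv.NewWriter(os.Stdout)
	w.Write(rec.Row(true))  // training row: features plus labels
	w.Write(rec.Row(false)) // inference row: features only
	w.Flush()
}
```

Keeping the labels at the end of the row mirrors the description above and makes it simple to drop them when the data is used for detection rather than training.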

More information about the AI Honeypot system can be found in Deliverable 4.4, which is publicly available here.