The Anomaly Detection Framework of SPHINX Toolkit: PART II – Anomaly Detection
Continued from Part I
The Automated Cyber Security Risk Assessment is one of the building blocks of the SPHINX Architecture and provides advanced, automated tools to assess the level of cyber security of a given healthcare IT environment. Integrated in this block, SPHINX features its Cross-Layer Anomaly Detection Framework, a system responsible for detecting malicious activities by monitoring and analysing network traffic.
The system consists of two components, Data Traffic Monitoring (DTM) and Anomaly Detection (AD), whose main purpose is to detect and report cyber threats and to raise alerts about them.
Anomaly Detection is a SPHINX component that takes a different approach to threat discovery than the DTM component. It analyses network activity and classifies it as either normal or anomalous. Instead of using signatures as a basis for classification, AD builds profiles of normal behaviour and uses data mining and machine learning algorithms to identify outliers, which are reported as alerts.
AD does not use the raw network data. Its input is the set of logs generated by the Data Traffic Monitoring component. These logs describe the network activity in high-level terms. For example, they can contain:
- TCP connections
- HTTP sessions with details like URIs, headers, MIME types, server responses
- DNS requests with replies
- SMTP sessions
DTM discovers only threats that are in its signature database, which means it can detect only known threats. The AD component instead creates a baseline of normal network, device and user behaviour. This allows it to detect new and unknown threats by monitoring for outlier activity that departs from the baseline.
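To make the baseline idea concrete, the following is a minimal sketch, not the SPHINX implementation. It assumes a single numeric metric (hypothetical hourly DNS request counts for one host) and uses a simple z-score test in place of the data mining and machine learning algorithms that AD applies. The function names, the sample data and the threshold of 3.0 are illustrative assumptions.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarise normal behaviour as the mean and standard deviation of a metric."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly constant baseline: any deviation at all is an outlier.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly DNS request counts for one host (assumed sample data).
normal_hours = [40, 42, 38, 41, 39, 43, 40, 37]
baseline = build_baseline(normal_hours)

print(is_anomalous(41, baseline))   # close to the baseline, not flagged
print(is_anomalous(400, baseline))  # far from the baseline, flagged
```

No signature is involved: the second value is flagged only because it departs from the observed baseline, which is why this style of detection can surface threats that have never been seen before.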
The main functionalities of this component are:
- detecting ecosystem disturbances;
- implementing a set of rules based on the characteristics of previous system events, user activities and incidents;
- providing an alert engine to raise notifications.
The design of the AD component is based on the following considerations:
- The SPHINX ecosystem is a complex system, with components built using different programming languages and technologies. As part of this ecosystem, AD must be able to interoperate with the other SPHINX components. This is achieved in two ways:
1. by using a messaging system. The information is published to topics. Other SPHINX components subscribe to the topics of interest in order to get access to the data.
2. by exposing RESTful web services.
- AD should be scalable so that it can be used on networks with both low and high traffic volumes. For this reason, AD is based on big data tools and algorithms optimized for distributed execution.
- AD should be able to discover unknown threats, thus complementing the DTM component. To do so, AD uses statistics and machine learning to analyse the network activity, create profiles corresponding to normal activities and detect activities that are anomalous relative to these profiles.
- AD should be configurable and extensible. Depending on the network environment where the component is deployed, not all the analyses implemented in AD may be necessary. AD should therefore allow supported analyses to be enabled or disabled, and should make it easy to add new analyses.
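The configurability and extensibility consideration above can be illustrated with a small sketch. It assumes a hypothetical `AnalysisRegistry` class in which each analysis is a plain function over log records. The class, the two example analyses and the record fields are illustrative assumptions, not part of the AD component.

```python
from typing import Callable, Dict, Iterable, List

Record = dict
Analysis = Callable[[List[Record]], List[str]]


class AnalysisRegistry:
    """Holds named analyses that can be enabled or disabled individually."""

    def __init__(self) -> None:
        self._analyses: Dict[str, Analysis] = {}
        self._enabled: Dict[str, bool] = {}

    def register(self, name: str, analysis: Analysis, enabled: bool = True) -> None:
        # Adding a new analysis only requires registering a function.
        self._analyses[name] = analysis
        self._enabled[name] = enabled

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self._analyses:
            raise KeyError(name)
        self._enabled[name] = enabled

    def run(self, records: Iterable[Record]) -> List[str]:
        # Run only the enabled analyses and prefix findings with the analysis name.
        records = list(records)
        alerts: List[str] = []
        for name, analysis in self._analyses.items():
            if self._enabled[name]:
                alerts.extend(f"{name}: {finding}" for finding in analysis(records))
        return alerts


def long_dns_names(records: List[Record]) -> List[str]:
    """Hypothetical analysis: unusually long DNS query names."""
    return [r["query"] for r in records
            if r.get("type") == "dns" and len(r["query"]) > 50]


def http_server_errors(records: List[Record]) -> List[str]:
    """Hypothetical analysis: HTTP requests answered with a 5xx status."""
    return [r["uri"] for r in records
            if r.get("type") == "http" and r["status"] >= 500]


registry = AnalysisRegistry()
registry.register("long_dns_names", long_dns_names)
registry.register("http_server_errors", http_server_errors, enabled=False)

records = [
    {"type": "dns", "query": "a" * 60 + ".example.com"},
    {"type": "http", "uri": "/login", "status": 500},
]
print(registry.run(records))            # only the DNS analysis runs
registry.set_enabled("http_server_errors", True)
print(registry.run(records))            # both analyses run
```

Keeping each analysis behind a common function signature means a deployment can switch analyses on or off through configuration, and a new analysis can be added without touching the existing ones.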
The following figure is a high-level representation of AD logical architecture and collaboration with other SPHINX Components:
The internal logical architecture of the AD component is represented in the following figure:
AD ingests the data about network activity published by the DTM component. The data is stored internally in a storage service capable of handling large data volumes. The anomaly detection engine applies statistical and ML algorithms in order to create a baseline of normal activity and detect anomalies. The backend contains application and persistence logic and exposes a REST API used both by the web subcomponent and other SPHINX components.
The AD component complements DTM in the role of detecting threats. Therefore, it has similar interactions with the other SPHINX components. The supported interactions with the other components are depicted in the figure below:
AD sends information on anomalies detected in system events and user behaviour that constitute a threat to the IT infrastructure to the following components: Forensic Data Collection Engine (FDCE), Security Information and Event Management (SIEM), Real-time Cyber Risk Assessment (RCRA), Knowledge Base (KB) and the Interactive Dashboards (ID).
The Anomaly Detection and Data Traffic Monitoring components are designed to be complementary: DTM detects known threats based on signatures, while AD detects unknown threats by creating profiles of normal behaviour and searching for network traffic events that are considered anomalous because they do not fit the profiles.
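As a rough illustration of how one detected anomaly could reach several consumers through topic-based messaging, the sketch below uses an in-memory stand-in for a message broker. The `MessageBus` class, the topic name `ad.alerts` and the alert fields are assumptions made for this example. The actual SPHINX messaging system and message schema are not described here.

```python
from collections import defaultdict


class MessageBus:
    """In-memory stand-in for a topic-based messaging system (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler subscribed to the topic.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
received = defaultdict(list)

# Each consuming component subscribes to the (hypothetical) alert topic.
for component in ("FDCE", "SIEM", "RCRA", "KB", "ID"):
    bus.subscribe("ad.alerts",
                  lambda msg, name=component: received[name].append(msg))

# Hypothetical alert payload produced by the anomaly detection engine.
alert = {"kind": "dns_volume_outlier", "host": "10.0.0.5", "score": 7.2}
delivered = bus.publish("ad.alerts", alert)
print(delivered)  # number of subscribers that received the alert
```

Publishing to a topic decouples AD from its consumers: AD does not need to know which components are listening, and a new consumer only has to subscribe.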
More information about the AD component and the Cross-Layer Anomaly Detection Framework can be found in Deliverable 4.1, which is publicly available.