Zero-Day Threat Detection Pipeline

Hybrid machine learning approach comprising an autoencoder and supervised classifier


The goal is to build a hybrid model that combines supervised classification with unsupervised behaviour modelling to identify signs of a zero-day attack in network traffic data.


The deployment scenario would be a network whose traffic is being monitored. The simplest example would be using Wireshark to capture packet information on an Ethernet link. The captured packet data would then be processed into usable features (e.g., timestamp, protocol, flags, SenderIP, DestIP, Port).
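As a rough illustration of the feature-processing step, the sketch below turns one captured packet record into a flat numeric vector. The record keys (`protocol`, `flags`, `sender_ip`, `dest_ip`, `port`), the protocol and flag vocabularies, and the scaling choices are all assumptions made for this example; they are not taken from the demo repository.

```python
import ipaddress

# Hypothetical protocol vocabulary; a real capture would need a fuller list.
PROTOCOLS = ("TCP", "UDP", "ICMP")
# Hypothetical flag order used for the binary flag features.
TCP_FLAGS = ("SYN", "ACK", "FIN", "RST", "PSH", "URG")


def packet_to_features(record: dict) -> list[float]:
    """Turn one captured packet record into a flat numeric feature vector."""
    # One-hot encode the protocol; unknown protocols map to all zeros.
    proto = [1.0 if record["protocol"] == p else 0.0 for p in PROTOCOLS]
    # One binary feature per TCP flag that is set on the packet.
    flags = [1.0 if f in record.get("flags", ()) else 0.0 for f in TCP_FLAGS]
    # Scale IPv4 addresses and the port into the range [0, 1].
    src = int(ipaddress.IPv4Address(record["sender_ip"])) / (2**32 - 1)
    dst = int(ipaddress.IPv4Address(record["dest_ip"])) / (2**32 - 1)
    port = record["port"] / 65535
    return proto + flags + [src, dst, port]
```

Scaling an IP address to a single number is only a placeholder here; whether addresses should be features at all, and how to encode them, would depend on the network being monitored.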


The features would be supplied as inputs for a two-stage analysis:

  1. Classification: A supervised classifier would be trained on labelled network traffic data and would learn which packets indicate a threat and which are benign. Each new packet from the network would be passed to this classifier and labelled according to that learned model.
  2. Anomaly Detection: An autoencoder would be trained to reconstruct standard network traffic behaviour, i.e., known behaviour. Any traffic that deviates from this learned baseline, and is therefore reconstructed poorly, would be flagged as an anomaly, serving as a filter for potential zero-day threats.
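One possible way to combine the two stages is sketched below. The write-up does not specify how the classifier and autoencoder outputs are merged, so the decision rule, the cutoff values, and the names `classify`, `reconstruct`, and `decide` are assumptions for illustration only. The trained models are represented as plain callables.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def decide(
    x: Vector,
    classify: Callable[[Vector], float],       # assumed: returns P(threat)
    reconstruct: Callable[[Vector], Vector],   # assumed: autoencoder output
    threat_cutoff: float = 0.5,                # hypothetical cutoff
    anomaly_cutoff: float = 0.1,               # hypothetical cutoff
) -> str:
    """Label one feature vector using the classifier, then the autoencoder."""
    # Stage 1: the supervised classifier catches threats it was trained on.
    if classify(x) >= threat_cutoff:
        return "known_threat"
    # Stage 2: traffic the autoencoder reconstructs poorly is unfamiliar.
    if reconstruction_error(x, reconstruct(x)) > anomaly_cutoff:
        return "possible_zero_day"
    return "benign"
```

In this sketch a packet is only reported as a possible zero-day when the classifier does not recognise it and the autoencoder finds it unfamiliar; other combinations, such as scoring both stages in parallel, are equally plausible.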


Some problems in implementing this immediately come to mind. Firstly, network traffic data is not uniform across networks, so a model fitted to one network may not carry over to another. Secondly, the supervised classifier would have to be updated regularly for the zero-day threat detection to remain useful. Thirdly, the decision-making time per packet would have to be minimised, as the delay accumulates over every recorded network packet.


Refer to the GitHub repository for the demo. This page only provides a brief overview of the work.




Pipeline Demo


The decision-maker (classifier + anomaly detector) was wrapped in a RESTful API with key authentication, API call logging, and rate limiting. None of this is relevant to the design and development of the decision-maker itself; the demo was made to prototype a barebones telemetry dashboard for reviewing the model's actions in real time.
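To make the three API safeguards concrete, here is a minimal, framework-agnostic sketch of key authentication, call logging, and sliding-window rate limiting. It is not the demo's code: the `Gatekeeper` class, its parameters, and the HTTP-style status codes it returns are assumptions for illustration, and the web framework used in the demo is not stated here.

```python
import hmac
import time
from collections import defaultdict, deque


class Gatekeeper:
    """Hypothetical request gate: key check, rate limit, and call log."""

    def __init__(self, valid_keys, max_calls=5, window_s=1.0, clock=time.monotonic):
        self.valid_keys = [k.encode() for k in valid_keys]
        self.max_calls = max_calls        # allowed calls per key per window
        self.window_s = window_s          # window length in seconds
        self.clock = clock                # injectable for testing
        self.calls = defaultdict(deque)   # key -> timestamps of accepted calls
        self.log = []                     # (timestamp, key prefix, status)

    def check(self, api_key: str) -> int:
        """Return 200 if allowed, 401 for a bad key, 429 if rate limited."""
        now = self.clock()
        status = self._status(api_key, now)
        # Log only a short key prefix so full keys never reach the log.
        self.log.append((now, api_key[:4], status))
        return status

    def _status(self, api_key: str, now: float) -> int:
        supplied = api_key.encode()
        # Constant-time comparison avoids leaking key contents via timing.
        if not any(hmac.compare_digest(supplied, k) for k in self.valid_keys):
            return 401
        recent = self.calls[api_key]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return 429
        recent.append(now)
        return 200
```

A request handler would call `check` before invoking the decision-maker and return the status code unchanged when it is not 200.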




Future Work


The threat-detection pipeline will be reworked so that it is built on validated design steps rather than a combination of likely features. The next build will be a deployable solution tailored to a specific network operational scenario.

Satvik Agrawal

Computer and Communications Engineering Undergraduate