Refine
Document Type
- Bachelor Thesis (1)
- Master's Thesis (1)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2) (remove)
Keywords
- Deep learning (1)
- IDS (1)
- Intrusion Detection (1)
- Machine learning (1)
- Network-Intrusion-Detection (1)
- Vulnerability identification (1)
- real-time (1)
Institute
Open Access
- Closed Access (2)
The identification of vulnerabilities is an important element of the software development process to ensure the security of software. Vulnerability identification based on the source code is a well studied field. To find vulnerabilities on the basis of a binary executable without the corresponding source code is more challenging. Recent research has shown how such detection can be performed statically and thus runtime efficiently by using deep learning methods for certain types of vulnerabilities.
This thesis aims to examine to what extent this identification can be applied sufficiently for a variety of vulnerabilities. Therefore, a supervised deep learning approach using recurrent neural networks for the application of vulnerability detection based on binary executables is used. For this purpose, a dataset with 50,651 samples of 23 different vulnerabilities in the form of a standardised LLVM Intermediate Representation was prepared. The vectorised features of a Word2Vec model were then used to train different variations of three basic architectures of recurrent neural networks (GRU, LSTM, SRNN). For this purpose, a binary classification was trained for the presence of an arbitrary vulnerability, and a multi-class model was trained for the identification of the exact vulnerability, which achieved an out-of-sample accuracy of 88% and 77%, respectively. Differences in the detection of different vulnerabilities were also observed, with non-vulnerable samples being detected with a particularly high precision of over 98%. Thus, the methodology presented allows an accurate detection of vulnerabilities, as well as a strong limitation of the analysis scope for further analysis steps.
In the field of network security, the detection of intrusions is an important task to prevent and analyse attacks.
In recent years, an increasing number of works have been published on this subject, which perform this detection based on machine learning techniques.
Thereby not only the well-studied detection of intrusions, but also the real-time capability must be considered.
This thesis addresses the real-time functionality of machine learning based network intrusion detection.
For this purpose we introduce the network feature generator library PyNetFlowGen, which is designed to allow real-time processing of network data.
This library generates 83 statistical features based on reassembled data flows.
The introduced performant Cython implementation allows processing individual packets within 4.58 microseconds.
Based on the generated features, machine learning models were examined with regard to their runtime and real-time capabilities.
The selected Decision-Tree-Classifier model created in Python was further optimised by transpiling it into C-Code, what reduced the prediction time of a single sample to 3.96 microseconds on average.
Based on the feature generator and the machine learning model, an basic IDS system was implemented, which allows a data throughput between 63.7 Mbit/s and 2.5 Gbit/s.