The importance of machine learning (ML) has been increasing dramatically for years. From assistance systems to production optimisation to healthcare support, almost every area of daily life and industry is coming into contact with machine learning. Besides all the benefits ML brings, the lack of transparency and the difficulty of establishing traceability pose major risks. While solutions exist to make the training of machine learning models more transparent, traceability is still a major challenge. Ensuring the identity of a model is another challenge, as unnoticed modification of a model is also a danger when using ML. This paper proposes to create an ML Birth Certificate and ML Family Tree secured by blockchain technology. Important information about training and changes to the model through retraining can be stored in a blockchain and accessed by any user, providing greater security and traceability for an ML model.
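The idea of a birth certificate and family tree can be illustrated with a minimal Python sketch. It is not the paper's implementation and uses no real blockchain: a plain list of hash-linked records stands in for the ledger, and the names `make_record`, `verify_chain` and `model_matches`, as well as the record fields, are hypothetical. The sketch only shows how a model fingerprint and a parent link could make unnoticed modifications detectable.

```python
import hashlib
import json
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def _body_hash(body: Dict) -> str:
    # Serialise with sorted keys so the same content always hashes identically.
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


def make_record(model_bytes: bytes, metadata: Dict, parent: Optional[str]) -> Dict:
    """Create a certificate record (hypothetical layout, assumed for illustration).

    parent is the record hash of the model this one was derived from,
    or None for the initial "birth" record.
    """
    body = {
        "model_hash": sha256_hex(model_bytes),
        "metadata": metadata,
        "parent": parent,
    }
    return {**body, "record_hash": _body_hash(body)}


def verify_chain(records: List[Dict]) -> bool:
    """Check that every record is unmodified and linked to its predecessor."""
    previous = None
    for record in records:
        body = {key: record[key] for key in ("model_hash", "metadata", "parent")}
        if record["record_hash"] != _body_hash(body):
            return False  # record content was altered after creation
        if record["parent"] != previous:
            return False  # lineage link is broken
        previous = record["record_hash"]
    return True


def model_matches(record: Dict, model_bytes: bytes) -> bool:
    """Check that a model file is the one described by the record."""
    return record["model_hash"] == sha256_hex(model_bytes)


# Usage: a birth record followed by one retraining step.
birth = make_record(b"weights-v1", {"event": "initial training"}, None)
retrained = make_record(b"weights-v2", {"event": "retraining"}, birth["record_hash"])
chain = [birth, retrained]
```

In this sketch, `verify_chain(chain)` succeeds for the untouched list, while editing any stored field or swapping the model file changes a hash and makes the corresponding check fail. A real deployment would additionally rely on the blockchain to prevent the whole list from being rewritten.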
In the field of network security, the detection of intrusions is an important task to prevent and analyse attacks.
In recent years, a growing number of publications have approached this detection with machine learning techniques.
In doing so, not only the well-studied detection of intrusions but also real-time capability must be considered.
This thesis addresses the real-time functionality of machine learning based network intrusion detection.
For this purpose we introduce the network feature generator library PyNetFlowGen, which is designed to allow real-time processing of network data.
This library generates 83 statistical features based on reassembled data flows.
Its optimised Cython implementation processes an individual packet within 4.58 microseconds.
Based on the generated features, machine learning models were examined with regard to their runtime and real-time capabilities.
The selected decision tree classifier, created in Python, was further optimised by transpiling it into C code, which reduced the prediction time for a single sample to 3.96 microseconds on average.
Based on the feature generator and the machine learning model, a basic IDS was implemented, which achieves a data throughput between 63.7 Mbit/s and 2.5 Gbit/s.
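How statistical flow features can be maintained packet by packet may be sketched as follows. This is not the PyNetFlowGen API: the names `RunningStats`, `FlowState` and `process_packet`, the unidirectional 5-tuple key, and the four features shown are assumptions made for illustration, and the thesis's 83 features and flow reassembly are not reproduced. The sketch uses Welford's method so that each packet updates the statistics in constant time without storing earlier packets, which is the property real-time processing depends on.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumed flow key: source IP, source port, destination IP, destination port, protocol.
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class RunningStats:
    """Count, mean and variance updated incrementally (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 while no value has been seen.
        return self.m2 / self.count if self.count else 0.0


@dataclass
class FlowState:
    sizes: RunningStats = field(default_factory=RunningStats)
    gaps: RunningStats = field(default_factory=RunningStats)
    last_seen: Optional[float] = None


flows: Dict[FlowKey, FlowState] = defaultdict(FlowState)


def process_packet(key: FlowKey, timestamp: float, size: int) -> Dict[str, float]:
    """Update the flow's statistics with one packet and return its features."""
    state = flows[key]
    state.sizes.update(size)
    if state.last_seen is not None:
        state.gaps.update(timestamp - state.last_seen)  # inter-arrival time
    state.last_seen = timestamp
    return {
        "packet_count": state.sizes.count,
        "mean_size": state.sizes.mean,
        "var_size": state.sizes.variance,
        "mean_gap": state.gaps.mean,
    }


# Usage: three packets of one hypothetical TCP flow.
key = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
process_packet(key, 0.0, 100)
process_packet(key, 0.5, 200)
features = process_packet(key, 1.5, 300)
```

The feature dictionary returned after each packet could be passed directly to a classifier, so that a prediction is available as soon as a packet has been processed.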
The importance of machine learning (ML) has been increasing dramatically for years. From assistance systems and production optimisation to support for the health sector, almost every area of daily life and industry comes into contact with machine learning. Besides all the benefits that ML brings, the lack of transparency and the difficulty of establishing traceability pose major risks. While there are solutions that make the training of machine learning models more transparent, traceability is still a major challenge. Ensuring the identity of a model is another challenge, since unnoticed modification of a model is also a danger when using ML. One solution is to create an ML birth certificate and an ML family tree secured by blockchain technology. Important information about training and changes to the model through retraining can be stored in a blockchain and accessed by any user, providing greater security and traceability for an ML model.
The research employed the HPTLC PRO System and other HPTLC instruments from CAMAG® to conduct various laboratory tests, with the aim of compiling a database for subsequent analyses. Using MATLAB, dedicated scripts were developed to reveal patterns within the analyzed biomasses and pyrolysis oils (sewage sludge, fermentation residue, paper sludge, and wood). Visual and numerical analysis revealed shared characteristics among different biomasses and their respective pyrolysis oils, with close similarities within each category. Notably, minimal disparity was observed in fermentation residue and wood biomasses with a similarity coefficient of 0.22. Similarly, for pyrolysis oils, the minimal disparity was found in fermentation residues 1 and 3, with a disparity coefficient of 1.41. Despite higher disparity coefficients in certain results, specific biomasses and pyrolysis oils, such as fermentation residue and sewage sludge, exhibited close similarities, with disparity coefficients of 0.18 and 0.55, respectively. The database, derived from triplicate experimentation, now serves as a resource for rapid analysis of newly acquired raw materials. Additionally, the utility of the HPTLC PRO System as an investigation tool, enabling simultaneous analysis of up to five samples, was emphasized, although areas for improvement in derivatization methods were identified.