• IDS-ML is a code repository for intrusion detection system development.
• It introduces the general process of intrusion detection system development.
• It provides the code implementations of three novel intrusion detection systems.
• It uses supervised and unsupervised methods for known and zero-day attack detection.
• It designs advanced methods, such as ensemble learning and model optimization.
Abstract
Due to the expansion and development of modern networks, the volume and destructiveness of cyber attacks are continuously increasing. Intrusion Detection Systems (IDSs) are essential techniques for maintaining and enhancing network security. IDS-ML is an open-source code repository written in Python for developing IDSs from public network traffic datasets using traditional and advanced Machine Learning (ML) algorithms. With optimized ML models, the IDSs developed in the repository can identify various types of cyber-attacks to protect modern networks. This code repository can be easily implemented and reproduced on any intrusion detection datasets to solve problems in the cybersecurity field.
With the rapid expansion of the Internet and communication technologies, as well as the vast number of applications accessible on the network, network security has become a serious issue that must be addressed. Various cybersecurity mechanisms and protection systems have been introduced to protect modern networks, such as firewalls, authentication techniques, cryptography methods, and Intrusion Detection Systems (IDSs). When suspicious behavior is detected, an IDS generates an alarm and reports it to the network administrator. Corresponding countermeasures are then taken to defend against the ongoing attack and prevent future attacks. Although signature-based IDSs usually achieve high performance on known attack detection tasks, they are unable to detect new or zero-day attacks because the patterns of such attacks are unknown. Anomaly-based IDSs, on the other hand, are designed to detect zero-day attacks by distinguishing unknown attacks from pre-defined normal activities. However, their performance on known attack detection is often lower than that of signature-based IDSs. Hybrid IDSs are designed to detect both known and unknown attacks by integrating signature-based and anomaly-based IDSs.
Machine Learning (ML) techniques have recently become promising solutions for developing IDSs. ML is a collection of techniques that employ mathematical formulae to automatically discover, examine, and extract patterns from data. Extracting and acquiring meaningful information helps ML models make informed judgments and predictions. ML algorithms can be classified as supervised and unsupervised learning algorithms. Supervised learning algorithms, such as K-Nearest Neighbors (KNN), are a class of ML algorithms that map input variables to a target variable using labeled training data. For IDS development, supervised learning algorithms are often used to develop signature-based IDSs by training on labeled network datasets, while unsupervised learning algorithms can be used in anomaly-based IDSs to distinguish outliers from normal data.
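To make the distinction above concrete, the following minimal sketch contrasts a supervised KNN classifier trained on labeled flows with an unsupervised distance-based outlier check learned from unlabeled traffic. The two-feature toy data, the function names, and the centroid-plus-margin threshold rule are assumptions made for illustration; they are not taken from the IDS-ML notebooks.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Supervised: label a sample by majority vote of its k nearest labeled neighbors."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_outlier_detector(unlabeled_x, margin=1.5):
    """Unsupervised: learn a distance threshold from unlabeled traffic
    that is assumed to be mostly normal (no labels are used)."""
    dims = len(unlabeled_x[0])
    centroid = tuple(sum(row[d] for row in unlabeled_x) / len(unlabeled_x)
                     for d in range(dims))
    max_dist = max(math.dist(row, centroid) for row in unlabeled_x)
    return centroid, margin * max_dist


def is_outlier(detector, sample):
    """Flag a sample that lies farther from the centroid than the learned threshold."""
    centroid, threshold = detector
    return math.dist(sample, centroid) > threshold


# Hypothetical, already-scaled features: (packet rate, mean packet size).
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.85, 0.90), (0.95, 0.85)]
train_y = ["benign", "benign", "benign", "dos", "dos", "dos"]

# Known attack detection with labels (signature-style use of supervised learning).
known = knn_predict(train_x, train_y, (0.88, 0.82))

# Anomaly detection without labels: only the first three flows are used.
detector = fit_outlier_detector(train_x[:3])
unseen_flagged = is_outlier(detector, (0.50, 0.60))
```

The supervised model can only return labels it has seen during training, whereas the outlier check can flag a flow that matches no known class, which is why the two are combined in hybrid designs.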
Effectively identifying cyberattacks is a critical challenge for network operators and managers, particularly in rapidly evolving modern networks. To improve intrusion detection accuracy and defend against more attacks, many advanced ML techniques can be used to develop IDSs, including ensemble learning, Transfer Learning (TL), and Hyper-Parameter Optimization (HPO). Ensemble learning techniques, including voting, bagging, and stacking, are designed to improve learning performance by integrating the outputs of multiple individual ML models used as base models [Leonardo et al., SIBGRAPI 2018]. In the IDS-ML code repository, three novel IDS frameworks are provided using advanced ML techniques.
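As a minimal sketch of stacking, one of the ensemble techniques named above, the following example trains several tree-based base models and combines their outputs with a meta-learner using scikit-learn's `StackingClassifier`. The synthetic dataset, the choice of base models, and the logistic-regression meta-learner are assumptions made for illustration and do not reproduce the IDS-ML notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled traffic dataset (assumption: 3 classes, 10 features).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=3, class_sep=2.0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Base models: each one learns the task independently.
base_models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Meta-learner: trained on the cross-validated predictions of the base models.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacking test accuracy: {accuracy:.3f}")
```

Voting and bagging differ mainly in how the base outputs are combined: voting counts or averages them directly, while stacking learns the combination from data.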
2. The IDS-ML code functionalities and key algorithms
IDS-ML is a code repository that allows researchers to design IDSs to protect modern networks using various ML algorithms. IDS-ML provides solutions to the following research questions:
• What is the general process of intrusion detection system development?
• How can we use ML algorithms to design different types of IDSs (i.e., signature-based IDSs, anomaly-based IDSs, and hybrid IDSs)?
• How can we improve intrusion detection performance with advanced techniques (i.e., ensemble learning, TL, and HPO)?
A high-level overview of IDS-ML is illustrated in Fig. 1. The IDS-ML code repository provides the code implementations for the development of three innovative IDSs: the tree-based IDS [Yang et al., GLOBECOM 2019], LCCDE [Yang and Shami, GLOBECOM 2022], and MTH-IDS.
1. The tree-based IDS [Yang et al., GLOBECOM 2019] detects various types of known cyber-attacks. It trains four common ML models, Decision Tree (DT), Random Forest (RF), Extra Trees (ET), and Extreme Gradient Boosting (XGBoost), as base models, and then uses stacking, an ensemble learning method, to construct a robust ensemble model by integrating the four base models. Using the stacking ensemble for final decision-making can further improve intrusion detection accuracy.
2. LCCDE_IDS_GlobeCom22.ipynb: This code is the implementation of an innovative IDS framework named LCCDE [Yang and Shami, GLOBECOM 2022] to identify various types of known cyber-attacks. It is developed by identifying the best-performing ML model among three advanced ML algorithms (XGBoost, LightGBM, and CatBoost) for each attack class or type. The class leader models and their prediction confidence values are then used to make accurate decisions about the detection of distinct cyberattack types. The main advantage of the LCCDE framework is that, for every class (attack type) in the dataset, it can match the best performance achieved by any of the base models, which improves its overall performance.
3. MTH_IDS_IoTJ.ipynb: This code is the implementation of a comprehensive IDS named the MTH-IDS. It detects both known and unknown attacks by combining a signature-based IDS with an anomaly-based IDS. The signature-based IDS extends the tree-based IDS model by using Bayesian Optimization (BO), an intelligent HPO approach, to tune the hyperparameters of the ML models and generate optimized ML models. The anomaly-based IDS, on the other hand, is built on a Cluster Labeling (CL) k-means method and biased classifiers that distinguish unknown attacks from normal activities; their performance is improved by tuning their hyperparameters with BO. With the comprehensive MTH-IDS framework, both known and zero-day attacks can be detected effectively.
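The class-leader idea behind LCCDE (item 2 above) can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' exact procedure: the per-class F1 scores are hypothetical, the model names are plain strings rather than trained models, and the fallback to the most confident prediction is an assumed tie-breaking rule.

```python
def select_class_leaders(per_class_f1):
    """Pick, for each class, the model with the highest validation F1 score.

    per_class_f1: {model_name: {class_label: f1_score}}
    returns:      {class_label: leader_model_name}
    """
    classes = next(iter(per_class_f1.values())).keys()
    return {c: max(per_class_f1, key=lambda m: per_class_f1[m][c])
            for c in classes}


def leader_decision(predictions, leaders):
    """Combine one prediction per model into a final label.

    predictions: {model_name: (predicted_label, confidence)}
    """
    labels = {label for label, _ in predictions.values()}
    if len(labels) == 1:
        # All models agree, so no arbitration is needed.
        return labels.pop()
    # Keep only predictions made by the leader model of the predicted class.
    trusted = [(label, conf) for model, (label, conf) in predictions.items()
               if leaders.get(label) == model]
    # Assumed fallback: if no leader backs its own class, consider every prediction.
    candidates = trusted or list(predictions.values())
    return max(candidates, key=lambda pair: pair[1])[0]


# Hypothetical per-class F1 scores measured on a validation set.
f1_scores = {
    "xgboost":  {"benign": 0.99, "dos": 0.95, "portscan": 0.90},
    "lightgbm": {"benign": 0.98, "dos": 0.97, "portscan": 0.91},
    "catboost": {"benign": 0.97, "dos": 0.94, "portscan": 0.96},
}
leaders = select_class_leaders(f1_scores)

# The models disagree; only catboost is the leader of the class it predicts.
final_label = leader_decision(
    {"xgboost": ("dos", 0.80),
     "lightgbm": ("portscan", 0.70),
     "catboost": ("portscan", 0.60)},
    leaders,
)
```

In this example the most confident single model predicts `dos`, but the final label is `portscan` because it comes from the model that leads that class.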
Additionally, the code repository introduces the public code of a Transfer Learning-Convolutional Neural Network (TL-CNN) IDS [Yang and Shami, ICC 2022]. It implements an improved IDS framework using TL and optimized CNN techniques. Specifically, it employs TL by transferring five cutting-edge CNN models, VGG16, VGG19, Xception, Inception, and InceptionResnet, to intrusion detection tasks by transforming network traffic data into images. Consequently, a novel mechanism for data transformation is also presented. In addition, it employs Particle Swarm Optimization (PSO), a robust HPO technique, to automatically tune the hyperparameters of the CNN models and obtain optimized CNN models. Lastly, the base CNN models are combined using two ensemble strategies, confidence averaging and concatenation, to boost intrusion detection performance. The HPO code repository, which has received more than 1000 GitHub stars, introduces the general HPO techniques that can be used to tune the hyperparameters of common ML models to optimize their performance. Details of each algorithm in all code can be found in the corresponding papers, including [Yang and Shami, ICC 2022].
The CICIDS2017 dataset is used to evaluate the proposed IDS frameworks in the software. It is a cutting-edge dataset for network security that contains the most current attack patterns. It covers various types of cyber-attacks, including Denial of Service (DoS) attacks, port-scan attacks, brute-force attacks, web attacks, botnets, and infiltration attacks.
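The confidence-averaging ensemble mentioned for the TL-CNN IDS can be sketched as follows: each base model outputs a class-probability vector, the vectors are averaged, and the class with the highest averaged confidence is chosen. The probability values and class names below are hypothetical, and the function is an illustrative sketch rather than the repository's code.

```python
def confidence_average(prob_vectors):
    """Average the class-probability vectors of several models.

    prob_vectors: list of per-model probability lists, all of equal length.
    returns:      (index of the winning class, averaged probability list)
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    averaged = [sum(vec[c] for vec in prob_vectors) / n_models
                for c in range(n_classes)]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


class_names = ["benign", "dos", "portscan"]

# Hypothetical softmax outputs of three CNN models for one sample.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]

best_index, averaged = confidence_average(model_outputs)
predicted_class = class_names[best_index]
```

Unlike hard voting, averaging keeps each model's confidence, so a single strongly confident model can outweigh two weakly confident ones.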
3. Software impacts
Cybersecurity is an essential challenge in the current and future generations of networks. Although there are many existing papers on IDS development, public and complete code for ML-based IDSs is limited. IDS-ML’s source code and datasets are made available to the general public under the MIT license to facilitate further study in this field. IDS-ML is an innovative and practical project that helps fill this gap in open-source intrusion detection system development.
As the code is publicly available, many researchers and network data analysts have reproduced and used this code in their projects or tasks. Currently, it has received 135 stars and 23 forks on GitHub. Additionally, the corresponding papers of the code have received more than 120 citations. Reproducibility and transparency are two other advantages of this software, which are important for general ML and big data analytics projects to improve the general public’s interest and trust. It is expected to attract broader attention and usage in the near future. The telecommunications industries can also design IDSs with this code to protect their networks.
Another strength of the software is that it is completely written in Python, a programming language with an easy-to-understand syntax that has been widely employed in recent ML-related development projects. The flexibility of Python enables the proposed software to be reused, extended, and integrated with various other libraries in the intrusion detection field.
From the technical perspective, most existing IDS code repositories and software are developed based on traditional and basic ML or DL algorithms. The IDS-ML repository improves the existing IDS research by introducing many advanced techniques, such as ensemble learning, transfer learning, and hyperparameter optimization. Through these techniques, the detection accuracy and efficiency of existing IDSs can be significantly improved. Therefore, network researchers and administrators can benefit from the IDS-ML software by learning advanced techniques to improve their IDSs. With the wider application of effective IDSs driven by this IDS-ML repository, cyberattacks in the next generation of networks can be better addressed to enhance cybersecurity.
Lastly, beyond network security applications, the ML techniques used in the IDS-ML code repository can serve as generic models for general classification problems, such as image classification, disease diagnostics, and user behavior recognition. Thus, general ML researchers and data analysts can also benefit from this software.
4. Conclusions and future research directions
Cyber attacks are becoming more damaging and sophisticated. Detecting different types of attacks and understanding their patterns are crucial procedures in network security frameworks. The IDS-ML code repository provides easy-to-use IDS frameworks to apply traditional and advanced ML techniques to the state-of-the-art network traffic dataset for intrusion detection in modern networks. Network and cybersecurity researchers can take advantage of this code due to its easy implementation and clear explanation.
This research project can be extended and improved in two primary research directions. Firstly, the zero-day attack detection performance still has much room for improvement, as it is still an unsolved issue. Advanced unsupervised anomaly detection techniques and online adaptive approaches, such as Extreme Gradient Boosting Outlier Detection (XGBOD) and Performance Weighted Probability Averaging Ensemble (PWPAE), are promising solutions to improve zero-day attack detection performance. Secondly, as 6G networks are expected to be zero-touch networks that enable fully autonomous attack detection and recovery, Automated ML (AutoML) techniques should be deployed to realize automated intrusion detection. Although in IDS-ML, we have used HPO, an important procedure of AutoML, to automatically optimize ML models, there are still many other AutoML procedures that are worth exploring, such as automated data collection, automated data pre-processing, automated feature engineering, automated model selection, and automated model updating/concept drift adaptation.
CRediT authorship contribution statement
Li Yang: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization. Abdallah Shami: Conceptualization, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is partially supported by The Canadian Urban Transit Research & Innovation Consortium (CUTRIC). The authors thank Abdallah Moubayed, Ismail Hamieh, Gary Stevens, and Stephen DeRusett for their support in the original papers.
References
S. Khan, E. Sivaraman, P.B. Honnavalli, Performance evaluation of advanced machine learning algorithms for network intrusion detection system, in: M. Dutta, C.R. Krishna, R. Kumar, M. Kalra (Eds.), Proc. Int. Conf. IoT Incl. Life (ICIIL 2019), NITTTR Chandigarh, India, Springer Singapore, Singapore, 2020, pp. 51–59.
M.M. Leonardo, T.J. Carvalho, E. Rezende, R. Zucchi, F.A. Faria, Deep Feature-Based Classifiers for Fruit Fly Identification (Diptera: Tephritidae), in: 2018 31st SIBGRAPI Conf. Graph. Patterns Images, 2018, pp. 41–47, http://dx.doi.org/10.1109/SIBGRAPI.2018.00012.
L. Yang, A. Moubayed, I. Hamieh, A. Shami, Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles, in: 2019 IEEE Glob. Commun. Conf., 2019, pp. 1–6, http://dx.doi.org/10.1109/GLOBECOM38437.2019.9013892.
L. Yang, A. Shami, LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles, in: 2022 IEEE Glob. Commun. Conf., 2022, pp. 1–6.
L. Yang, A. Shami, A Transfer Learning and Optimized CNN Based Intrusion Detection System for Internet of Vehicles, in: 2022 IEEE Int. Conf. Commun., 2022, pp. 1–6, http://dx.doi.org/10.1109/ICC45855.2022.9838780.