Learning to Detect: A Data-driven Approach for Network Intrusion Detection


Published by Kai Zhang

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Zachary Tauscher, Yushan Jiang, Kai Zhang

Cryptography and Security, Machine Learning


With massive volumes of data generated daily and the ever-increasing interconnectivity of the world's Internet infrastructure, machine-learning-based intrusion detection systems (IDS) have become a vital component in protecting our economic and national security. In this paper, we perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow and deep learning approaches that use a single model for intrusion detection, we adopt a hierarchical strategy: intrusions are first separated from normal behavior, and the specific types of attack are then classified. We demonstrate the advantage of an unsupervised representation learning model in binary intrusion detection tasks. In addition, we alleviate the data imbalance problem in 4-class classification with the SVM-SMOTE oversampling technique, and further demonstrate both the effectiveness and the drawback of the oversampling mechanism using a deep neural network as the base model.
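The hierarchical strategy described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `NearestCentroid` class is a deliberately tiny stand-in for whatever models the paper uses, and the feature values, the label names (`"normal"`, `"dos"`, `"probe"`) and the helper names `fit_hierarchy` and `predict_hierarchy` are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda label: dist(row, self.centroids[label]))


def fit_hierarchy(X, y, normal_label="normal"):
    # Stage 1: normal vs. attack, trained on all records.
    binary_y = [normal_label if label == normal_label else "attack" for label in y]
    stage1 = NearestCentroid().fit(X, binary_y)
    # Stage 2: attack type, trained on attack records only.
    attacks = [(row, label) for row, label in zip(X, y) if label != normal_label]
    stage2 = NearestCentroid().fit([r for r, _ in attacks], [l for _, l in attacks])
    return stage1, stage2


def predict_hierarchy(stage1, stage2, row, normal_label="normal"):
    # Only records flagged as attacks by stage 1 reach the attack-type model.
    if stage1.predict_one(row) == normal_label:
        return normal_label
    return stage2.predict_one(row)


# Illustrative two-feature records (made-up values, not NSL-KDD data).
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0], [9.1, 0.2]]
y = ["normal", "normal", "dos", "dos", "probe", "probe"]
stage1, stage2 = fit_hierarchy(X, y)
```

Calling `predict_hierarchy(stage1, stage2, row)` on a new record returns either the normal label or an attack type. One consequence of this design is that any record stage 1 misses never reaches stage 2, so the binary detector bounds the recall of the whole pipeline.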


Ahmed Shafee, Mohamed Baza, Douglas A. Talbert (2019)

Purveyors of malicious network attacks continue to increase the complexity and sophistication of their techniques, and their ability to evade detection continues to improve as well. Hence, intrusion detection systems must also evolve to meet these increasingly challenging threats. Machine learning is often used to support this needed improvement. However, training a good prediction model can require a large set of labelled training data. Such datasets are difficult to obtain because privacy concerns prevent the majority of intrusion detection agencies from sharing their sensitive data. In this paper, we propose the use of mimic learning to enable the transfer of intrusion detection knowledge through a teacher model trained on private data to a student model. This student model provides a means of publicly sharing knowledge extracted from private data without sharing the data itself. Our results confirm that the proposed scheme can produce a student intrusion detection model that mimics the teacher model without requiring access to the original dataset.

Cryptography and Security, Machine Learning
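The teacher-student transfer described in the mimic learning abstract above can be illustrated with a minimal sketch. It assumes a single numeric feature, binary 0/1 labels, and a trivial threshold model standing in for both teacher and student. None of these choices come from the paper, and all data values are made up.

```python
def fit_threshold(values, labels):
    """Learn a 1-D threshold halfway between the two class means.

    Assumes labels are 0/1, both classes are present, and class 1 has the
    larger mean.
    """
    mean0 = sum(v for v, l in zip(values, labels) if l == 0) / labels.count(0)
    mean1 = sum(v for v, l in zip(values, labels) if l == 1) / labels.count(1)
    return (mean0 + mean1) / 2


def predict(threshold, value):
    return 1 if value > threshold else 0


# Private, labelled data: stays with the data owner (illustrative values).
private_x = [0.1, 0.3, 0.2, 0.9, 0.8, 1.0]
private_y = [0, 0, 0, 1, 1, 1]
teacher = fit_threshold(private_x, private_y)

# Public, unlabelled data: the teacher supplies the labels.
public_x = [0.05, 0.25, 0.4, 0.7, 0.85, 0.95]
pseudo_y = [predict(teacher, v) for v in public_x]

# The student is trained only on public data and teacher labels;
# it never sees private_x or private_y.
student = fit_threshold(public_x, pseudo_y)
```

Only `student` would be shared. How closely it tracks the teacher depends on how well the public data covers the region around the teacher's decision boundary.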

On the Evaluation of Sequential Machine Learning for Network Intrusion Detection

Andrea Corsini, Shanchieh Jay Yang, Giovanni Apruzzese (2021)

Recent advances in deep learning have renewed research interest in machine learning for Network Intrusion Detection Systems (NIDS). Specifically, attention has been given to sequential learning models, due to their ability to extract the temporal characteristics of network traffic flows (NetFlows) and use them for NIDS tasks. However, applications of these sequential models often consist of transferring and adapting methodologies directly from other fields, without an in-depth investigation of how to leverage the specific circumstances of cybersecurity scenarios; moreover, there is a lack of comprehensive studies on sequential models that rely on NetFlow data, which presents significant advantages over traditional full packet captures. We tackle this problem in this paper. We propose a detailed methodology to extract temporal sequences of NetFlows that denote patterns of malicious activities. Then, we apply this methodology to compare the efficacy of sequential learning models against traditional static learning models. In particular, we perform a fair comparison of a "sequential" Long Short-Term Memory (LSTM) network against a "static" Feedforward Neural Network (FNN) in distinct environments represented by two well-known datasets for NIDS: CICIDS2017 and CTU13. Our results highlight that the LSTM achieves performance comparable to the FNN on CICIDS2017, with over 99.5% F1-score, while obtaining superior performance on CTU13, with a 95.7% F1-score against 91.5%. This paper thus paves the way to future applications of sequential learning models for NIDS.

Cryptography and Security, Machine Learning
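The abstract above does not spell out how temporal sequences of NetFlows are extracted, so the following is only one plausible sketch: group flows by source host, order them by start time, and emit fixed-length sliding windows that a sequential model could consume. The record fields (`src`, `start`, `bytes`, `packets`, `malicious`), the window length, and the rule for labelling a window are all assumptions.

```python
from collections import defaultdict


def flow_sequences(flows, window=3):
    """Group flows by source host, sort by start time, and emit sliding windows."""
    by_host = defaultdict(list)
    for flow in flows:
        by_host[flow["src"]].append(flow)

    sequences = []
    for host, host_flows in by_host.items():
        host_flows.sort(key=lambda f: f["start"])
        # Hosts with fewer than `window` flows produce no sequences.
        for i in range(len(host_flows) - window + 1):
            chunk = host_flows[i:i + window]
            features = [[f["bytes"], f["packets"]] for f in chunk]
            # Assumed rule: a window is malicious if any flow in it is malicious.
            label = int(any(f["malicious"] for f in chunk))
            sequences.append((host, features, label))
    return sequences


# Made-up flow records, deliberately out of time order.
flows = [
    {"src": "10.0.0.1", "start": 3, "bytes": 300, "packets": 3, "malicious": False},
    {"src": "10.0.0.1", "start": 1, "bytes": 100, "packets": 1, "malicious": True},
    {"src": "10.0.0.2", "start": 1, "bytes": 50, "packets": 1, "malicious": False},
    {"src": "10.0.0.1", "start": 2, "bytes": 200, "packets": 2, "malicious": False},
    {"src": "10.0.0.1", "start": 4, "bytes": 400, "packets": 4, "malicious": False},
]
sequences = flow_sequences(flows, window=3)
```

A static model such as an FNN would instead see each flow's feature vector on its own, which is the contrast the comparison in the abstract turns on.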

A Case Study on Using Deep Learning for Network Intrusion Detection

Gabriel C. Fernandez, Shouhuai Xu (2019)

Deep Learning has been very successful in many application domains. However, its usefulness in the context of network intrusion detection has not been systematically investigated. In this paper, we report a case study on using deep learning for both supervised network intrusion detection and unsupervised network anomaly detection. We show that Deep Neural Networks (DNNs) can outperform other machine learning based intrusion detection systems, while being robust in the presence of dynamic IP addresses. We also show that Autoencoders can be effective for network anomaly detection.

Cryptography and Security

A Low-Cost Machine Learning Based Network Intrusion Detection System with Data Privacy Preservation

Jyoti Fakirah, Lauhim Mahfuz Zishan, Roshni Mooruth (2021)

Network intrusion is a well-studied area of cyber security. Current machine learning-based network intrusion detection systems (NIDSs) monitor network data and the patterns within those data, but at the cost of significant privacy violations that may threaten end-user privacy. Therefore, to mitigate risk and preserve a balance between security and privacy, it is imperative to protect user privacy with respect to intrusion data. Moreover, cost is a driver of a machine learning-based NIDS because such systems are increasingly being deployed on resource-limited edge devices. To solve these issues, in this paper we propose a NIDS called PCC-LSM-NIDS that is composed of a Pearson Correlation Coefficient (PCC) based feature selection algorithm and a Least Square Method (LSM) based privacy-preserving algorithm to achieve low-cost intrusion detection while providing privacy preservation for sensitive data. The proposed PCC-LSM-NIDS is tested on the benchmark intrusion database UNSW-NB15, using five popular classifiers. The experimental results show that the proposed PCC-LSM-NIDS requires less computational time while offering an appropriate degree of privacy protection.

Cryptography and Security
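The abstract above names Pearson Correlation Coefficient based feature selection without giving its details. A common form of the idea, sketched here as an assumption rather than as the PCC-LSM-NIDS algorithm itself, keeps the features whose absolute correlation with the label reaches a threshold. The feature names, values, and the 0.5 threshold are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Keep features whose |r| with the label meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Made-up feature columns (hypothetical names) and binary attack labels.
columns = {
    "duration": [1.0, 2.0, 8.0, 9.0],
    "ttl": [64.0, 64.0, 64.0, 64.0],
    "noise": [1.0, 2.0, 2.0, 1.0],
}
labels = [0, 0, 1, 1]
selected = select_features(columns, labels)
```

Dropping weakly correlated columns shrinks the input every downstream classifier has to process, which is consistent with the low-cost goal stated in the abstract.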

Statistical Analysis Driven Optimized Deep Learning System for Intrusion Detection

Cosimo Ieracitano, Ahsan Adeel, Mandar Gogate (2018)

Attackers have developed ever more sophisticated and intelligent ways to hack information and communication technology systems. The extent of damage an individual hacker can carry out upon infiltrating a system is well understood. A potentially catastrophic scenario can be envisaged where a nation-state intercepting encrypted financial data gets hacked. Thus, intelligent cybersecurity systems have become inevitably important for improved protection against malicious threats. However, as malware attacks continue to increase dramatically in volume and complexity, it has become ever more challenging for traditional analytic tools to detect and mitigate threats. Furthermore, the huge amount of data produced by large networks has made the recognition task even more complicated and challenging. In this work, we propose an innovative statistical analysis driven optimized deep learning system for intrusion detection. The proposed intrusion detection system (IDS) extracts optimized and more correlated features using big data visualization and statistical analysis methods (human-in-the-loop), followed by a deep autoencoder for potential threat detection. Specifically, a pre-processing module eliminates the outliers and converts categorical variables into one-hot-encoded vectors. The feature extraction module discards features with null values and selects the most significant features as input to the deep autoencoder model (trained in a greedy-wise manner). The NSL-KDD dataset from the Canadian Institute for Cybersecurity is used as a benchmark to evaluate the feasibility and effectiveness of the proposed architecture. Simulation results demonstrate the potential of our proposed system and show that it outperforms existing state-of-the-art methods and recently published novel approaches. Ongoing work includes further optimization and real-time evaluation of our proposed IDS.

Cryptography and Security
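Two of the preparation steps named in the abstract above, discarding features with null values and one-hot encoding categorical variables, can be sketched as below. This is a simplified reading, not the authors' pipeline: it treats a feature as discardable if it is null in any record, omits outlier removal and the autoencoder entirely, and uses a hypothetical `extra` column alongside the NSL-KDD style names `protocol_type` and `src_bytes`.

```python
def drop_null_features(rows):
    """Remove any feature that is None in at least one record."""
    bad = {key for row in rows for key, value in row.items() if value is None}
    return [{k: v for k, v in row.items() if k not in bad} for row in rows]


def one_hot(rows, feature):
    """Replace a categorical feature with one 0/1 column per observed category."""
    categories = sorted({row[feature] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != feature}
        for category in categories:
            new_row[f"{feature}={category}"] = int(row[feature] == category)
        encoded.append(new_row)
    return encoded


# Made-up records; "extra" is a hypothetical column with a missing value.
records = [
    {"protocol_type": "tcp", "src_bytes": 181, "extra": None},
    {"protocol_type": "udp", "src_bytes": 105, "extra": 3},
]
prepared = one_hot(drop_null_features(records), "protocol_type")
```

The resulting records contain only numeric columns, which is the form a neural model such as an autoencoder expects as input.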