In this paper, we present three datasets built from network traffic traces using the ASNM features designed in our previous work. The first dataset was built from the state-of-the-art CDX 2009 dataset, while the remaining two were collected by us in 2015 and 2018, respectively. These two datasets contain several adversarial obfuscation techniques that were applied to malicious as well as legitimate traffic samples during the execution of particular TCP network connections. The adversarial obfuscation techniques were used to evade machine learning-based network intrusion detection classifiers. Further, we showed that the performance of such classifiers can be improved by partially augmenting their training data with samples obtained from the obfuscation techniques. In detail, we utilized tunneling obfuscation in the HTTP(S) protocol as well as non-payload-based obfuscations that modify various properties of network traffic, e.g., by TCP segmentation, re-transmissions, and corruption and reordering of packets. To the best of our knowledge, this is the first collection of network traffic metadata that contains adversarial techniques and is intended for non-payload-based network intrusion detection and adversarial classification. The provided datasets enable testing the evasion resistance of any classifier that uses ASNM features.
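To illustrate the intended use, the sketch below trains a classifier on non-obfuscated samples only and then measures its true positive rate on the obfuscated ones. The file name and the `label`/`obfuscated` columns are hypothetical placeholders; the real ASNM datasets define their own schema.

```python
# Minimal sketch of testing a classifier's evasion resistance on an
# ASNM-style feature table. Column names and file name are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

df = pd.read_csv("asnm_features.csv")           # hypothetical export of one dataset
X = df.drop(columns=["label", "obfuscated"])
y = df["label"]                                  # 1 = malicious, 0 = legitimate (assumed)

# Train only on non-obfuscated samples, then test on the obfuscated ones
# to see how much the obfuscation degrades detection.
train = ~df["obfuscated"].astype(bool)
clf = RandomForestClassifier().fit(X[train], y[train])
tpr_obf = recall_score(y[~train], clf.predict(X[~train]))
print(f"TPR on obfuscated traffic: {tpr_obf:.2%}")
```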
Machine learning (ML), and especially deep learning (DL), techniques have been increasingly used in anomaly-based network intrusion detection systems (NIDS). However, ML/DL has been shown to be extremely vulnerable to adversarial attacks, especially in such security-sensitive systems. Many adversarial attacks have been proposed to evaluate the robustness of ML-based NIDSs. Unfortunately, existing attacks have mostly focused on feature-space and/or white-box attacks, which make impractical assumptions in real-world scenarios, leaving practical gray/black-box attacks largely unexplored. To bridge this gap, we conduct the first systematic study of gray/black-box traffic-space adversarial attacks to evaluate the robustness of ML-based NIDSs. Our work outperforms previous ones in the following aspects: (i) practical: the proposed attack can automatically mutate original traffic with extremely limited knowledge and affordable overhead while preserving its functionality; (ii) generic: the proposed attack is effective for evaluating the robustness of various NIDSs using diverse ML/DL models and non-payload-based features; (iii) explainable: we propose an explanation method for the fragile robustness of ML-based NIDSs. Based on this, we also propose a defense scheme against adversarial attacks to improve system robustness. We extensively evaluate the robustness of various NIDSs using diverse feature sets and ML/DL models. Experimental results show that our attack is effective (e.g., >97% evasion rate in half of the cases for Kitsune, a state-of-the-art NIDS) with affordable execution cost, and that the proposed defense method can effectively mitigate such attacks (the evasion rate is reduced by >50% in most cases).
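As a rough illustration of a black-box traffic-space evasion loop (not the paper's exact algorithm), the sketch below greedily mutates non-payload flow parameters, e.g., packet sizes and inter-arrival times, and keeps a mutation whenever the target's anomaly score drops. `score_fn`, the mutation bounds, and the query budget are assumptions; a real attack must additionally verify that the mutated flow remains functional.

```python
# Hedged sketch of a query-based, black-box evasion loop that operates only
# on non-payload traffic parameters (e.g., timing and size statistics).
import random

def mutate(params):
    """Apply a small random jitter to one randomly chosen flow parameter."""
    p = dict(params)
    key = random.choice(list(p))
    p[key] *= random.uniform(0.8, 1.2)
    return p

def evade(params, score_fn, threshold, budget=500):
    """Greedy hill-climbing on the NIDS anomaly score within a query budget."""
    best, best_score = params, score_fn(params)
    for _ in range(budget):
        cand = mutate(best)
        s = score_fn(cand)
        if s < best_score:                   # keep mutations that look more benign
            best, best_score = cand, s
        if best_score < threshold:           # flow is now classified as benign
            return best
    return None                              # evasion failed within the budget
```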
In this work, we show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it is activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model. We leverage adversarial noise in the input space to move Trojan-infected examples across the model's decision boundary, making the attack difficult to detect. This stealthy behavior fools users into mistakenly trusting the infected model as a robust classifier against adversarial examples. AdvTrojan can be implemented by only poisoning the training data, similar to conventional Trojan backdoor attacks. Our thorough analysis and extensive experiments on several benchmark datasets show that AdvTrojan can bypass existing defenses with a success rate close to 100% in most of our experimental scenarios and can be extended to attack federated learning tasks as well.
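The sketch below shows the two-ingredient idea in miniature, assuming a PyTorch image classifier: a poisoned training sample carries both a targeted FGSM-style perturbation and a fixed trigger patch, so the backdoor fires only when the two co-occur. The model interface, trigger placement, and epsilon are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of crafting one AdvTrojan-style poisoned training sample:
# targeted FGSM perturbation + fixed trigger patch, labeled with the target class.
import torch
import torch.nn.functional as F

def poison(x, y_target, model, trigger, eps=0.03):
    """x: a single image tensor (C, H, W) in [0, 1]; trigger: a small patch."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), torch.tensor([y_target]))
    loss.backward()
    # Targeted FGSM: step *against* the gradient to move toward the target class.
    x_adv = (x - eps * x.grad.sign()).detach()
    x_adv[..., -4:, -4:] = trigger            # stamp a 4x4 trigger patch (assumed size)
    return x_adv.clamp(0, 1), y_target        # poisoned input and its flipped label
```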
The increase in both the number and the variety of cyber attacks in recent years demands more sophisticated network intrusion detection systems (NIDS). Such NIDS perform better when they can monitor all the traffic traversing the network, for instance when deployed on a Software-Defined Network (SDN). Because of their inability to detect zero-day attacks, signature-based NIDS, which were traditionally used for detecting malicious traffic, are beginning to be replaced by anomaly-based NIDS built on neural networks. However, it has recently been shown that such NIDS have their own drawback, namely vulnerability to adversarial example attacks. Moreover, they have mostly been evaluated on old datasets that do not represent the variety of attacks network systems face today. In this paper, we present Reconstruction from Partial Observation (RePO), a new mechanism for building an NIDS with the help of denoising autoencoders, capable of detecting different types of network attacks in a low false-alert setting and with enhanced robustness against adversarial example attacks. Our evaluation, conducted on a dataset with a variety of network attacks, shows that denoising autoencoders can improve detection of malicious traffic by up to 29% in a normal setting and by up to 45% in an adversarial setting compared to other recently proposed anomaly detectors.
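A minimal PyTorch sketch of the "reconstruction from partial observation" idea follows: randomly mask part of a flow's feature vector, reconstruct it with a denoising autoencoder, and treat a high reconstruction error as an anomaly. The architecture, mask rate, and number of trials are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a denoising-autoencoder anomaly score computed from
# several random partial observations of the same feature vector.
import torch
import torch.nn as nn

class DAE(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model, x, mask_rate=0.5, trials=8):
    """Average reconstruction error over several random masks, so a single
    lucky mask cannot hide a malicious flow; flag if above a threshold."""
    errs = []
    with torch.no_grad():
        for _ in range(trials):
            mask = (torch.rand_like(x) > mask_rate).float()   # keep ~50% of features
            recon = model(x * mask)
            errs.append(((recon - x) ** 2).mean())
    return torch.stack(errs).mean()
```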
Machine-learning based intrusion detection classifiers are able to detect unknown attacks, but at the same time, they may be susceptible to evasion by obfuscation techniques. An adversarial intruder who possesses crucial knowledge about the protection system can easily bypass the detection module. The main objective of our work is to improve the performance of intrusion detection classifiers against such adversaries. To this end, we first propose several obfuscation techniques for remote attacks that are based on the modification of various properties of network connections; we then conduct a set of comprehensive experiments to evaluate the effectiveness of intrusion detection classifiers against obfuscated attacks. We instantiate our approach by means of a tool, based on NetEm and Metasploit, that implements our obfuscation operators on any TCP communication. This allows us to generate modified network traffic for machine learning experiments employing features that assess network statistics and the behavior of TCP connections. We perform the evaluation of five classifiers: Gaussian Naive Bayes, Gaussian Naive Bayes with kernel density estimation, Logistic Regression, Decision Tree, and Support Vector Machines. Our experiments confirm the assumption that it is possible to evade the intrusion detection capability of all classifiers trained without prior knowledge of obfuscated attacks, causing a degradation of the true positive rate (TPR) ranging from 7.8% to 66.8%. Further, when widening the training knowledge of the classifiers with a subset of obfuscated attacks, we achieve a significant improvement of the TPR by 4.21% to 73.3%, while the false positive rate (FPR) deteriorates only slightly (by 0.1% to 1.48%). Finally, we test the capability of an obfuscation-aware classifier to detect unknown obfuscated attacks, achieving over 90% detection rate on average for most of the obfuscations.
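For concreteness, the sketch below drives NetEm via the standard iproute2 `tc` command to realize obfuscation operators of this kind: added delay with jitter, loss-induced re-transmissions, corruption, reordering, and duplication. The interface name and the parameter values are assumptions, root privileges are required, and this is a simplified stand-in for the paper's tool rather than its implementation.

```python
# Minimal sketch of non-payload obfuscation operators built on NetEm.
# Each entry perturbs one traffic property of every connection leaving `dev`.
import subprocess

NETEM_OPERATORS = {
    "delay":     "delay 100ms 20ms",            # added latency with jitter
    "loss":      "loss 5%",                     # forces TCP re-transmissions
    "corrupt":   "corrupt 1%",                  # flips bits in random packets
    "reorder":   "delay 10ms reorder 25% 50%",  # out-of-order delivery
    "duplicate": "duplicate 2%",
}

def apply_obfuscation(name, dev="eth0"):
    cmd = f"tc qdisc add dev {dev} root netem {NETEM_OPERATORS[name]}"
    subprocess.run(cmd.split(), check=True)

def clear_obfuscation(dev="eth0"):
    subprocess.run(f"tc qdisc del dev {dev} root".split(), check=True)
```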
With massive data being generated daily and the ever-increasing interconnectivity of the world's Internet infrastructures, machine learning-based intrusion detection systems (IDS) have become a vital component in protecting our economic and national security. In this paper, we perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning models that use a single learning model for intrusion detection, we adopt a hierarchical strategy, in which intrusion and normal behavior are classified first, and then the specific attack types are classified. We demonstrate the advantage of the unsupervised representation learning model in the binary intrusion detection task. In addition, we alleviate the data imbalance problem with the SVM-SMOTE oversampling technique in the 4-class classification task and further demonstrate the effectiveness and the drawbacks of the oversampling mechanism with a deep neural network as a base model.
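A hedged sketch of this hierarchy using scikit-learn and imbalanced-learn is given below: stage one separates normal from intrusion traffic, and stage two classifies intrusions into attack categories after SVM-SMOTE rebalances the minority classes. NSL-KDD preprocessing is omitted, and `X`, `y_binary`, and `y_attack` are assumed to be ready-made arrays.

```python
# Hedged sketch of the two-stage hierarchical intrusion detection strategy
# with SVM-SMOTE oversampling for the imbalanced attack-type stage.
from imblearn.over_sampling import SVMSMOTE
from sklearn.neural_network import MLPClassifier

def train_hierarchy(X, y_binary, y_attack):
    # Stage 1: binary classification, intrusion (1) vs. normal (0).
    stage1 = MLPClassifier(hidden_layer_sizes=(64,)).fit(X, y_binary)

    # Stage 2: attack-type classifier trained only on intrusion samples,
    # after oversampling rare classes (e.g., U2R, R2L) with SVM-SMOTE.
    X_att, y_att = X[y_binary == 1], y_attack[y_binary == 1]
    X_res, y_res = SVMSMOTE().fit_resample(X_att, y_att)
    stage2 = MLPClassifier(hidden_layer_sizes=(64,)).fit(X_res, y_res)
    return stage1, stage2
```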