ترغب بنشر مسار تعليمي؟ اضغط هنا

Data Analytics-enabled Intrusion Detection: Evaluations of ToN_IoT Linux Datasets

116   0   0.0 ( 0 )
 نشر من قبل Nour Moustafa
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

With the widespread of Artificial Intelligence (AI)- enabled security applications, there is a need for collecting heterogeneous and scalable data sources for effectively evaluating the performances of security applications. This paper presents the description of new datasets, named ToN IoT datasets that include distributed data sources collected from Telemetry datasets of Internet of Things (IoT) services, Operating systems datasets of Windows and Linux, and datasets of Network traffic. The paper aims to describe the new testbed architecture used to collect Linux datasets from audit traces of hard disk, memory and process. The architecture was designed in three distributed layers of edge, fog, and cloud. The edge layer comprises IoT and network systems, the fog layer includes virtual machines and gateways, and the cloud layer includes data analytics and visualization tools connected with the other two layers. The layers were programmatically controlled using Software-Defined Network (SDN) and Network-Function Virtualization (NFV) using the VMware NSX and vCloud NFV platform. The Linux ToN IoT datasets would be used to train and validate various new federated and distributed AI-enabled security solutions such as intrusion detection, threat intelligence, privacy preservation and digital forensics. Various Data analytical and machine learning methods are employed to determine the fidelity of the datasets in terms of examining feature engineering, statistics of legitimate and security events, and reliability of security events. The datasets can be publicly accessed from [1].



قيم البحث

اقرأ أيضاً

Existing cyber security solutions have been basically developed using knowledge-based models that often cannot trigger new cyber-attack families. With the boom of Artificial Intelligence (AI), especially Deep Learning (DL) algorithms, those security solutions have been plugged-in with AI models to discover, trace, mitigate or respond to incidents of new security events. The algorithms demand a large number of heterogeneous data sources to train and validate new security systems. This paper presents the description of new datasets, the so-called ToN_IoT, which involve federated data sources collected from telemetry datasets of IoT services, operating system datasets of Windows and Linux, and datasets of network traffic. The paper introduces the testbed and description of TON_IoT datasets for Windows operating systems. The testbed was implemented in three layers: edge, fog and cloud. The edge layer involves IoT and network devices, the fog layer contains virtual machines and gateways, and the cloud layer involves cloud services, such as data analytics, linked to the other two layers. These layers were dynamically managed using the platforms of software-Defined Network (SDN) and Network-Function Virtualization (NFV) using the VMware NSX and vCloud NFV platform. The Windows datasets were collected from audit traces of memories, processors, networks, processes and hard disks. The datasets would be used to evaluate various AI-based cyber security solutions, including intrusion detection, threat intelligence and hunting, privacy preservation and digital forensics. This is because the datasets have a wide range of recent normal and attack features and observations, as well as authentic ground truth events. The datasets can be publicly accessed from this link [1].
Modern vehicles rely on scores of electronic control units (ECUs) broadcasting messages over a few controller area networks (CANs). Bereft of security features, in-vehicle CANs are exposed to cyber manipulation and multiple researches have proved via ble, life-threatening cyber attacks. Complicating the issue, CAN messages lack a common mapping of functions to commands, so packets are observable but not easily decipherable. We present a transformational approach to CAN IDS that exploits the geometric properties of CAN data to inform two novel detectors--one based on distance from a learned, lower dimensional manifold and the other on discontinuities of the manifold over time. Proof-of-concept tests are presented by implementing a potential attack approach on a driving vehicle. The initial results suggest that (1) the first detector requires additional refinement but does hold promise; (2) the second detector gives a clear, strong indicator of the attack; and (3) the algorithms keep pace with high-speed CAN messages. As our approach is data-driven it provides a vehicle-agnostic IDS that eliminates the need to reverse engineer CAN messages and can be ported to an after-market plugin.
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applica tions, resources and platforms. In turn, the rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance. In this paper, we conduct a systematic literature review (SLR) of data mining techniques (DMT) used in IDS-based solutions through the period 2013-2018. We employed criterion-based, purposive sampling identifying 32 articles, which constitute the primary source of the present survey. After a careful investigation of these articles, we identified 17 separate DMTs deployed in an IDS context. This paper also presents the merits and disadvantages of the various works of current research that implemented DMTs and distributed streaming frameworks (DSF) to detect and/or prevent malicious attacks in a big data environment.
With massive data being generated daily and the ever-increasing interconnectivity of the worlds Internet infrastructures, a machine learning based intrusion detection system (IDS) has become a vital component to protect our economic and national secu rity. In this paper, we perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning models that use the single learning model approach for intrusion detection, we adopt a hierarchy strategy, in which the intrusion and normal behavior are classified firstly, and then the specific types of attacks are classified. We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks. Besides, we alleviate the data imbalance problem with SVM-SMOTE oversampling technique in 4-class classification and further demonstrate the effectiveness and the drawback of the oversampling mechanism with a deep neural network as a base model.
Prevention of cyber attacks on the critical network resources has become an important issue as the traditional Intrusion Detection Systems (IDSs) are no longer effective due to the high volume of network traffic and the deceptive patterns of network usage employed by the attackers. Lack of sufficient amount of labeled observations for the training of IDSs makes the semi-supervised IDSs a preferred choice. We propose a semi-supervised IDS by extending a data analysis technique known as Logical Analysis of Data, or LAD in short, which was proposed as a supervised learning approach. LAD uses partially defined Boolean functions (pdBf) and their extensions to find the positive and the negative patterns from the past observations for classification of future observations. We extend the LAD to make it semi-supervised to design an IDS. The proposed SSIDS consists of two phases: offline and online. The offline phase builds the classifier by identifying the behavior patterns of normal and abnormal network usage. Later, these patterns are transformed into rules for classification and the rules are used during the online phase for the detection of abnormal network behaviors. The performance of the proposed SSIDS is far better than the existing semi-supervised IDSs and comparable with the supervised IDSs as evident from the experimental results.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا