A comparative study of neural network techniques for automatic software vulnerability detection


Published by Gaigai Tang

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Gaigai Tang, Lianxiao Meng, Shuangyin Ren

Software Engineering; Cryptography and Security; Machine Learning


Abstract

Software vulnerabilities are usually caused by design flaws or implementation errors, which can be exploited to damage the security of a system. At present, the most commonly used method for detecting software vulnerabilities is static analysis. Most of the related technologies work on rules or code similarity (at the source code level) and rely on manually defined vulnerability features. However, these rules and vulnerability features are difficult to define and design accurately, which makes static analysis face many challenges in practical applications. To alleviate this problem, some researchers have proposed using neural networks, which can extract features automatically, to make detection more intelligent. However, there are many types of neural networks, and different data preprocessing methods have a significant impact on model performance. It is a great challenge for engineers and researchers to choose a proper neural network and data preprocessing method for a given problem. To solve this problem, we have conducted extensive experiments to test the performance of the two most typical neural networks (i.e., Bi-LSTM and RVFL) with the two most classical data preprocessing methods (i.e., the vector representation and the program symbolization methods) on software vulnerability detection problems, and obtained a series of interesting research conclusions that can provide valuable guidelines for researchers and engineers. Specifically, we found that 1) the training speed of RVFL is always faster than that of Bi-LSTM, but the prediction accuracy of the Bi-LSTM model is higher than that of RVFL; 2) using doc2vec for vector representation makes the model train faster and generalize better than using word2vec; and 3) multi-level symbolization helps to improve the precision of neural network models.
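To make the "program symbolization" preprocessing step more concrete, here is a minimal sketch in Python. It assumes that symbolization means replacing user-defined function and variable names with generic placeholders before the code is vectorized, and that "multi-level" means the two kinds of names can be symbolized independently. The `symbolize` function, the `RESERVED` list, and the `FUN`/`VAR` naming scheme are illustrative assumptions, not the authors' implementation.

```python
import re

# Illustrative subset of C keywords and library calls to leave untouched
# (assumption: a real pipeline would use a complete list or a proper parser).
RESERVED = {
    "int", "char", "void", "if", "else", "for", "while", "return",
    "sizeof", "strcpy", "memcpy", "malloc", "free",
}

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*")


def symbolize(code, rename_functions=True, rename_variables=True):
    """Replace user-defined names with FUN1, FUN2, ... and VAR1, VAR2, ...

    Heuristic: an identifier directly followed by "(" is treated as a
    function name, anything else as a variable. String literals and
    comments are not handled in this sketch.
    """
    fun_map, var_map = {}, {}

    def replace(match):
        name = match.group(0)
        if name in RESERVED:
            return name
        is_function = code[match.end():].lstrip().startswith("(")
        if is_function:
            if not rename_functions:
                return name
            return fun_map.setdefault(name, f"FUN{len(fun_map) + 1}")
        if not rename_variables:
            return name
        return var_map.setdefault(name, f"VAR{len(var_map) + 1}")

    return IDENTIFIER.sub(replace, code)


snippet = "void copy_input(char *src) { char buf[10]; strcpy(buf, src); }"
print(symbolize(snippet))
# void FUN1(char *VAR1) { char VAR2[10]; strcpy(VAR2, VAR1); }
```

Calling `symbolize(snippet, rename_variables=False)` symbolizes only function names, which is one way the different levels could be compared in an experiment.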


Sahil Suneja, Yunhui Zheng, Yufan Zhuang (2020)

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a sample source code into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained using several such graphs to automatically extract templates differentiating the graph of a vulnerable sample from a healthy one. Our model outperforms static analyzers, classic machine learning, as well as CNN and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)

Software Engineering; Cryptography and Security; Machine Learning

Automatic Generation of High-Coverage Tests for RTL Designs using Software Techniques and Tools

Yu Zhang, Wenlong Feng, Mengxing Huang (2016)

Register Transfer Level (RTL) design validation is a crucial stage in the hardware design process. We present a new approach to enhancing RTL design validation using available software techniques and tools. Our approach converts the source code of an RTL design into a C++ software program. A powerful symbolic execution engine is then employed to execute the converted C++ program symbolically and generate test cases. To generate test cases more efficiently, we limit the number of cycles to guide symbolic execution. Moreover, we add bit-level symbolic variable support to the symbolic execution engine. Generated test cases are further evaluated by simulating the RTL design to get accurate coverage. We have evaluated the approach on a floating point unit (FPU) design. The preliminary results show that our approach can deliver high-quality tests that achieve high coverage.

Software Engineering; Hardware Engineering

A Comparative Study of AI-based Intrusion Detection Techniques in Critical Infrastructures

Safa Otoum, Burak Kantarci, Hussein Mouftah (2020)

Volunteer computing, in which the owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing power resources, has become an essential mechanism for resource management in numerous applications. The growth of the volume and variety of data traffic in the Internet leads to concerns about the robustness of cyberphysical systems, especially for critical infrastructures. Therefore, the implementation of an efficient Intrusion Detection System for gathering such sensory data has gained vital importance. In this paper, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning and reinforcement learning solutions to recognize intrusive behavior in the collected traffic. We evaluate the proposed mechanisms by using KD99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), the Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS) and the Q-learning based IDS (QL-IDS), in detecting malicious behaviors. We also present the performance of different reinforcement learning techniques such as State-Action-Reward-State-Action learning (SARSA) and Temporal Difference learning (TD). Through simulations, we show that QL-IDS achieves a 100% detection rate, while SARSA-IDS and TD-IDS achieve rates on the order of 99.5%.
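For readers unfamiliar with the difference between the Q-learning and SARSA variants compared above, the following Python sketch shows the standard textbook update rules for the two methods. It is not the IDS implementation from the paper: the dictionary-based table, the state and action names, and the `alpha` and `gamma` values are hypothetical choices made for illustration.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[s_next].values())
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])


def make_table():
    # Hypothetical two-state table with "allow" and "block" actions.
    return {
        "s0": {"allow": 0.0, "block": 0.0},
        "s1": {"allow": 1.0, "block": 0.0},
    }


q_table = make_table()
q_learning_update(q_table, "s0", "block", r=1.0, s_next="s1")

sarsa_table = make_table()
sarsa_update(sarsa_table, "s0", "block", r=1.0, s_next="s1", a_next="block")
```

The only difference is the bootstrap term: Q-learning uses the maximum value available in the next state, whereas SARSA uses the value of the action the agent actually takes there, so the two tables diverge whenever the next action is not the greedy one.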

Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing; Machine Learning

Doublade: Unknown Vulnerability Detection in Smart Contracts Via Abstract Signature Matching and Refined Detection Rules

Yinxing Xue (2019)

With the prosperity of smart contracts and blockchain technology, various security analyzers have been proposed by both academia and industry to address the associated risks. Yet there is no high-quality benchmark of smart contract vulnerabilities for security research. In this study, we propose an approach towards building a high-quality vulnerability benchmark. Our approach consists of two parts. First, to improve recall, we propose to search for similar vulnerabilities in an automated way by leveraging the abstract vulnerability signature (AVS). Second, to remove the false positives (FPs) due to AVS-based matching, we summarize the detection rules of existing tools and apply the refined rules by considering various defense mechanisms (DMs). By integrating AVS-based code matching and the refined detection rules (RDR), our approach achieves higher precision and recall. On the collected 76,354 contracts, we build a benchmark consisting of 1,219 vulnerabilities covering five different vulnerability types, identified jointly by our tool (DOUBLADE) and three other scanners. Additionally, we compare DOUBLADE with the other scanners on an additional 17,770 contracts. Results show that DOUBLADE yields better detection accuracy with similar execution time.

Software Engineering; Cryptography and Security; Distributed, Parallel, and Cluster Computing

A Spatial-Temporal Graph Neural Network Framework for Automated Software Bug Triaging

Hongrun Wu, Yutao Ma, Zhenglong Xiang (2021)

The bug triaging process, an essential process of assigning bug reports to the most appropriate developers, is closely related to the quality and costs of software development. As manual bug assignment is a labor-intensive task, especially for large-scale software projects, many machine-learning-based approaches have been proposed to automatically triage bug reports. Although developer collaboration networks (DCNs) are dynamic and evolving in the real world, most automated bug triaging approaches focus on static tossing graphs at a single time slice. Also, none of the previous studies consider periodic interactions among developers. To address these problems, in this article we propose a novel spatial-temporal dynamic graph neural network (ST-DGNN) framework, including a joint random walk (JRWalk) mechanism and a graph recurrent convolutional neural network (GRCNN) model. In particular, JRWalk aims to sample local topological structures in a graph with two sampling strategies by considering both node importance and edge importance. GRCNN has three components with the same structure, i.e., hourly-periodic, daily-periodic, and weekly-periodic components, to learn the spatial-temporal features of dynamic DCNs. We evaluated our approach's effectiveness by comparing it with several state-of-the-art graph representation learning methods in two domain-specific tasks that belong to node classification. In the two tasks, experiments on two real-world, large-scale developer collaboration networks collected from the Eclipse and Mozilla projects indicate that the proposed approach outperforms all the baseline methods.

Software Engineering; Social and Information Networks