Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

106 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sahil Suneja

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yufan Zhuang - Sahil Suneja - Veronika Thost

الذكاء الاصطناعي هندسة البرمجيات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.

قيم البحث

169 - Weining Zheng , Yuan Jiang , Xiaohong Su 2021

Vulnerability detection is an important issue in software security. Although various data-driven vulnerability detection methods have been proposed, the task remains challenging since the diversity and complexity of real-world vulnerable code in synt ax and semantics make it difficult to extract vulnerable features with regular deep learning models, especially in analyzing a large program. Moreover, the fact that real-world vulnerable codes contain a lot of redundant information unrelated to vulnerabilities will further aggravate the above problem. To mitigate such challenges, we define a novel code representation named Slice Property Graph (SPG), and then propose VulSPG, a new vulnerability detection approach using the improved R-GCN model with triple attention mechanism to identify potential vulnerabilities in SPG. Our approach has at least two advantages over other methods. First, our proposed SPG can reflect the rich semantics and explicit structural information that may be relevance to vulnerabilities, while eliminating as much irrelevant information as possible to reduce the complexity of graph. Second, VulSPG incorporates triple attention mechanism in R-GCNs to achieve more effective learning of vulnerability patterns from SPG. We have extensively evaluated VulSPG on two large-scale datasets with programs from SARD and real-world projects. Experimental results prove the effectiveness and efficiency of VulSPG.

التشفير والأمن هندسة البرمجيات

Learning to map source code to software vulnerability using code-as-a-graph

133 - Sahil Suneja , Yunhui Zheng , Yufan Zhuang 2020

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of rel ationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a sample source code into a Code Property Graph. The extracted graph is then vectorized in a manner which preserves its semantic information. A Gated Graph Neural Network is then trained using several such graphs to automatically extract templates differentiating the graph of a vulnerable sample from a healthy one. Our model outperforms static analyzers, classic machine learning, as well as CNN and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)

هندسة البرمجيات التشفير والأمن التعلم الآلي

Dynamic Vulnerability Detection on Smart Contracts Using Machine Learning

113 - Mojtaba Eshghie , Cyrille Artho , Dilian Gurov 2021

In this work we propose Dynamit, a monitoring framework to detect reentrancy vulnerabilities in Ethereum smart contracts. The novelty of our framework is that it relies only on transaction metadata and balance data from the blockchain system; our app roach requires no domain knowledge, code instrumentation, or special execution environment. Dynamit extracts features from transaction data and uses a machine learning model to classify transactions as benign or harmful. Therefore, not only can we find the contracts that are vulnerable to reentrancy attacks, but we also get an execution trace that reproduces the attack.

التشفير والأمن هندسة البرمجيات

Deep Contrastive Graph Representation via Adaptive Homotopy Learning

198 - Rui Zhang , Chengjun Lu , Ziheng Jiao 2021

Homotopy model is an excellent tool exploited by diverse research works in the field of machine learning. However, its flexibility is limited due to lack of adaptiveness, i.e., manual fixing or tuning the appropriate homotopy coefficients. To address the problem above, we propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed, such that the homotopy parameters can be adaptively obtained. Accordingly, the proposed AH can be widely utilized to enhance the homotopy-based algorithm. In particular, in this paper, we apply AH to contrastive learning (AHCL) such that it can be effectively transferred from weak-supervised learning (given label priori) to unsupervised learning, where soft labels of contrastive learning are directly and adaptively learned. Accordingly, AHCL has the adaptive ability to extract deep features without any sort of prior information. Consequently, the affinity matrix formulated by the related adaptive labels can be constructed as the deep Laplacian graph that incorporates the topology of deep representations for the inputs. Eventually, extensive experiments on benchmark datasets validate the superiority of our method.

الرؤية الحاسوبية وتمييز الأنماط

Deep Graph Matching and Searching for Semantic Code Retrieval

81 - Xiang Ling , Lingfei Wu , Saizhuo Wang 2020

Code retrieval is to find the code snippet from a large corpus of source code repositories that highly matches the query of natural language description. Recent work mainly uses natural language processing techniques to process both query texts (i.e. , human natural language) and code snippets (i.e., machine programming language), however neglecting the deep structured features of query texts and source codes, both of which contain rich semantic information. In this paper, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for the task of semantic code retrieval. To this end, we first represent both natural language query texts and programming language code snippets with the unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best matching code snippet. In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them by cross-attention based semantic matching operations. We evaluate the proposed DGMS model on two public code retrieval datasets with two representative programming languages (i.e., Java and Python). Experiment results demonstrate that DGMS significantly outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, our extensive ablation studies systematically investigate and illustrate the impact of each part of DGMS.

الذكاء الاصطناعي استرجاع المعلومات هندسة البرمجيات