ترغب بنشر مسار تعليمي؟ اضغط هنا

A Feature-Oriented Corpus for Understanding, Evaluating and Improving Fuzz Testing

96   0   0.0 ( 0 )
 نشر من قبل Xiaogang Zhu
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Fuzzing is a promising technique for detecting security vulnerabilities. Newly developed fuzzers are typically evaluated in terms of the number of bugs found on vulnerable programs/binaries. However,existing corpora usually do not capture the features that prevent fuzzers from finding bugs, leading to ambiguous conclusions on the pros and cons of the fuzzers evaluated. A typical example is that Driller detects more bugs than AFL, but its evaluation cannot establish if the advancement of Driller stems from the concolic execution or not, since, for example, its ability in resolving a dataset`s magic values is unclear. In this paper, we propose to address the above problem by generating corpora based on search-hampering features. As a proof-of-concept, we have designed FEData, a prototype corpus that currently focuses on four search-hampering features to generate vulnerable programs for fuzz testing. Unlike existing corpora that can only answer how, FEData can also further answer why by exposing (or understanding) the reasons for the identified weaknesses in a fuzzer. The why information serves as the key to the improvement of fuzzers.

قيم البحث

اقرأ أيضاً

63 - Ying Fu , Meng Ren , Fuchen Ma 2019
Ethereum Virtual Machine (EVM) is the run-time environment for smart contracts and its vulnerabilities may lead to serious problems to the Ethereum ecology. With lots of techniques being developed for the validation of smart contracts, the security p roblems of EVM have not been well-studied. In this paper, we propose EVMFuzz, aiming to detect vulnerabilities of EVMs with differential fuzz testing. The core idea of EVMFuzz is to continuously generate seed contracts for different EVMs execution, so as to find as many inconsistencies among execution results as possible, eventually discover vulnerabilities with output cross-referencing. First, we present the evaluation metric for the internal inconsistency indicator, such as the opcode sequence executed and gas used. Then, we construct seed contracts via a set of predefined mutators and employ dynamic priority scheduling algorithm to guide seed contracts selection and maximize the inconsistency. Finally, we leverage different EVMs as crossreferencing oracles to avoid manual checking of the execution output. For evaluation, we conducted large-scale mutation on 36,295 real-world smart contracts and generated 253,153 smart contracts. Among them, 66.2% showed differential performance, including 1,596 variant contracts triggered inconsistent output among EVMs. Accompanied by manual root cause analysis, we found 5 previously unknown security bugs in four widely used EVMs, and all had been included in Common Vulnerabilities and Exposures (CVE) database.
In the field of mutation analysis, mutation is the systematic generation of mutated programs (i.e., mutants) from an original program. The concept of mutation has been widely applied to various testing problems, including test set selection, fault lo calization, and program repair. However, surprisingly little focus has been given to the theoretical foundation of mutation-based testing methods, making it difficult to understand, organize, and describe various mutation-based testing methods. This paper aims to consider a theoretical framework for understanding mutation-based testing methods. While there is a solid testing framework for general testing, this is incongruent with mutation-based testing methods, because it focuses on the correctness of a program for a test, while the essence of mutation-based testing concerns the differences between programs (including mutants) for a test. In this paper, we begin the construction of our framework by defining a novel testing factor, called a test differentiator, to transform the paradigm of testing from the notion of correctness to the notion of difference. We formally define behavioral differences of programs for a set of tests as a mathematical vector, called a d-vector. We explore the multi-dimensional space represented by d-vectors, and provide a graphical model for describing the space. Based on our framework and formalization, we interpret existing mutation-based fault localization methods and mutant set minimization as applications, and identify novel implications for future work.
Recently, there has been a significant growth of interest in applying software engineering techniques for the quality assurance of deep learning (DL) systems. One popular direction is deep learning testing, where adversarial examples (a.k.a.~bugs) of DL systems are found either by fuzzing or guided search with the help of certain testing metrics. However, recent studies have revealed that the commonly used neuron coverage metrics by existing DL testing approaches are not correlated to model robustness. It is also not an effective measurement on the confidence of the model robustness after testing. In this work, we address this gap by proposing a novel testing framework called Robustness-Oriented Testing (RobOT). A key part of RobOT is a quantitative measurement on 1) the value of each test case in improving model robustness (often via retraining), and 2) the convergence quality of the model robustness improvement. RobOT utilizes the proposed metric to automatically generate test cases valuable for improving model robustness. The proposed metric is also a strong indicator on how well robustness improvement has converged through testing. Experiments on multiple benchmark datasets confirm the effectiveness and efficiency of RobOT in improving DL model robustness, with 67.02% increase on the adversarial robustness that is 50.65% higher than the state-of-the-art work DeepGini.
80 - Yixiao Yang 2020
In the field of software engineering, applying language models to the token sequence of source code is the state-of-art approach to build a code recommendation system. The syntax tree of source code has hierarchical structures. Ignoring the character istics of tree structures decreases the model performance. Current LSTM model handles sequential data. The performance of LSTM model will decrease sharply if the noise unseen data is distributed everywhere in the test suite. As code has free naming conventions, it is common for a model trained on one project to encounter many unknown words on another project. If we set many unseen words as UNK just like the solution in natural language processing, the number of UNK will be much greater than the sum of the most frequently appeared words. In an extreme case, just predicting UNK at everywhere may achieve very high prediction accuracy. Thus, such solution cannot reflect the true performance of a model when encountering noise unseen data. In this paper, we only mark a small number of rare words as UNK and show the prediction performance of models under in-project and cross-project evaluation. We propose a novel Hierarchical Language Model (HLM) to improve the robustness of LSTM model to gain the capacity about dealing with the inconsistency of data distribution between training and testing. The newly proposed HLM takes the hierarchical structure of code tree into consideration to predict code. HLM uses BiLSTM to generate embedding for sub-trees according to hierarchies and collects the embedding of sub-trees in context to predict next code. The experiments on inner-project and cross-project data sets indicate that the newly proposed Hierarchical Language Model (HLM) performs better than the state-of-art LSTM model in dealing with the data inconsistency between training and testing and achieves averagely 11.2% improvement in prediction accuracy.
We present ASDiv (Academia Sinica Diverse MWP Dataset), a diverse (in terms of both language patterns and problem types) English math word problem (MWP) corpus for evaluating the capability of various MWP solvers. Existing MWP corpora for studying AI progress remain limited either in language usage patterns or in problem types. We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school. Each MWP is annotated with its problem type and grade level (for indicating the level of difficulty). Furthermore, we propose a metric to measure the lexicon usage diversity of a given MWP corpus, and demonstrate that ASDiv is more diverse than existing corpora. Experiments show that our proposed corpus reflects the true capability of MWP solvers more faithfully.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا