EVMFuzz: Differential Fuzz Testing of Ethereum Virtual Machine

64 0 0.0 ( 0 )

Download Cite

Added by Ying Fu

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Ying Fu - Meng Ren - Fuchen Ma

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Ethereum Virtual Machine (EVM) is the run-time environment for smart contracts and its vulnerabilities may lead to serious problems to the Ethereum ecology. With lots of techniques being developed for the validation of smart contracts, the security problems of EVM have not been well-studied. In this paper, we propose EVMFuzz, aiming to detect vulnerabilities of EVMs with differential fuzz testing. The core idea of EVMFuzz is to continuously generate seed contracts for different EVMs execution, so as to find as many inconsistencies among execution results as possible, eventually discover vulnerabilities with output cross-referencing. First, we present the evaluation metric for the internal inconsistency indicator, such as the opcode sequence executed and gas used. Then, we construct seed contracts via a set of predefined mutators and employ dynamic priority scheduling algorithm to guide seed contracts selection and maximize the inconsistency. Finally, we leverage different EVMs as crossreferencing oracles to avoid manual checking of the execution output. For evaluation, we conducted large-scale mutation on 36,295 real-world smart contracts and generated 253,153 smart contracts. Among them, 66.2% showed differential performance, including 1,596 variant contracts triggered inconsistent output among EVMs. Accompanied by manual root cause analysis, we found 5 previously unknown security bugs in four widely used EVMs, and all had been included in Common Vulnerabilities and Exposures (CVE) database.

rate research

A Feature-Oriented Corpus for Understanding, Evaluating and Improving Fuzz Testing

95 - Xiaogang Zhu , Xiaotao Feng , Tengyun Jiao 2019

Fuzzing is a promising technique for detecting security vulnerabilities. Newly developed fuzzers are typically evaluated in terms of the number of bugs found on vulnerable programs/binaries. However,existing corpora usually do not capture the features that prevent fuzzers from finding bugs, leading to ambiguous conclusions on the pros and cons of the fuzzers evaluated. A typical example is that Driller detects more bugs than AFL, but its evaluation cannot establish if the advancement of Driller stems from the concolic execution or not, since, for example, its ability in resolving a dataset`s magic values is unclear. In this paper, we propose to address the above problem by generating corpora based on search-hampering features. As a proof-of-concept, we have designed FEData, a prototype corpus that currently focuses on four search-hampering features to generate vulnerable programs for fuzz testing. Unlike existing corpora that can only answer how, FEData can also further answer why by exposing (or understanding) the reasons for the identified weaknesses in a fuzzer. The why information serves as the key to the improvement of fuzzers.

Software Engineering

MLCheck- Property-Driven Testing of Machine Learning Models

79 - Arnab Sharma , Caglar Demir , Axel-Cyrille Ngonga Ngomo 2021

In recent years, we observe an increasing amount of software with machine learning components being deployed. This poses the question of quality assurance for such components: how can we validate whether specified requirements are fulfilled by a machine learned software? Current testing and verification approaches either focus on a single requirement (e.g., fairness) or specialize on a single type of machine learning model (e.g., neural networks). In this paper, we propose property-driven testing of machine learning models. Our approach MLCheck encompasses (1) a language for property specification, and (2) a technique for systematic test case generation. The specification language is comparable to property-based testing languages. Test case generation employs advanced verification technology for a systematic, property-dependent construction of test suites, without additional user-supplied generator functions. We evaluate MLCheck using requirements and data sets from three different application areas (software discrimination, learning on knowledge graphs and security). Our evaluation shows that despite its generality MLCheck can even outperform specialised testing approaches while having a comparable runtime.

Software Engineering

Defining Smart Contract Defects on Ethereum

145 - Jiachi Chen , Xin Xia , David Lo 2019

Smart contracts are programs running on a blockchain. They are immutable to change, and hence can not be patched for bugs once deployed. Thus it is critical to ensure they are bug-free and well-designed before deployment. A Contract defect is an error, flaw or fault in a smart contract that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. The detection of contract defects is a method to avoid potential bugs and improve the design of existing code. Since smart contracts contain numerous distinctive features, such as the gas system. decentralized, it is important to find smart contract specified defects. To fill this gap, we collected smart-contract-related posts from Ethereum StackExchange, as well as real-world smart contracts. We manually analyzed these posts and contracts; using them to define 20 kinds of contract defects. We categorized them into indicating potential security, availability, performance, maintainability and reusability problems. To validate if practitioners consider these contract as harmful, we created an online survey and received 138 responses from 32 different countries. Feedback showed these contract defects are harmful and removing them would improve the quality and robustness of smart contracts. We manually identified our defined contract defects in 587 real world smart contract and publicly released our dataset. Finally, we summarized 5 impacts caused by contract defects. These help developers better understand the symptoms of the defects and removal priority.

Software Engineering

What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection

148 - Annibale Panichella , Cynthia C. S. Liem 2021

Mutation testing is a well-established technique for assessing a test suites quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep learning (DL) in particular; researchers have proposed approaches, tools, and statistically sound heuristics to determine whether mutants in DL systems are killed or not. However, as we will argue in this work, questions can be raised to what extent currently used mutation testing techniques in DL are actually in line with the classical interpretation of mutation testing. We observe that ML model development resembles a test-driven development (TDD) process, in which a training algorithm (`programmer) generates a model (program) that fits the data points (test data) to labels (implicit assertions), up to a certain threshold. However, considering proposed mutation testing techniques for ML systems under this TDD metaphor, in current approaches, the distinction between production and test code is blurry, and the realism of mutation operators can be challenged. We also consider the fundamental hypotheses underlying classical mutation testing: the competent programmer hypothesis and coupling effect hypothesis. As we will illustrate, these hypotheses do not trivially translate to ML system development, and more conscious and explicit scoping and concept mapping will be needed to truly draw parallels. Based on our observations, we propose several action points for better alignment of mutation testing techniques for ML with paradigms and vocabularies of classical mutation testing.

Software Engineering

JEST: N+1-version Differential Testing of Both JavaScript Engines and Specification

78 - Jihyeok Park , Seungmin An , Dongjun Youn 2021

Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach, a language specification is no more the oracle of the language semantics because both the specification and its implementations can co-evolve. In this setting, both the specification and implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing to resolve the problem. Unlike the traditional differential testing, our approach consists of three steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information. We actualize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in a natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of syntax and 87.70% of semantics from ES11. Using the assertion-injection, it detected 44 engine bugs in four engines and 27 specification bugs in ES11.

Software Engineering