SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding

96 0 0.0 ( 0 )

Download Cite

Added by Zhipeng Gao

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Zhipeng Gao - Vinoj Jayasundara - Lingxiao Jiang

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Ethereum has become a widely used platform to enable secure, Blockchain-based financial and business transactions. However, a major concern in Ethereum is the security of its smart contracts. Many identified bugs and vulnerabilities in smart contracts not only present challenges to maintenance of blockchain, but also lead to serious financial loses. There is a significant need to better assist developers in checking smart contracts and ensuring their reliability.In this paper, we propose a web service tool, named SmartEmbed, which can help Solidity developers to find repetitive contract code and clone-related bugs in smart contracts. Our tool is based on code embeddings and similarity checking techniques. By comparing the similarities among the code embedding vectors for existing solidity code in the Ethereum blockchain and known bugs, we are able to efficiently identify code clones and clone-related bugs for any solidity code given by users, which can help to improve the users confidence in the reliability of their code. In addition to the uses by individual developers, SmartEmbed can also be applied to studies of smart contracts in a large scale. When applied to more than 22K solidity contracts collected from the Ethereum blockchain, we found that the clone ratio of solidity code is close to 90%, much higher than traditional software, and 194 clone-related bugs can be identified efficiently and accurately based on our small bug database with a precision of 96%. SmartEmbed can be accessed at url{http://www.smartembed.net}. A demo video of SmartEmbed is at url{https://youtu.be/o9ylyOpYFq8}

rate research

Checking Smart Contracts with Structural Code Embedding

88 - Zhipeng Gao , Lingxiao Jiang , Xin Xia 2020

Smart contracts have been increasingly used together with blockchains to automate financial and business transactions. However, many bugs and vulnerabilities have been identified in many contracts which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can be buggy. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this paper, we propose an automated approach to learn characteristics of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. Our new approach is based on word embeddings and vector space comparison. We parse smart contract code into word streams with code structural information, convert code elements (e.g., statements, functions) into numerical vectors that are supposed to encode the code syntax and semantics, and compare the similarities among the vectors encoding code and known bugs, to identify potential issues. We have implemented the approach in a prototype, named SmartEmbed. Results show that our tool can effectively identify many repetitive instances of Solidity code, where the clone ratio is around 90%. Code clones such as type-III or even type-IV semantic clones can also be detected accurately. Our tool can identify more than 1000 clone related bugs based on our bug databases efficiently and accurately. Our tool can also help to efficiently validate any given smart contract against a known set of bugs, which can help to improve the users confidence in the reliability of the contract. The anonymous replication packages can be accessed at: https://drive.google.com/file/d/1kauLT3y2IiHPkUlVx4FSTda-dVAyL4za/view?usp=sharing, and evaluated it with more than 22,000 smart contracts collected from the Ethereum blockchain.

Software Engineering

SourcererCC: Scaling Code Clone Detection to Big Code

108 - Hitesh Sajnani , Vaibhav Saini , Jeffrey Svajlenko 2015

Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.

Software Engineering

ContractGuard: Defend Ethereum Smart Contracts with Embedded Intrusion Detection

96 - Xinming Wang , Jiahao He , Zhijian Xie 2019

Ethereum smart contracts are programs that can be collectively executed by a network of mutually untrusted nodes. Smart contracts handle and transfer assets of values, offering strong incentives for malicious attacks. Intrusion attacks are a popular type of malicious attacks. In this paper, we propose ContractGuard, the first intrusion detection system (IDS) to defend Ethereum smart contracts against such attacks. Like IDSs for conventional programs, ContractGuard detects intrusion attempts as abnormal control flow. However, existing IDS techniques/tools are inapplicable to Ethereum smart contracts due to Ethereums decentralized nature and its highly restrictive execution environment. To address these issues, we design ContractGuard by embedding it in the contracts to profile context-tagged acyclic paths, and optimizing it under the Ethereum gas-oriented performance model. The main goal is to minimize the overheads, to which the users will be extremely sensitive since the cost needs to be paid upfront in digital concurrency. Empirical investigation using real-life contracts deployed in the Ethereum mainnet shows that on average, ContractGuard only adds to 36.14% of the deployment overhead and 28.27% of the runtime overhead. Furthermore, we conducted controlled experiments and show that ContractGuard successfully guard against attacks on all real-world vulnerabilities and 83% of the seeded vulnerabilities.

Software Engineering Networking and Internet Architecture

Doublade: Unknown Vulnerability Detection in Smart Contracts Via Abstract Signature Matching and Refined Detection Rules

88 - Yinxing Xue 2019

With the prosperity of smart contracts and the blockchain technology, various security analyzers have been proposed from both the academia and industry to address the associated risks. Yet, there does not exist a high-quality benchmark of smart contract vulnerability for security research. In this study, we propose an approach towards building a high-quality vulnerability benchmark. Our approach consists of two parts. First, to improve recall, we propose to search for similar vulnerabilities in an automated way by leveraging the abstract vulnerability signature (AVS). Second, to remove the false positives (FPs) due to AVS-based matching, we summarize the detection rules of existing tools and apply the refined rules by considering various defense mechanisms (DMs). By integrating AVS-based code matching and the refined detection rules (RDR), our approach achieves higher precision and recall. On the collected 76,354 contracts, we build a benchmark consisting of 1,219 vulnerabilities covering five different vulnerability types identified together by our tool (DOUBLADE) and other three scanners. Additionally, we conduct a comparison between DOUBLADE and the others, on an additional 17,770 contracts. Results show that DOUBLADE can yield a better detection accuracy with similar execution time.

Software Engineering Cryptography and Security Distributed Parallel and Cluster Computing

When Deep Learning Meets Smart Contracts

124 - Zhipeng Gao 2020

Ethereum has become a widely used platform to enable secure, Blockchain-based financial and business transactions. However, many identified bugs and vulnerabilities in smart contracts have led to serious financial losses, which raises serious concerns about smart contract security. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this research: (1) Firstly, we propose an automated deep learning based approach to learn structural code embeddings of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. We apply our approach to more than 22K solidity contracts collected from the Ethereum blockchain, results show that the clone ratio of solidity code is at around 90%, much higher than traditional software. We collect a list of 52 known buggy smart contracts belonging to 10 kinds of common vulnerabilities as our bug database. Our approach can identify more than 1000 clone related bugs based on our bug databases efficiently and accurately. (2) Secondly, according to developers feedback, we have implemented the approach in a web-based tool, named SmartEmbed, to facilitate Solidity developers for using our approach. Our tool can assist Solidity developers to efficiently identify repetitive smart contracts in the existing Ethereum blockchain, as well as checking their contract against a known set of bugs, which can help to improve the users confidence in the reliability of the contract. We optimize the implementations of SmartEmbed which is sufficient in supporting developers in real-time for practical uses. The Ethereum ecosystem as well as the individual Solidity developer can both benefit from our research.

Software Engineering