Checking Smart Contracts with Structural Code Embedding

89 0 0.0 ( 0 )

Download Cite

Added by Zhipeng Gao

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Zhipeng Gao - Lingxiao Jiang - Xin Xia

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Smart contracts have been increasingly used together with blockchains to automate financial and business transactions. However, many bugs and vulnerabilities have been identified in many contracts which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can be buggy. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this paper, we propose an automated approach to learn characteristics of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. Our new approach is based on word embeddings and vector space comparison. We parse smart contract code into word streams with code structural information, convert code elements (e.g., statements, functions) into numerical vectors that are supposed to encode the code syntax and semantics, and compare the similarities among the vectors encoding code and known bugs, to identify potential issues. We have implemented the approach in a prototype, named SmartEmbed. Results show that our tool can effectively identify many repetitive instances of Solidity code, where the clone ratio is around 90%. Code clones such as type-III or even type-IV semantic clones can also be detected accurately. Our tool can identify more than 1000 clone related bugs based on our bug databases efficiently and accurately. Our tool can also help to efficiently validate any given smart contract against a known set of bugs, which can help to improve the users confidence in the reliability of the contract. The anonymous replication packages can be accessed at: https://drive.google.com/file/d/1kauLT3y2IiHPkUlVx4FSTda-dVAyL4za/view?usp=sharing, and evaluated it with more than 22,000 smart contracts collected from the Ethereum blockchain.

rate research

SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding

95 - Zhipeng Gao , Vinoj Jayasundara , Lingxiao Jiang 2019

Ethereum has become a widely used platform to enable secure, Blockchain-based financial and business transactions. However, a major concern in Ethereum is the security of its smart contracts. Many identified bugs and vulnerabilities in smart contracts not only present challenges to maintenance of blockchain, but also lead to serious financial loses. There is a significant need to better assist developers in checking smart contracts and ensuring their reliability.In this paper, we propose a web service tool, named SmartEmbed, which can help Solidity developers to find repetitive contract code and clone-related bugs in smart contracts. Our tool is based on code embeddings and similarity checking techniques. By comparing the similarities among the code embedding vectors for existing solidity code in the Ethereum blockchain and known bugs, we are able to efficiently identify code clones and clone-related bugs for any solidity code given by users, which can help to improve the users confidence in the reliability of their code. In addition to the uses by individual developers, SmartEmbed can also be applied to studies of smart contracts in a large scale. When applied to more than 22K solidity contracts collected from the Ethereum blockchain, we found that the clone ratio of solidity code is close to 90%, much higher than traditional software, and 194 clone-related bugs can be identified efficiently and accurately based on our small bug database with a precision of 96%. SmartEmbed can be accessed at url{http://www.smartembed.net}. A demo video of SmartEmbed is at url{https://youtu.be/o9ylyOpYFq8}

Software Engineering

ContractGuard: Defend Ethereum Smart Contracts with Embedded Intrusion Detection

96 - Xinming Wang , Jiahao He , Zhijian Xie 2019

Ethereum smart contracts are programs that can be collectively executed by a network of mutually untrusted nodes. Smart contracts handle and transfer assets of values, offering strong incentives for malicious attacks. Intrusion attacks are a popular type of malicious attacks. In this paper, we propose ContractGuard, the first intrusion detection system (IDS) to defend Ethereum smart contracts against such attacks. Like IDSs for conventional programs, ContractGuard detects intrusion attempts as abnormal control flow. However, existing IDS techniques/tools are inapplicable to Ethereum smart contracts due to Ethereums decentralized nature and its highly restrictive execution environment. To address these issues, we design ContractGuard by embedding it in the contracts to profile context-tagged acyclic paths, and optimizing it under the Ethereum gas-oriented performance model. The main goal is to minimize the overheads, to which the users will be extremely sensitive since the cost needs to be paid upfront in digital concurrency. Empirical investigation using real-life contracts deployed in the Ethereum mainnet shows that on average, ContractGuard only adds to 36.14% of the deployment overhead and 28.27% of the runtime overhead. Furthermore, we conducted controlled experiments and show that ContractGuard successfully guard against attacks on all real-world vulnerabilities and 83% of the seeded vulnerabilities.

Software Engineering Networking and Internet Architecture

When Deep Learning Meets Smart Contracts

124 - Zhipeng Gao 2020

Ethereum has become a widely used platform to enable secure, Blockchain-based financial and business transactions. However, many identified bugs and vulnerabilities in smart contracts have led to serious financial losses, which raises serious concerns about smart contract security. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this research: (1) Firstly, we propose an automated deep learning based approach to learn structural code embeddings of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. We apply our approach to more than 22K solidity contracts collected from the Ethereum blockchain, results show that the clone ratio of solidity code is at around 90%, much higher than traditional software. We collect a list of 52 known buggy smart contracts belonging to 10 kinds of common vulnerabilities as our bug database. Our approach can identify more than 1000 clone related bugs based on our bug databases efficiently and accurately. (2) Secondly, according to developers feedback, we have implemented the approach in a web-based tool, named SmartEmbed, to facilitate Solidity developers for using our approach. Our tool can assist Solidity developers to efficiently identify repetitive smart contracts in the existing Ethereum blockchain, as well as checking their contract against a known set of bugs, which can help to improve the users confidence in the reliability of the contract. We optimize the implementations of SmartEmbed which is sufficient in supporting developers in real-time for practical uses. The Ethereum ecosystem as well as the individual Solidity developer can both benefit from our research.

Software Engineering

Embedding API Dependency Graph for Neural Code Generation

159 - Chen Lyu , Ruyun Wang , Hongyu Zhang 2021

The problem of code generation from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep learning based approaches have been proposed, which can generate a sequence of code from a sequence of textual program description. However, the existing approaches ignore the global relationships among API methods, which are important for understanding the usage of APIs. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, a new module named ``embedder is introduced. In this way, the decoder can utilize both global structural dependencies and textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets and in two programming languages (Python and Java). Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods and maintains its performance as the length of the target code increases. Extensive ablation tests show that the proposed ADG embedding is effective and outperforms the baselines.

Software Engineering Artificial Intelligence

Towards French Smart Building Code: Compliance Checking Based on Semantic Rules

436 - Nicolas Bus 2019

Manually checking models for compliance against building regulation is a time-consuming task for architects and construction engineers. There is thus a need for algorithms that process information from construction projects and report non-compliant elements. Still automated code-compliance checking raises several obstacles. Building regulations are usually published as human readable texts and their content is often ambiguous or incomplete. Also, the vocabulary used for expressing such regulations is very different from the vocabularies used to express Building Information Models (BIM). Furthermore, the high level of details associated to BIM-contained geometries induces complex calculations. Finally, the level of complexity of the IFC standard also hinders the automation of IFC processing tasks. Model chart, formal rules and pre-processors approach allows translating construction regulations into semantic queries. We further demonstrate the usefulness of this approach through several use cases. We argue our approach is a step forward in bridging the gap between regulation texts and automated checking algorithms. Finally with the recent building ontology BOT recommended by the W3C Linked Building Data Community Group, we identify perspectives for standardizing and extending our approach.

Artificial Intelligence Computation and Language Logic in Computer Science