The standard coder: a machine learning approach to measuring the effort required to produce source code change

Added by Ian Wright
Publication date: 2019
Language: English

We apply machine learning to version control data to measure the quantity of effort required to produce source code changes. We construct a model of a "standard coder" trained from examples of code changes produced by actual software developers together with the labor time they supplied. The effort of a code change is then defined as the labor hours the standard coder would supply to produce that change. We thereby reduce heterogeneous, structured code changes to a scalar measure of effort derived from large quantities of empirical data on the coding behavior of software developers. The standard coder replaces traditional metrics, such as lines of code or function point analysis, and yields new insights into which code changes require more or less effort.
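As a rough illustration of the idea, the Python sketch below fits a regression model from features of a code change to the labor hours developers actually supplied, then reads off "effort" as the hours predicted for a new change. The feature set (files touched, hunks, lines added and deleted) and the gradient-boosting model are assumptions made here for illustration, not the paper's actual design.

    # Minimal sketch of the "standard coder": learn hours from change features.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Toy training data: one row per historical code change.
    # Columns (hypothetical): [files_touched, hunks, lines_added, lines_deleted]
    X = np.array([
        [1, 2, 10, 3],
        [4, 9, 120, 40],
        [2, 3, 25, 10],
        [8, 20, 400, 150],
    ])
    hours = np.array([0.5, 4.0, 1.2, 12.0])  # labor time supplied by real developers

    model = GradientBoostingRegressor().fit(X, hours)

    # "Effort" of a new change = hours the standard coder would need for it.
    new_change = np.array([[3, 5, 60, 20]])
    print(f"estimated effort: {model.predict(new_change)[0]:.2f} hours")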



Related research

We explore the applicability of Graph Neural Networks to learning the nuances of source code from a security perspective: specifically, whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of the relationships between nodes and edges. We create a pipeline, AI4VA, which first encodes a sample of source code into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained on many such graphs to automatically extract templates that differentiate the graph of a vulnerable sample from that of a healthy one. Our model outperforms static analyzers, classic machine learning, and CNN- and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)
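The toy PyTorch sketch below shows the core mechanism such a pipeline relies on: node states of a code graph are repeatedly updated by a GRU from aggregated neighbor messages, then pooled into a graph-level vulnerability score. The graph, feature sizes, and single linear message function are placeholder assumptions; this is not the AI4VA pipeline itself.

    # Minimal gated graph network for graph-level classification.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    class TinyGGNN(nn.Module):
        def __init__(self, dim=16, steps=4):
            super().__init__()
            self.steps = steps
            self.msg = nn.Linear(dim, dim)        # message function along edges
            self.gru = nn.GRUCell(dim, dim)       # gated state update per node
            self.readout = nn.Linear(dim, 1)      # graph-level vulnerability score

        def forward(self, h, adj):
            # h: (num_nodes, dim) node states; adj: (num_nodes, num_nodes)
            for _ in range(self.steps):
                m = adj @ self.msg(h)             # aggregate neighbor messages
                h = self.gru(m, h)                # GRU-gated update of node states
            return torch.sigmoid(self.readout(h.mean(dim=0)))  # mean-pool readout

    # Toy graph: 5 nodes standing in for a vectorized Code Property Graph.
    h = torch.randn(5, 16)
    adj = torch.zeros(5, 5)
    for src, dst in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:
        adj[dst, src] = 1.0                       # message flows src -> dst

    print(TinyGGNN()(h, adj))                     # probability-like score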
270 - Yao Wan, Yang He, Jian-Guo Zhang (2020)
We present NaturalCC, an efficient and extensible toolkit that bridges the gap between natural language and programming language and facilitates research on big code analysis. Using NaturalCC, researchers from both the natural language and programming language communities can quickly and easily reproduce state-of-the-art baselines and implement their own approaches. NaturalCC is built upon Fairseq and PyTorch, providing (1) efficient computation with multi-GPU and mixed-precision data processing for fast model training, (2) a modular and extensible framework that makes it easy to reproduce or implement an approach for big code analysis, and (3) a command line interface and a graphical user interface to demonstrate each model's performance. Currently, we have included several state-of-the-art baselines across different tasks (e.g., code completion, code comment generation, and code retrieval) for demonstration. The video of this demo is available at https://www.youtube.com/watch?v=q4W5VSI-u3E&t=25s.
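NaturalCC's own API is not reproduced here; the sketch below only illustrates, in plain PyTorch, the mixed-precision training loop that underlies the kind of fast model training the toolkit advertises. The tiny linear model and toy batch are stand-ins.

    # Generic mixed-precision training loop in PyTorch (falls back to fp32 on CPU).
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(128, 2).to(device)          # stand-in for a big-code model
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

    x = torch.randn(32, 128, device=device)       # toy batch of code embeddings
    y = torch.randint(0, 2, (32,), device=device)

    for step in range(10):
        opt.zero_grad()
        with torch.autocast(device_type=device, enabled=(device == "cuda")):
            loss = nn.functional.cross_entropy(model(x), y)  # low-precision forward
        scaler.scale(loss).backward()             # scaled backward avoids underflow
        scaler.step(opt)                          # unscale grads, then optimizer step
        scaler.update()                           # adjust the loss scale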
Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults, yet they also indicate a clear need for better, possibly customized, mutation operators and strategies. While methods exist to devise domain-specific or general-purpose mutation operators from real faults, they are effort-intensive and error-prone, and do not help the tester decide whether and how to mutate a given source code element. We propose a novel approach to automatically learn mutants from faults in real programs. First, our approach processes bug-fixing changes using fine-grained differencing, code abstraction, and change clustering. Then, it learns mutation models using a deep learning strategy. We have trained and evaluated our technique on a set of ~787k bug fixes mined from GitHub. Our empirical evaluation showed that our models are able to predict mutants that resemble the actual fixed bugs in between 9% and 45% of the cases, and that over 98% of the automatically generated mutants are lexically and syntactically correct.
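The abstraction step can be pictured with the short sketch below: identifiers and literals in a buggy/fixed pair are replaced by indexed placeholders under a shared mapping, so many concrete fixes collapse into a small, learnable vocabulary. The regex tokenizer, keyword list, and placeholder scheme are illustrative assumptions rather than the paper's implementation.

    # Toy code abstraction over a bug-fixing pair with a shared mapping.
    import re

    KEYWORDS = {"if", "else", "return", "int", "while", "for"}

    def abstract_pair(buggy: str, fixed: str):
        ids, lits = {}, {}
        def abst(src):
            out = []
            for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", src):
                if tok.isdigit():
                    out.append(lits.setdefault(tok, f"LIT_{len(lits)}"))
                elif tok in KEYWORDS or not (tok[0].isalpha() or tok[0] == "_"):
                    out.append(tok)               # keep keywords and punctuation
                else:
                    out.append(ids.setdefault(tok, f"ID_{len(ids)}"))
            return " ".join(out)
        return abst(buggy), abst(fixed)           # mapping shared across the pair

    before, after = abstract_pair(
        "if (count > 0) return total / count;",
        "if (count > 1) return total / count;",
    )
    print(before)  # if ( ID_0 > LIT_0 ) return ID_1 / ID_0 ;
    print(after)   # if ( ID_0 > LIT_1 ) return ID_1 / ID_0 ;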
92 - Junjie Li (2021)
Static bug detection tools help developers detect code problems, yet they are known to remain underutilized for various reasons. Recent advances that incorporate static bug detectors into modern software development workflows can better motivate developers to fix the reported warnings on the fly. In this paper, we study the effectiveness of the state-of-the-art (SOA) solution for tracking warnings raised by static bug detectors and propose a better solution based on our analysis of the SOA solution's insufficiencies. In particular, we examined four large-scale open-source systems and crafted a data set of 3,452 static code warnings reported by two static bug detectors. We manually uncovered the ground-truth evolution status of the selected warnings: persistent, resolved, or newly introduced. Moreover, through manual analysis, we identified the critical reasons behind the insufficiencies of the SOA matching algorithm. Finally, we propose a better approach to improve the tracking of static warnings over software development history. Our evaluation shows that our proposed approach provides a significant improvement in tracking precision, from 66.9% to 90.0%.
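To make the tracking problem concrete, the sketch below matches warnings across two revisions by detector rule, file, and a hash of the whitespace-normalized flagged source line, so a pure line-number shift does not break the match. This simple heuristic illustrates the problem the paper studies; it is not the paper's proposed algorithm.

    # Toy warning tracking across two revisions of a file.
    import hashlib

    def key(warning, source_lines):
        snippet = source_lines[warning["line"] - 1].strip()
        digest = hashlib.sha1(snippet.encode()).hexdigest()[:12]
        return (warning["rule"], warning["file"], digest)

    def track(old_warnings, old_src, new_warnings, new_src):
        old_keys = {key(w, old_src): w for w in old_warnings}
        new_keys = {key(w, new_src): w for w in new_warnings}
        persistent = [w for k, w in new_keys.items() if k in old_keys]
        resolved   = [w for k, w in old_keys.items() if k not in new_keys]
        introduced = [w for k, w in new_keys.items() if k not in old_keys]
        return persistent, resolved, introduced

    old_src = ["int x = 0;", "x = x / 0;"]
    new_src = ["// header", "int x = 0;", "x = x / 0;"]  # shifted by one line
    old_w = [{"rule": "DIV_ZERO", "file": "a.c", "line": 2}]
    new_w = [{"rule": "DIV_ZERO", "file": "a.c", "line": 3}]
    print(track(old_w, old_src, new_w, new_src))  # persistent despite the shift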
One of the more prominent trends within Industry 4.0 is the drive to employ Robotic Process Automation (RPA), especially as one of the elements of the Lean approach. The full implementation of RPA is riddled with challenges relating both to the reality of everyday business operations, from SMEs to SSCs and beyond, and to the social effects of the changing job market. To successfully address these points, there is a need to develop a solution that adjusts to existing business operations and at the same time lowers the negative social impact of the automation process. To achieve these goals, we propose a hybrid, human-centered approach to the development of software robots. This design and implementation method combines the Living Lab approach with empowerment through participatory design to kick-start the co-development and co-maintenance of hybrid software robots. Supported by a variety of AI methods and tools, including interactive and collaborative machine learning in the cloud, these robots transform menial job posts into higher-skilled positions, allowing former employees to stay on as robot co-designers and maintainers, i.e. as co-programmers who supervise the machine learning processes and use tailored high-level RPA Domain Specific Languages (DSLs) to adjust the functioning of the robots and maintain operational flexibility.
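Purely as an illustration of what such a high-level RPA DSL might look like, the toy interpreter below executes a three-verb script that a former clerk could read and edit; the verbs, script, and semantics are hypothetical and not taken from the paper.

    # Toy interpreter for a hypothetical high-level RPA script.
    SCRIPT = """
    open invoices.csv
    filter status=pending
    forward reviewer@example.com
    """

    def run(script: str) -> None:
        for line in script.strip().splitlines():
            verb, *args = line.split()
            if verb == "open":
                print(f"[robot] opening {args[0]}")
            elif verb == "filter":
                field, value = args[0].split("=")
                print(f"[robot] keeping rows where {field} == {value}")
            elif verb == "forward":
                print(f"[robot] forwarding results to {args[0]}")
            else:
                raise ValueError(f"unknown verb: {verb}")

    run(SCRIPT)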