The standard coder: a machine learning approach to measuring the effort required to produce source code change

Posted by Ian Wright

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Ian Wright, Albert Ziegler

Software Engineering

Abstract

We apply machine learning to version control data to measure the quantity of effort required to produce source code changes. We construct a model of a "standard coder" trained from examples of code changes produced by actual software developers, together with the labor time they supplied. The effort of a code change is then defined as the labor hours supplied by the standard coder to produce that change. We therefore reduce heterogeneous, structured code changes to a scalar measure of effort derived from large quantities of empirical data on the coding behavior of software developers. The standard coder replaces traditional metrics, such as lines-of-code or function point analysis, and yields new insights into which code changes require more or less effort.
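The core idea in the abstract is to learn a mapping from a code change to the labor hours developers supplied, then read the model's prediction off as that change's effort. This can be framed as a regression problem, and the sketch below is a hedged, hypothetical illustration of that framing only.

Several parts of it are assumptions and not the authors' model, which the abstract does not specify:

- the feature set (`lines_added`, `lines_deleted`, `files_touched`);
- the helper names (`fit_standard_coder`, `effort`);
- the use of ordinary least squares.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical change representation: the paper's real features are not given here.
@dataclass
class CodeChange:
    lines_added: int
    lines_deleted: int
    files_touched: int


def features(change: CodeChange) -> list[float]:
    # The leading 1.0 is the intercept term of the linear model.
    return [1.0, float(change.lines_added), float(change.lines_deleted),
            float(change.files_touched)]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_standard_coder(changes: list[CodeChange], hours: list[float],
                       ridge: float = 0.0) -> list[float]:
    """Hypothetical 'standard coder': least-squares fit of hours on features.

    Solves the normal equations (X^T X + ridge * I) w = X^T y; raise `ridge`
    if the feature matrix is close to singular.
    """
    rows = [features(c) for c in changes]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * h for r, h in zip(rows, hours)) for i in range(k)]
    return solve(xtx, xty)


def effort(weights: list[float], change: CodeChange) -> float:
    """Effort of a change = hours the fitted model predicts for it."""
    return sum(w * f for w, f in zip(weights, features(change)))
```

Suppose the model is fitted on a handful of toy changes whose hours were generated from a known linear rule (these are made-up numbers, not data from the paper). The fit recovers that rule, and `effort(weights, change)` then returns a single scalar estimate for any new change.

A real standard coder would presumably use a far richer representation of the change. Line counts are themselves the kind of traditional metric the paper aims to replace, so here they serve only as placeholder inputs.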

Read also

Sahil Suneja, Yunhui Zheng, Yufan Zhuang, 2020

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a sample of source code into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained on several such graphs to automatically extract templates that differentiate the graph of a vulnerable sample from that of a healthy one. Our model outperforms static analyzers, classic machine learning, and CNN- and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)

Software Engineering; Cryptography and Security; Machine Learning

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

Yao Wan, Yang He, Jian-Guo Zhang, 2020

We present NaturalCC, an efficient and extensible toolkit that bridges the gap between natural language and programming language and facilitates research on big code analysis. Using NaturalCC, researchers from both the natural language and programming language communities can quickly and easily reproduce state-of-the-art baselines and implement their own approaches. NaturalCC is built on Fairseq and PyTorch and provides three things:

(1) efficient computation, with multi-GPU and mixed-precision data processing for fast model training;
(2) a modular and extensible framework that makes it easy to reproduce or implement an approach for big code analysis;
(3) a command-line interface and a graphical user interface to demonstrate each model's performance.

Currently, we have included several state-of-the-art baselines across different tasks (e.g., code completion, code comment generation, and code retrieval) for demonstration. The video of this demo is available at https://www.youtube.com/watch?v=q4W5VSI-u3E&t=25s.

Software Engineering

Learning How to Mutate Source Code from Bug-Fixes

Michele Tufano, Cody Watson, Gabriele Bavota, 2018

Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults, yet they have also indicated a clear need for better, possibly customized, mutation operators and strategies. Methods exist to devise domain-specific or general-purpose mutation operators from real faults, but they are effort- and error-prone and do not help the tester decide whether and how to mutate a given source code element.

We propose a novel approach that automatically learns mutants from faults in real programs. First, our approach processes bug-fixing changes using fine-grained differencing, code abstraction, and change clustering. Then, it learns mutation models using a deep learning strategy. We trained and evaluated our technique on a set of ~787k bug fixes mined from GitHub. Our empirical evaluation showed that our models predict mutants that resemble the actual fixed bugs in between 9% and 45% of cases, and that over 98% of the automatically generated mutants are lexically and syntactically correct.

Software Engineering

A Better Approach to Track the Evolution of Static Code Warnings

Junjie Li, 2021

Static bug detection tools help developers detect code problems. However, they are known to remain underutilized for various reasons. Recent advances in incorporating static bug detectors into modern software development workflows can better motivate developers to fix reported warnings on the fly. In this paper, we study how effectively the state-of-the-art (SOA) solution tracks warnings from static bug detectors, and we propose a better solution based on our analysis of its shortcomings.

In particular, we examined four large-scale open-source systems and built a data set of 3,452 static code warnings produced by two static bug detectors. We manually determined the ground-truth evolution status of the selected warnings: persistent, resolved, or newly introduced. Through this manual analysis, we also identified the critical reasons behind the shortcomings of the SOA matching algorithm. Finally, we propose a better approach to tracking static warnings over a software development history. Our evaluation shows that the proposed approach significantly improves tracking precision, from 66.9% to 90.0%.

Software Engineering

Hybrid Approach to Automation, RPA and Machine Learning: a Method for the Human-centered Design of Software Robots

Wiesław Kopec, Marcin Skibinski, Cezary Biele, 2018

One of the more prominent trends within Industry 4.0 is the drive to employ Robotic Process Automation (RPA), especially as one of the elements of the Lean approach. The full implementation of RPA is riddled with challenges. Some relate to the reality of everyday business operations, from SMEs to SSCs and beyond; others relate to the social effects of the changing job market. Addressing these points requires a solution that adjusts to existing business operations while lowering the negative social impact of the automation process.

To achieve these goals, we propose a hybrid, human-centered approach to the development of software robots. This design and implementation method combines the Living Lab approach with empowerment through participatory design. Its aim is to kick-start the co-development and co-maintenance of hybrid software robots, supported by a variety of AI methods and tools, including interactive and collaborative ML in the cloud. These robots transform menial job posts into higher-skilled positions. Former employees can then stay on as robot co-designers and maintainers, that is, as co-programmers who supervise the machine learning processes. They use tailored high-level RPA Domain Specific Languages (DSLs) to adjust how the robots function and to maintain operational flexibility.

Software Engineering; Computers and Society; Machine Learning
