DeepMerge: Learning to Merge Programs

182 0 0.0 ( 0 )

Download Cite

Added by Elizabeth Dinella

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Elizabeth Dinella - Todd Mytkowicz - Alexey Svyatkovskiy

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found that, although costly, a large majority of resolutions involve re-arranging text without writing any new code. Inspired by this observation we propose the first data-driven approach to resolve merge conflicts with a machine learning model. We realize our approach in a tool DeepMerge that uses a novel combination of (i) an edit-aware embedding of merge inputs and (ii) a variation of pointer networks, to construct resolutions from input segments. We also propose an algorithm to localize manual resolutions in a resolved file and employ it to curate a ground-truth dataset comprising 8,719 non-trivial resolutions in JavaScript programs. Our evaluation shows that, on a held out test set, DeepMerge can predict correct resolutions for 37% of non-trivial merges, compared to only 4% by a state-of-the-art semistructured merge technique. Furthermore, on the subset of merges with upto 3 lines (comprising 24% of the total dataset), DeepMerge can predict correct resolutions with 78% accuracy.

rate research

Can Program Synthesis be Used to Learn Merge Conflict Resolutions? An Empirical Analysis

328 - Rangeet Pan , Vu Le , Nachiappan Nagappan 2021

Forking structure is widespread in the open-source repositories and that causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divided into two sections. First, we empirically evaluate textual merge conflicts in Microsoft Edge and classify them based on the type of files, location of conflicts in a file, and the size of conflicts. We found that ~28% of the merge conflicts are 1-2 line changes, and many resolutions have frequent patterns. Second, driven by these findings, we explore Program Synthesis (for the first time) to learn patterns and resolve structural merge conflicts. We propose a novel domain-specific language (DSL) that captures many of the repetitive merge conflict resolution patterns and learn resolution strategies as programs in this DSL from example resolutions. We found that the learned strategies can resolve 11.4% of the conflicts (~41% of 1-2 line changes) that arise in the C++ files with 93.2% accuracy.

Software Engineering

On Testing Constraint Programs

426 - Nadjib Lazaar , Lebbahn Yahia 2010

The success of several constraint-based modeling languages such as OPL, ZINC, or COMET, appeals for better software engineering practices, particularly in the testing phase. This paper introduces a testing framework enabling automated test case generation for constraint programming. We propose a general framework of constraint program development which supposes that a first declarative and simple constraint model is available from the problem specifications analysis. Then, this model is refined using classical techniques such as constraint reformulation, surrogate and global constraint addition, or symmetry-breaking to form an improved constraint model that must be thoroughly tested before being used to address real-sized problems. We think that most of the faults are introduced in this refinement step and propose a process which takes the first declarative model as an oracle for detecting non-conformities. We derive practical test purposes from this process to generate automatically test data that exhibit non-conformities. We implemented this approach in a new tool called CPTEST that was used to automatically detect non-conformities on two classical benchmark programs, namely the Golomb rulers and the car-sequencing problem.

Software Engineering

Learning to Optimize Tensor Programs

77 - Tianqi Chen , Lianmin Zheng , Eddie Yan 2018

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN where only a narrow range of server class GPUs are well-supported. The reliance on hardware-specific operator libraries limits the applicability of high-level graph optimizations and incurs significant engineering costs when deploying to new hardware targets. We use learning to remove this engineering burden. We learn domain-specific statistical cost models to guide the search of tensor operator implementations over billions of possible program variants. We further accelerate the search by effective model transfer across workloads. Experimental results show that our framework delivers performance competitive with state-of-the-art hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPU.

Machine Learning Machine Learning

Automatic Detection and Resolution of Software Merge Conflicts: Are We There Yet?

483 - Bowen Shen 2021

Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits lead to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging and time-consuming for developers to eliminate merge conflicts. Prior studies examined %the popularity of merge conflicts and how conflicts were related to code smells or software development process; tools were built to find and solve conflicts. However, some fundamental research questions are still not comprehensively explored, including (1) how conflicts were introduced, (2) how developers manually resolved conflicts, and (3) what conflicts cannot be handled by current tools. For this paper, we took a hybrid approach that combines automatic detection with manual inspection to reveal 204 merge conflicts and their resolutions in 15 open-source repositories. %in the version history of 15 open-source projects. Our data analysis reveals three phenomena. First, compiling and dynamic conflicts are harder to detect, although current tools mainly focus on textual conflicts. Second, in the same merging context, developers usually resolved similar textual conflicts with similar strategies. Third, developers manually fixed most of the inspected compiling and dynamic conflicts by similarly editing the merged version as what they did for one of the branches. Our research reveals the challenges and opportunities for automatic detection and resolution of merge conflicts; it also sheds light on related areas like systematic program editing and change recommendation.

Software Engineering

Identifying Bug Patterns in Quantum Programs

164 - Pengzhan Zhao , Jianjun Zhao , Lei Ma 2021

Bug patterns are erroneous code idioms or bad coding practices that have been proved to fail time and time again, which are usually caused by the misunderstanding of a programming languages features, the use of erroneous design patterns, or simple mistakes sharing common behaviors. This paper identifies and categorizes some bug patterns in the quantum programming language Qiskit and briefly discusses how to eliminate or prevent those bug patterns. We take this research as the first step to provide an underlying basis for debugging and testing quantum programs.

Software Engineering Programming Languages Quantum Physics