Learning to Update Natural Language Comments Based on Code Changes

Added by Sheena Panthaplackel

Publication date 2020

Field: Informatics Engineering

Language: English

Authors Sheena Panthaplackel - Pengyu Nie - Milos Gligoric

We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations in order to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. The results reflect the difficulty of this task and show that our model outperforms the baselines with respect to making edits.
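To make the idea of "a sequence of edits that are applied to the existing comment" concrete, the following is a minimal sketch of how such a sequence could be applied to a tokenized comment. The edit format used here (`keep`, `delete`, `insert`, `replace` tuples) and the function name `apply_edits` are assumptions made for illustration only; the abstract does not specify the edit vocabulary, and this is not the authors' model, which learns to predict the edits.

```python
from typing import List, Tuple

# Hypothetical edit format (an illustrative assumption, not the paper's scheme):
#   ("keep", n)             copy the next n tokens of the old comment
#   ("delete", n)           skip the next n tokens of the old comment
#   ("insert", [tokens])    emit new tokens without consuming old ones
#   ("replace", n, [tokens]) skip n old tokens and emit new tokens instead
Edit = Tuple


def apply_edits(old_tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply an edit sequence to the old comment tokens, left to right."""
    new_tokens: List[str] = []
    pos = 0  # index of the next unconsumed token in old_tokens
    for edit in edits:
        action = edit[0]
        if action == "keep":
            new_tokens.extend(old_tokens[pos:pos + edit[1]])
            pos += edit[1]
        elif action == "delete":
            pos += edit[1]
        elif action == "insert":
            new_tokens.extend(edit[1])
        elif action == "replace":
            pos += edit[1]
            new_tokens.extend(edit[2])
        else:
            raise ValueError(f"unknown edit action: {action}")
    if pos != len(old_tokens):
        raise ValueError("edit sequence does not cover the whole old comment")
    return new_tokens


# Example: the method changed its return unit, so the comment is edited
# rather than regenerated from scratch.
old_comment = "returns the elapsed time in seconds".split()
edits = [("keep", 5), ("replace", 1, ["milliseconds"])]
updated = " ".join(apply_edits(old_comment, edits))
print(updated)
```

Expressing the update as edits rather than as a freshly generated comment means that unchanged parts of the old comment are copied through verbatim, which is the property the edit-based formulation relies on.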

Learning to Generate Code Comments from Class Hierarchies

Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie (2021)

Descriptive code comments are essential for supporting code comprehension and maintenance. We propose the task of automatically generating comments for overriding methods. We formulate a novel framework which accommodates the unique contextual and linguistic reasoning that is required for performing this task. Our approach features: (1) incorporating context from the class hierarchy; (2) conditioning on learned, latent representations of specificity to generate comments that capture the more specialized behavior of the overriding method; and (3) unlikelihood training to discourage predictions which do not conform to invariant characteristics of the comment corresponding to the overridden method. Our experiments show that the proposed approach is able to generate comments for overriding methods of higher quality compared to prevailing comment generation techniques.

Computation and Language, Machine Learning, Software Engineering

Associating Natural Language Comment and Source Code Entities

Sheena Panthaplackel, Milos Gligoric, Raymond J. Mooney (2019)

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

Computation and Language, Machine Learning, Software Engineering

GANCoder: An Automatic Natural Language-to-Programming Language Translation Approach based on GAN

Yabing Zhu, Yanfeng Zhang, Huili Yang (2019)

We propose GANCoder, an automatic programming approach based on Generative Adversarial Networks (GAN), which generates programming language code with the same function and logic as the given natural language utterances. The adversarial training between the generator and the discriminator helps the generator learn the distribution of the dataset and improves code generation quality. Our experimental results show that GANCoder can achieve accuracy comparable to state-of-the-art methods and is more stable when working on different programming languages.

Computation and Language, Machine Learning

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

Chun-Hsing Lin, Siang-Ruei Wu, Hung-Yi Lee (2020)

Score function-based natural language generation (NLG) approaches such as REINFORCE generally suffer from low sample efficiency and training instability. This is mainly due to the non-differentiable nature of discrete-space sampling; as a result, these methods have to treat the discriminator as a black box and ignore its gradient information. To improve the sample efficiency and reduce the variance of REINFORCE, we propose a novel approach, TaylorGAN, which augments the gradient estimation with an off-policy update and a first-order Taylor expansion. This approach enables us to train NLG models from scratch with a smaller batch size, without maximum likelihood pre-training, and it outperforms existing GAN-based methods on multiple metrics of quality and diversity. The source code and data are available at https://github.com/MiuLab/TaylorGAN
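As background for the abstract above, the following is a minimal sketch of the plain score-function (REINFORCE) gradient estimator that it refers to, for a single categorical choice parameterized by softmax logits. It is not TaylorGAN and does not include the off-policy update or the Taylor expansion. The reward vector stands in for a black-box discriminator score, and all names (`softmax`, `exact_gradient`, `reinforce_gradient`) and numeric values are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Analytic gradient of E[R(x)] w.r.t. the logits: p_j * (R_j - E[R])."""
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected_reward) for p, r in zip(probs, rewards)]


def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate: average of R(x) * d/d_logits log p(x).

    Only the scalar reward of each sample is used, so the reward
    function is treated as a black box (no gradient flows through it).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        x = rng.choices(range(len(logits)), weights=probs)[0]
        for j in range(len(logits)):
            # Gradient of log softmax: indicator(j == x) - p_j
            score = (1.0 if j == x else 0.0) - probs[j]
            grad[j] += rewards[x] * score / num_samples
    return grad


logits = [0.0, 0.5, -0.5]    # illustrative policy parameters
rewards = [1.0, 0.2, 0.6]    # illustrative black-box scores per token
rng = random.Random(0)

exact = exact_gradient(logits, rewards)
few = reinforce_gradient(logits, rewards, 10, rng)
many = reinforce_gradient(logits, rewards, 20000, rng)
print("exact:", exact)
print("10 samples:", few)
print("20000 samples:", many)
```

The estimate approaches the analytic gradient only as the number of samples grows, which illustrates the sample-efficiency and variance concerns that the abstract attributes to REINFORCE-style training.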

Computation and Language, Machine Learning

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong (2020)

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' work and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Several studies have been proposed to alleviate the challenge, but none explicitly involves code contextual information during commit message generation. Specifically, existing research adopts static embeddings for code tokens, which map a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning strategy for commit message Generation (CoreGen). CoreGen first learns contextualized code representations which exploit the contextual information behind code commit sequences. The learned representations of code commits, built upon the Transformer, are then fine-tuned for downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models, with at least a 28.18% improvement in BLEU-4 score. Furthermore, we highlight future opportunities in training contextualized code representations on larger code corpora as a solution to low-resource tasks, and in adapting the contextualized code representation framework to other code-to-text generation tasks.

Computation and Language, Machine Learning, Software Engineering
