Learning to Update Natural Language Comments Based on Code Changes

429 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sheena Panthaplackel

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sheena Panthaplackel - Pengyu Nie - Milos Gligoric

الحساب واللغة التعلم الآلي هندسة البرمجيات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and that our model outperforms baselines with respect to making edits.

قيم البحث

130 - Jiyang Zhang , Sheena Panthaplackel , Pengyu Nie 2021

Descriptive code comments are essential for supporting code comprehension and maintenance. We propose the task of automatically generating comments for overriding methods. We formulate a novel framework which accommodates the unique contextual and li nguistic reasoning that is required for performing this task. Our approach features: (1) incorporating context from the class hierarchy; (2) conditioning on learned, latent representations of specificity to generate comments that capture the more specialized behavior of the overriding method; and (3) unlikelihood training to discourage predictions which do not conform to invariant characteristics of the comment corresponding to the overridden method. Our experiments show that the proposed approach is able to generate comments for overriding methods of higher quality compared to prevailing comment generation techniques.

الحساب واللغة التعلم الآلي هندسة البرمجيات

Associating Natural Language Comment and Source Code Entities

442 - Sheena Panthaplackel , Milos Gligoric , Raymond J. Mooney 2019

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

الحساب واللغة التعلم الآلي هندسة البرمجيات

GANCoder: An Automatic Natural Language-to-Programming Language Translation Approach based on GAN

322 - Yabing Zhu , Yanfeng Zhang , Huili Yang 2019

We propose GANCoder, an automatic programming approach based on Generative Adversarial Networks (GAN), which can generate the same functional and logical programming language codes conditioned on the given natural language utterances. The adversarial training between generator and discriminator helps generator learn distribution of dataset and improve code generation quality. Our experimental results show that GANCoder can achieve comparable accuracy with the state-of-the-art methods and is more stable when programming languages.

الحساب واللغة التعلم الآلي

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

332 - Chun-Hsing Lin , Siang-Ruei Wu , Hung-Yi Lee 2020

Score function-based natural language generation (NLG) approaches such as REINFORCE, in general, suffer from low sample efficiency and training instability problems. This is mainly due to the non-differentiable nature of the discrete space sampling a nd thus these methods have to treat the discriminator as a black box and ignore the gradient information. To improve the sample efficiency and reduce the variance of REINFORCE, we propose a novel approach, TaylorGAN, which augments the gradient estimation by off-policy update and the first-order Taylor expansion. This approach enables us to train NLG models from scratch with smaller batch size -- without maximum likelihood pre-training, and outperforms existing GAN-based methods on multiple metrics of quality and diversity. The source code and data are available at https://github.com/MiuLab/TaylorGAN

الحساب واللغة التعلم الآلي

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

272 - Lun Yiu Nie , Cuiyun Gao , Zhicong Zhong 2020

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers works and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Se veral studies have been proposed to alleviate the challenge but none explicitly involves code contextual information during commit message generation. Specifically, existing research adopts static embedding for code tokens, which maps a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning strategy for commit message Generation (CoreGen). CoreGen first learns contextualized code representations which exploit the contextual information behind code commit sequences. The learned representations of code commits built upon Transformer are then fine-tuned for downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with at least 28.18% improvement in terms of BLEU-4 score. Furthermore, we also highlight the future opportunities in training contextualized code representations on larger code corpus as a solution to low-resource tasks and adapting the contextualized code representation framework to other code-to-text generation tasks.

الحساب واللغة التعلم الآلي هندسة البرمجيات