Toward Code Generation: A Survey and Lessons from Semantic Parsing

79 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Celine Lee

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Celine Lee - Justin Gottschlich (1

هندسة البرمجيات الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

With the growth of natural language processing techniques and demand for improved software engineering efficiency, there is an emerging interest in translating intention from human languages to programming languages. In this survey paper, we attempt to provide an overview of the growing body of research in this space. We begin by reviewing natural language semantic parsing techniques and draw parallels with program synthesis efforts. We then consider semantic parsing works from an evolutionary perspective, with specific analyses on neuro-symbolic methods, architecture, and supervision. We then analyze advancements in frameworks for semantic parsing for code generation. In closing, we present what we believe are some of the emerging open challenges in this domain.

قيم البحث

117 - Md Rizwan Parvez , Wasi Uddin Ahmad , Saikat Chakraborty 2021

Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting t hem. To mimic developers code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.

هندسة البرمجيات الحساب واللغة

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

80 - Shuai Lu , Daya Guo , Shuo Ren 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE in cludes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

هندسة البرمجيات الحساب واللغة

A Survey on Semantic Parsing

71 - Aishwarya Kamath , Rajarshi Das 2018

A significant amount of information in todays world is stored in structured and semi-structured knowledge bases. Efficient and simple methods to query them are essential and must not be restricted to only those who have expertise in formal query lang uages. The field of semantic parsing deals with converting natural language utterances to logical forms that can be easily executed on a knowledge base. In this survey, we examine the various components of a semantic parsing system and discuss prominent work ranging from the initial rule based methods to the current neural approaches to program synthesis. We also discuss methods that operate using varying levels of supervision and highlight the key challenges involved in the learning of such systems.

الحساب واللغة

Context Dependent Semantic Parsing: A Survey

120 - Zhuang Li , Lizhen Qu , Gholamreza Haffari 2020

Semantic parsing is the task of translating natural language utterances into machine-readable meaning representations. Currently, most semantic parsing methods are not able to utilize contextual information (e.g. dialogue and comments history), which has a great potential to boost semantic parsing performance. To address this issue, context dependent semantic parsing has recently drawn a lot of attention. In this survey, we investigate progress on the methods for the context dependent semantic parsing, together with the current datasets and tasks. We then point out open problems and challenges for future research in this area. The collected resources for this topic are available at:https://github.com/zhuang-li/Contextual-Semantic-Parsing-Paper-List.

الحساب واللغة

On Learning Meaningful Code Changes via Neural Machine Translation

369 - Michele Tufano , Jevgenija Pantiuchina , Cody Watson 2019

Recent years have seen the rise of Deep Learning (DL) techniques applied to source code. Researchers have exploited DL to automate several development and maintenance tasks, such as writing commit messages, generating comments and detecting vulnerabi lities among others. One of the long lasting dreams of applying DL to source code is the possibility to automate non-trivial coding activities. While some steps in this direction have been taken (e.g., learning how to fix bugs), there is still a glaring lack of empirical evidence on the types of code changes that can be learned and automatically applied by DL. Our goal is to make this first important step by quantitatively and qualitatively investigating the ability of a Neural Machine Translation (NMT) model to learn how to automatically apply code changes implemented by developers during pull requests. We train and experiment with the NMT model on a set of 236k pairs of code components before and after the implementation of the changes provided in the pull requests. We show that, when applied in a narrow enough context (i.e., small/medium-sized pairs of methods before/after the pull request changes), NMT can automatically replicate the changes implemented by developers during pull requests in up to 36% of the cases. Moreover, our qualitative analysis shows that the model is capable of learning and replicating a wide variety of meaningful code changes, especially refactorings and bug-fixing activities. Our results pave the way for novel research in the area of DL on code, such as the automatic learning and applications of refactoring.

هندسة البرمجيات الحساب واللغة التعلم الآلي