Deep Just-In-Time Inconsistency Detection Between Comments and Source Code

78 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sheena Panthaplackel

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sheena Panthaplackel - Junyi Jessy Li - Milos Gligoric

هندسة البرمجيات الذكاء الاصطناعي الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Natural language comments convey key aspects of source code such as implementation, usage, and pre- and post-conditions. Failure to update comments accordingly when the corresponding code is modified introduces inconsistencies, which is known to lead to confusion and software bugs. In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code, in order to catch potential inconsistencies just-in-time, i.e., before they are committed to a code base. To achieve this, we develop a deep-learning approach that learns to correlate a comment with code changes. By evaluating on a large corpus of comment/code pairs spanning various comment types, we show that our model outperforms multiple baselines by significant margins. For extrinsic evaluation, we show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system which can both detect and resolve inconsistent comments based on code changes.

قيم البحث

80 - Yixiao Yang 2020

In the field of software engineering, applying language models to the token sequence of source code is the state-of-art approach to build a code recommendation system. The syntax tree of source code has hierarchical structures. Ignoring the character istics of tree structures decreases the model performance. Current LSTM model handles sequential data. The performance of LSTM model will decrease sharply if the noise unseen data is distributed everywhere in the test suite. As code has free naming conventions, it is common for a model trained on one project to encounter many unknown words on another project. If we set many unseen words as UNK just like the solution in natural language processing, the number of UNK will be much greater than the sum of the most frequently appeared words. In an extreme case, just predicting UNK at everywhere may achieve very high prediction accuracy. Thus, such solution cannot reflect the true performance of a model when encountering noise unseen data. In this paper, we only mark a small number of rare words as UNK and show the prediction performance of models under in-project and cross-project evaluation. We propose a novel Hierarchical Language Model (HLM) to improve the robustness of LSTM model to gain the capacity about dealing with the inconsistency of data distribution between training and testing. The newly proposed HLM takes the hierarchical structure of code tree into consideration to predict code. HLM uses BiLSTM to generate embedding for sub-trees according to hierarchies and collects the embedding of sub-trees in context to predict next code. The experiments on inner-project and cross-project data sets indicate that the newly proposed Hierarchical Language Model (HLM) performs better than the state-of-art LSTM model in dealing with the data inconsistency between training and testing and achieves averagely 11.2% improvement in prediction accuracy.

هندسة البرمجيات

Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs

100 - Yanlin Wang , Hui Li 2021

Code completion has become an essential component of integrated development environments. Contemporary code completion methods rely on the abstract syntax tree (AST) to generate syntactically correct code. However, they cannot fully capture the seque ntial and repetitive patterns of writing code and the structural information of the AST. To alleviate these problems, we propose a new code completion approach named CCAG, which models the flattened sequence of a partial AST as an AST graph. CCAG uses our proposed AST Graph Attention Block to capture different dependencies in the AST graph for representation learning in code completion. The sub-tasks of code completion are optimized via multi-task learning in CCAG, and the task balance is automatically achieved using uncertainty without the need to tune task weights. The experimental results show that CCAG has superior performance than state-of-the-art approaches and it is able to provide intelligent code completion.

هندسة البرمجيات الذكاء الاصطناعي الحساب واللغة

Relationships between Software Architecture and Source Code in Practice: An Exploratory Survey and Interview

95 - Fangchao Tian , Peng Liang , Muhammad Ali Babar 2021

Context: Software Architecture (SA) and Source Code (SC) are two intertwined artefacts that represent the interdependent design decisions made at different levels of abstractions - High-Level (HL) and Low-Level (LL). An understanding of the relations hips between SA and SC is expected to bridge the gap between SA and SC for supporting maintenance and evolution of software systems. Objective: We aimed at exploring practitioners understanding about the relationships between SA and SC. Method: We used a mixed-method that combines an online survey with 87 respondents and an interview with 8 participants to collect the views of practitioners from 37 countries about the relationships between SA and SC. Results: Our results reveal that: practitioners mainly discuss five features of relationships between SA and SC; a few practitioners have adopted dedicated approaches and tools in the literature for identifying and analyzing the relationships between SA and SC despite recognizing the importance of such information for improving a systems quality attributes, especially maintainability and reliability. It is felt that cost and effort are the major impediments that prevent practitioners from identifying, analyzing, and using the relationships between SA and SC. Conclusions: The results have empirically identified five features of relationships between SA and SC reported in the literature from the perspective of practitioners and a systematic framework to manage the five features of relationships should be developed with dedicated approaches and tools considering the cost and benefit of maintaining the relationships.

هندسة البرمجيات

Code Generation Based on Deep Learning: a Brief Review

190 - Qihao Zhu , Wenjie Zhang 2021

Automatic software development has been a research hot spot in the field of software engineering (SE) in the past decade. In particular, deep learning (DL) has been applied and achieved a lot of progress in various SE tasks. Among all applications, a utomatic code generation by machines as a general concept, including code completion and code synthesis, is a common expectation in the field of SE, which may greatly reduce the development burden of the software developers and improves the efficiency and quality of the software development process to a certain extent. Code completion is an important part of modern integrated development environments (IDEs). Code completion technology effectively helps programmers complete code class names, method names, and key-words, etc., which improves the efficiency of program development and reduces spelling errors in the coding process. Such tools use static analysis on the code and provide candidates for completion arranged in alphabetical order. Code synthesis is implemented from two aspects, one based on input-output samples and the other based on functionality description. In this study, we introduce existing techniques of these two aspects and the corresponding DL techniques, and present some possible future research directions.

هندسة البرمجيات الذكاء الاصطناعي

Analysis of Source Code Using UPPAAL

269 - Mitja Kulczynski 2021

In recent years there has been a considerable effort in optimising formal methods for application to code. This has been driven by tools such as CPAChecker, DIVINE, and CBMC. At the same time tools such as Uppaal have been massively expanding the rea lm of more traditional model checking technologies to include strategy synthesis algorithms - an aspect becoming more and more needed as software becomes increasingly parallel. Instead of reimplementing the advances made by Uppaal in this area, we suggest in this paper to develop a bridge between the source code and the engine of Uppaal. Our approach uses the widespread intermediate language LLVM and makes recent advances of the Uppaal ecosystem readily available to analysis of source code.

هندسة البرمجيات