ترغب بنشر مسار تعليمي؟ اضغط هنا

Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach

116   0   0.0 ( 0 )
 نشر من قبل Rui Xie
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Code summaries are brief natural language descriptions of source code pieces. The main purpose of code summarization is to assist developers in understanding code and to reduce documentation workload. In this paper, we design a novel multi-task learning (MTL) approach for code summarization through mining the relationship between method code summaries and method names. More specifically, since a methods name can be considered as a shorter version of its code summary, we first introduce the tasks of generation and informativeness prediction of method names as two auxiliary training objectives for code summarization. A novel two-pass deliberation mechanism is then incorporated into our MTL architecture to generate more consistent intermediate states fed into a summary decoder, especially when informative method names do not exist. To evaluate our deliberation MTL approach, we carried out a large-scale experiment on two existing datasets for Java and Python. The experiment results show that our technique can be easily applied to many state-of-the-art neural models for code summarization and improve their performance. Meanwhile, our approach shows significant superiority when generating summaries for methods with non-informative names.

قيم البحث

اقرأ أيضاً

We introduce a new approach for abstractive text summarization, Topic-Guided Abstractive Summarization, which calibrates long-range dependencies from topic-level features with globally salient content. The idea is to incorporate neural topic modeling with a Transformer-based sequence-to-sequence (seq2seq) model in a joint learning framework. This design can learn and preserve the global semantics of the document, which can provide additional contextual guidance for capturing important ideas of the document, thereby enhancing the generation of summary. We conduct extensive experiments on two datasets and the results show that our proposed model outperforms many extractive and abstractive systems in terms of both ROUGE measurements and human evaluation. Our code is available at: https://github.com/chz816/tas.
We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these envir onments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are not present in its single task counterpart, and illustrate them with simple examples. We then develop a decentralized entropy-regularized policy gradient method for solving the MTRL problem, and study its finite-time convergence rate. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale GridWorld problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to navigate an airborne drone in multiple (simulated) environments.
Source code summarization aims at generating concise descriptions of given programs functionalities. While Transformer-based approaches achieve promising performance, they do not explicitly incorporate the code structure information which is importan t for capturing code semantics. Besides, without explicit constraints, multi-head attentions in Transformer may suffer from attention collapse, leading to poor code representations for summarization. Effectively integrating the code structure information into Transformer is under-explored in this task domain. In this paper, we propose a novel approach named SG-Trans to incorporate code structural properties into Transformer. Specifically, to capture the hierarchical characteristics of code, we inject the local symbolic information (e.g., code tokens) and global syntactic structure (e.g., data flow) into the self-attention module as inductive bias. Extensive evaluation shows the superior performance of SG-Trans over the state-of-the-art approaches.
67 - Chao Feng , Shi-jie We 2021
This paper presents the participation of the MiniTrue team in the FinSim-3 shared task on learning semantic similarities for the financial domain in English language. Our approach combines contextual embeddings learned by transformer-based language m odels with network structures embeddings extracted on external knowledge sources, to create more meaningful representations of financial domain entities and terms. For this, two BERT based language models and a knowledge graph embedding model are used. Besides, we propose a voting function to joint three basic models for the final inference. Experimental results show that the model with the knowledge graph embeddings has achieved a superior result than these models with only contextual embeddings. Nevertheless, we also observe that our voting function brings an extra benefit to the final system.
It is a challenging and practical research problem to obtain effective compression of lengthy product titles for E-commerce. This is particularly important as more and more users browse mobile E-commerce apps and more merchants make the original prod uct titles redundant and lengthy for Search Engine Optimization. Traditional text summarization approaches often require a large amount of preprocessing costs and do not capture the important issue of conversion rate in E-commerce. This paper proposes a novel multi-task learning approach for improving product title compression with user search log data. In particular, a pointer network-based sequence-to-sequence approach is utilized for title compression with an attentive mechanism as an extractive method and an attentive encoder-decoder approach is utilized for generating user search queries. The encoding parameters (i.e., semantic embedding of original titles) are shared among the two tasks and the attention distributions are jointly optimized. An extensive set of experiments with both human annotated data and online deployment demonstrate the advantage of the proposed research for both compression qualities and online business values.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا