ترغب بنشر مسار تعليمي؟ اضغط هنا

Generating Pertinent and Diversified Comments with Topic-aware Pointer-Generator Networks

93   0   0.0 ( 0 )
 نشر من قبل Junheng Huang
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Comment generation, a new and challenging task in Natural Language Generation (NLG), attracts a lot of attention in recent years. However, comments generated by previous work tend to lack pertinence and diversity. In this paper, we propose a novel generation model based on Topic-aware Pointer-Generator Networks (TPGN), which can utilize the topic information hidden in the articles to guide the generation of pertinent and diversified comments. Firstly, we design a keyword-level and topic-level encoder attention mechanism to capture topic information in the articles. Next, we integrate the topic information into pointer-generator networks to guide comment generation. Experiments on a large scale of comment generation dataset show that our model produces the valuable comments and outperforms competitive baseline models significantly.



قيم البحث

اقرأ أيضاً

111 - Xin Wang , Xin Peng , Jun Sun 2021
Code summarization is the task of generating natural language description of source code, which is important for program understanding and maintenance. Existing approaches treat the task as a machine translation problem (e.g., from Java to English) a nd applied Neural Machine Translation models to solve the problem. These approaches only consider a given code unit (e.g., a method) without its broader context. The lacking of context may hinder the NMT model from gathering sufficient information for code summarization. Furthermore, existing approaches use a fixed vocabulary and do not fully consider the words in code, while many words in the code summary may come from the code. In this work, we present a neural network model named ToPNN for code summarization, which uses the topics in a broader context (e.g., class) to guide the neural networks that combine the generation of new words and the copy of existing words in code. Based on the model we present an approach for generating natural language code summaries at the method level (i.e., method comments). We evaluate our approach using a dataset with 4,203,565 commented Java methods. The results show significant improvement over state-of-the-art approaches and confirm the positive effect of class topics and the copy mechanism.
Growing polarization of the news media has been blamed for fanning disagreement, controversy and even violence. Early identification of polarized topics is thus an urgent matter that can help mitigate conflict. However, accurate measurement of topic- wise polarization is still an open research challenge. To address this gap, we propose Partisanship-aware Contextualized Topic Embeddings (PaCTE), a method to automatically detect polarized topics from partisan news sources. Specifically, utilizing a language model that has been finetuned on recognizing partisanship of the news articles, we represent the ideology of a news corpus on a topic by corpus-contextualized topic embedding and measure the polarization using cosine distance. We apply our method to a dataset of news articles about the COVID-19 pandemic. Extensive experiments on different news sources and topics demonstrate the efficacy of our method to capture topical polarization, as indicated by its effectiveness of retrieving the most polarized topics.
This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtain ed by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. Actually, the hyper-prior technique is quite general and we show that it can be applied to other AEVB based models to alleviate the {it collapse-to-prior} problem elegantly. Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on 20News and Reuters RCV1-V2 datasets show that the proposed models outperform the state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments.
177 - Cao Liu , Kang Liu , Shizhu He 2019
We tackle the task of question generation over knowledge bases. Conventional methods for this task neglect two crucial research issues: 1) the given predicate needs to be expressed; 2) the answer to the generated question needs to be definitive. In t his paper, we strive toward the above two issues via incorporating diversified contexts and answer-aware loss. Specifically, we propose a neural encoder-decoder model with multi-level copy mechanisms to generate such questions. Furthermore, the answer aware loss is introduced to make generated questions corresponding to more definitive answers. Experiments demonstrate that our model achieves state-of-the-art performance. Meanwhile, such generated question can express the given predicate and correspond to a definitive answer.
Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text spans such as sentences, contains much information which is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA whose goal is to overcome this limitation by incorporating the structure of the text in the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA using both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks on different text collections.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا