CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

hierarchical splitting, splitting and reconstruction, abstract syntax trees


Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/depth of ASTs. In this paper, we propose a novel model CAST that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and utilize a recursive neural network to encode the subtrees. Then, we aggregate the embeddings of subtrees by reconstructing the split ASTs to get the representation of the complete AST. Finally, AST representation, together with source code embedding obtained by a vanilla code token encoder, is used for code summarization. Extensive experiments, including the ablation study and the human evaluation, on benchmarks have demonstrated the power of CAST. To facilitate reproducibility, our code and data are available at https://github.com/DeepSoftwareAnalytics/CAST.
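The split, encode, and aggregate steps described in the abstract can be illustrated with a minimal sketch. It is not the CAST implementation: it assumes Python's standard `ast` module as the parser, treats `if`/`for`/`while`/`with` statements as the split points, and swaps the learned recursive neural network for a simple bag of node-type counts. The names `Subtree`, `split`, and `encode` are hypothetical.

```python
import ast
from collections import Counter
from dataclasses import dataclass, field

# Assumption: these statement types mark where a large AST is cut into subtrees.
BLOCK_NODES = (ast.If, ast.For, ast.While, ast.With)


@dataclass
class Subtree:
    label: str                                       # node type of the subtree root
    statements: list = field(default_factory=list)   # statement types, "<block>" for a cut
    children: list = field(default_factory=list)     # subtrees that were cut out


def block_statements(node):
    """Yield the statements directly nested in a function or block node."""
    for name in ("body", "orelse"):
        yield from getattr(node, name, [])


def split(node):
    """Hierarchically split `node`: nested blocks become child subtrees."""
    tree = Subtree(label=type(node).__name__)
    for stmt in block_statements(node):
        if isinstance(stmt, BLOCK_NODES):
            tree.statements.append("<block>")        # placeholder left in the parent
            tree.children.append(split(stmt))
        else:
            tree.statements.append(type(stmt).__name__)
    return tree


def encode(tree):
    """Toy stand-in for a learned encoder: count node types, then aggregate
    child encodings bottom-up along the structure of the split."""
    vec = Counter(tree.statements)
    vec[tree.label] += 1
    for child in tree.children:
        vec += encode(child)
    return vec


SOURCE = '''
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
'''

root = split(ast.parse(SOURCE).body[0])
representation = encode(root)
```

Here `root` holds the function-level subtree, with the `for` and `if` blocks cut out as nested children, and `representation` plays the role of the aggregated representation of the complete AST.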


Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

Association for Computational Linguistics, 2021 (article)

Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any given task. While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window. Using concrete syntax trees of each source file, we extract syntactic hierarchies and integrate them into the context window by selectively removing from view more specific, less relevant scopes for a given task. We evaluate this approach on code generation tasks and joint translation of natural language and source code in the Python programming language, achieving a new state of the art in code completion and summarization for Python in the CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for user-experience-motivated tasks: code completion with normalized literals, and method body completion/code summarization conditioned on file-level context.
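The idea of fitting file-level context into a fixed window by hiding more specific scopes can be sketched as follows. This is an illustration under several assumptions, not the eWASH implementation: it uses Python's standard `ast` module rather than a concrete syntax tree, counts whitespace-separated tokens, and collapses the largest non-target definitions to their first line until the budget is met. The names `pack_context`, `header`, and `n_tokens` are hypothetical.

```python
import ast


def n_tokens(text):
    """Crude token count (assumption: whitespace-separated tokens)."""
    return len(text.split())


def header(text):
    """Keep only the first line of a definition, hiding its body."""
    return text.splitlines()[0] + " ..."


def pack_context(source, target, budget):
    """Return file-level context for `target` that fits in `budget` tokens.

    Top-level definitions other than `target` are collapsed to their
    headers, largest first, until the context fits or nothing is left
    to collapse.
    """
    parts = []  # each entry: [current text, collapsed form or None]
    for node in ast.parse(source).body:
        text = ast.get_source_segment(source, node)
        collapsible = (
            isinstance(node, (ast.FunctionDef, ast.ClassDef))
            and node.name != target
        )
        parts.append([text, header(text) if collapsible else None])

    while sum(n_tokens(text) for text, _ in parts) > budget:
        candidates = [p for p in parts if p[1] is not None]
        if not candidates:
            break  # only the target and non-collapsible statements remain
        biggest = max(candidates, key=lambda p: n_tokens(p[0]))
        biggest[0], biggest[1] = biggest[1], None

    return "\n".join(text for text, _ in parts)


SOURCE = '''import math

def area(r):
    """Area of a circle."""
    return math.pi * r * r

def perimeter(r):
    return 2 * math.pi * r

def describe(r):
    return f"{area(r)} {perimeter(r)}"
'''

packed = pack_context(SOURCE, target="describe", budget=20)
```

With this budget, the body of `area` is hidden behind its header, while `perimeter` and the target `describe` remain visible in full. A larger budget leaves the file untouched.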

extended window access, syntax hierarchy, extended window

Improving Abstractive Dialogue Summarization with Hierarchical Pretraining and Topic Segment

Association for Computational Linguistics, 2021 (article)

With the increasing abundance of meeting transcripts, meeting summarization has attracted more and more attention from researchers. The unsupervised pre-training method based on the transformer structure, combined with fine-tuning on downstream tasks, has achieved great success in the field of text summarization. However, the semantic structure and style of meeting transcripts are quite different from those of articles. In this work, we propose a hierarchical transformer encoder-decoder network with multi-task pre-training. Specifically, we mask key sentences at the word-level encoder and generate them at the decoder. Besides, we randomly mask some of the role alignments in the input text and force the model to recover the original role tags to complete the alignments. In addition, we introduce a topic segmentation mechanism to further improve the quality of the generated summaries. The experimental results show that our model is superior to the previous methods on the meeting summarization datasets AMI and ICSI.

improving abstractive dialogue

Abstractive Document Summarization with Word Embedding Reconstruction

Association for Computational Linguistics, 2021 (article)

Neural sequence-to-sequence (Seq2Seq) models and BERT have achieved substantial improvements in abstractive document summarization (ADS) without and with pre-training, respectively. However, they sometimes repeatedly attend to unimportant source phrases while mistakenly ignoring important ones. We present reconstruction mechanisms on two levels to alleviate this issue. The sequence-level reconstructor reconstructs the whole document from the hidden layer of the target summary, while the word embedding-level one rebuilds the average of word embeddings of the source at the target side to guarantee that as much critical information is included in the summary as possible. Based on the assumption that inverse document frequency (IDF) measures how important a word is, we further leverage the IDF weights in our embedding-level reconstructor. The proposed frameworks lead to promising improvements for ROUGE metrics and human rating on both the CNN/Daily Mail and Newsroom summarization datasets.
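To make the IDF-weighted average of source word embeddings concrete, here is a small sketch of the kind of target vector an embedding-level reconstructor could be asked to rebuild. The abstract does not give the exact formulation, so the formula idf(w) = log(N / df(w)), the normalization by the sum of weights, and the toy two-dimensional embeddings are all assumptions. The function names are hypothetical.

```python
import math


def idf_weights(documents):
    """Assumed formula: idf(w) = log(N / df(w)) over tokenized documents."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted_average(words, embeddings, weights):
    """Average the embeddings of `words`, each weighted by its IDF score."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    norm = 0.0
    for word in words:
        weight = weights.get(word, 0.0)
        norm += weight
        for i, value in enumerate(embeddings[word]):
            total[i] += weight * value
    # Fall back to the zero vector if every word has zero weight.
    return [t / norm for t in total] if norm else total


# Toy corpus and embeddings, invented for illustration only.
documents = [["the", "cat", "sat"], ["the", "dog", "ran"]]
embeddings = {
    "the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.0, 3.0],
    "dog": [2.0, 2.0], "ran": [4.0, 0.0],
}

weights = idf_weights(documents)
target = weighted_average(documents[0], embeddings, weights)
```

Because "the" occurs in every document, its IDF weight is zero and it drops out of `target`, whereas an unweighted average would let it contribute as much as "cat" or "sat".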

abstractive document summarization, document summarization, word embedding reconstruction

Syntax Matters! Syntax-Controlled in Text Style Transfer

Association for Computational Linguistics, 2021 (article)

Existing text style transfer (TST) methods rely on style classifiers to disentangle the text's content and style attributes. While the style classifier plays a critical role in existing TST methods, there is no known investigation of its effect on them. In this paper, we conduct an empirical study on the limitations of the style classifiers used in existing TST methods. We demonstrate that the existing style classifiers cannot learn sentence syntax effectively and ultimately worsen existing TST models' performance. To address this issue, we propose a novel Syntax-Aware Controllable Generation (SACG) model, which includes a syntax-aware style classifier that ensures learned style latent representations effectively capture the sentence structure for TST. Through extensive experiments on two popular text style transfer tasks, we show that our proposed method significantly outperforms twelve state-of-the-art methods. Our case studies also demonstrate SACG's ability to generate fluent target-style sentences that preserve the original content.

deep learning methods, style

Incorporating Syntax and Semantics in Coreference Resolution with Heterogeneous Graph Attention Network

Association for Computational Linguistics, 2021 (article)

External syntactic and semantic information has been largely ignored by existing neural coreference resolution models. In this paper, we present a heterogeneous graph-based model to incorporate syntactic and semantic structures of sentences. The proposed graph contains a syntactic sub-graph where tokens are connected based on a dependency tree, and a semantic sub-graph that contains arguments and predicates as nodes and semantic role labels as edges. By applying a graph attention network, we can obtain syntactically and semantically augmented word representations, which can be integrated using an attentive integration layer and gating mechanism. Experiments on the OntoNotes 5.0 benchmark show the effectiveness of our proposed model.

incorporating syntax, graph attention network, coreference resolution models
