A Hierarchical Decoder with Three-level Hierarchical Attention to Generate Abstractive Summaries of Interleaved Texts

140 0 0.0 ( 0 )

Download Cite

Added by Sanjeev Kumar Karn

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Sanjeev Kumar Karn - Francine Chen - Yan-Ying Chen

Computation and Language

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Interleaved texts, where posts belonging to different threads occur in one sequence, are a common occurrence, e.g., online chat conversations. To quickly obtain an overview of such texts, existing systems first disentangle the posts by threads and then extract summaries from those threads. The major issues with such systems are error propagation and non-fluent summary. To address those, we propose an end-to-end trainable hierarchical encoder-decoder system. We also introduce a novel hierarchical attention mechanism which combines three levels of information from an interleaved text, i.e, posts, phrases and words, and implicitly disentangles the threads. We evaluated the proposed system on multiple interleaved text datasets, and it out-performs a SOTA two-step system by 20-40%.

rate research

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

118 - Yanpei Shi , Qiang Huang , Thomas Hain 2019

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use of hierarchical structure aims to learn speaker related information locally and globally. In the proposed approach, frame-level encoder and attention are applied on segments of an input utterance and generate individual segment vectors. Then, segment level attention is applied on the segment vectors to construct an utterance representation. To evaluate the effectiveness of the proposed approach, NIST SRE 2008 Part1 dataset is used for training, and two datasets, Switchboard Cellular part1 and CallHome American English Speech, are used to evaluate the quality of extracted utterance embeddings on speaker identification and verification tasks. In comparison with two baselines, X-vector, X-vector+Attention, the obtained results show that H-vectors can achieve a significantly better performance. Furthermore, the extracted utterance-level embeddings are more discriminative than the two baselines when mapped into a 2D space using t-SNE.

Computation and Language Machine Learning Sound

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining

155 - Chenguang Zhu , Ruochen Xu , Michael Zeng 2020

With the abundance of automatic meeting transcripts, meeting summarization is of great interest to both participants and other parties. Traditional methods of summarizing meetings depend on complex multi-step pipelines that make joint optimization intractable. Meanwhile, there are a handful of deep neural models for text summarization and dialogue systems. However, the semantic structure and styles of meeting transcripts are quite different from articles and conversations. In this paper, we propose a novel abstractive summary network that adapts to the meeting scenario. We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers. Furthermore, due to the inadequacy of meeting summary data, we pretrain the model on large-scale news summary data. Empirical results show that our model outperforms previous approaches in both automatic metrics and human evaluation. For example, on ICSI dataset, the ROUGE-1 score increases from 34.66% to 46.28%.

Computation and Language

A New Approach to Overgenerating and Scoring Abstractive Summaries

73 - Kaiqiang Song , Bingqing Wang , Zhe Feng 2021

We propose a new approach to generate multiple variants of the target summary with diverse content and varying lengths, then score and select admissible ones according to users needs. Abstractive summarizers trained on single reference summaries may struggle to produce outputs that achieve multiple desirable properties, i.e., capturing the most important information, being faithful to the original, grammatical and fluent. In this paper, we propose a two-staged strategy to generate a diverse set of candidate summaries from the source text in stage one, then score and select admissible ones in stage two. Importantly, our generator gives a precise control over the length of the summary, which is especially well-suited when space is limited. Our selectors are designed to predict the optimal summary length and put special emphasis on faithfulness to the original text. Both stages can be effectively trained, optimized and evaluated. Our experiments on benchmark summarization datasets suggest that this paradigm can achieve state-of-the-art performance.

Computation and Language

RNNs can generate bounded hierarchical languages with optimal memory

219 - John Hewitt , Michael Hahn , Surya Ganguli 2020

Recurrent neural networks empirically generate natural language with high syntactic fidelity. However, their success is not well-understood theoretically. We provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best known results use $O(k^{frac{m}{2}})$ memory (hidden units) to generate these languages. We prove that an RNN with $O(m log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction. Finally, we show that no algorithm, even with unbounded computation, can suffice with $o(m log k)$ hidden units.

Computation and Language

The Summary Loop: Learning to Write Abstractive Summaries Without Examples

142 - Philippe Laban , Andrew Hsi , John Canny 2021

This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generated summary. A novel unsupervised training procedure leverages this coverage model along with a fluency model to generate and score summaries. When tested on popular news summarization datasets, the method outperforms previous unsupervised methods by more than 2 R-1 points, and approaches results of competitive supervised methods. Our model attains higher levels of abstraction with copied passages roughly two times shorter than prior work, and learns to compress and merge sentences without supervision.

Computation and Language Machine Learning