Sequential Copying Networks



Added by Qingyu Zhou

Publication date 2018

fields Informatics Engineering

Research language: English

Authors Qingyu Zhou - Nan Yang - Furu Wei

Computation and Language


Copying mechanisms have proven effective in sequence-to-sequence neural network models for text generation tasks such as abstractive sentence summarization and question generation. However, existing work on copying or pointing mechanisms considers only single-word copying from the source sentence. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which learns not only to copy single words but also to copy sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span on the source side and copy it to the target side, and integrates this sequential copying mechanism into the generation process in the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.
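The span selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names `select_span` and `copy_span` are hypothetical, the per-position start and end scores are assumed to come from some pointer-style scorer that is not shown, and greedy argmax selection is a simplification chosen for illustration.

```python
from typing import List, Sequence, Tuple


def select_span(start_scores: Sequence[float],
                end_scores: Sequence[float]) -> Tuple[int, int]:
    """Greedily pick a (start, end) pair of source positions.

    Assumption: the scores are given, one per source position. The end
    position is constrained to lie at or after the chosen start.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and equally long")
    n = len(start_scores)
    start = max(range(n), key=lambda i: start_scores[i])
    end = max(range(start, n), key=lambda i: end_scores[i])
    return start, end


def copy_span(source: Sequence[str],
              start_scores: Sequence[float],
              end_scores: Sequence[float]) -> List[str]:
    """Return the source sub-span chosen by select_span (inclusive)."""
    start, end = select_span(start_scores, end_scores)
    return list(source[start:end + 1])


source = ["the", "prime", "minister", "of", "japan", "resigned"]
start_scores = [0.1, 2.0, 0.3, 0.0, 0.5, 0.1]
end_scores = [3.0, 0.2, 1.5, 0.1, 0.4, 0.0]
span = copy_span(source, start_scores, end_scores)
```

In this toy input the highest end score sits at position 0, but it is ignored because the end is only searched from the selected start onward, so a whole multi-word span is copied in one step rather than word by word.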


Learning Noun Cases Using Sequential Neural Networks

Sina Ahmadi, 2018

Morphological declension, which aims to inflect nouns to indicate number, case and gender, is an important task in natural language processing (NLP). This research proposal seeks to address the degree to which Recurrent Neural Networks (RNNs) are efficient in learning to decline noun cases. Given the challenge of data sparsity in processing morphologically rich languages and the flexibility of sentence structures in such languages, we believe that modeling morphological dependencies can improve the performance of neural network models. We suggest carrying out various experiments to understand the interpretable features that may lead to a better generalization of the learned models on cross-lingual tasks.

Computation and Language

Joint Copying and Restricted Generation for Paraphrase

Ziqiang Cao, Chuwei Luo, Wenjie Li, 2016

Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-oriented. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we develop a novel Seq2Seq model to fuse a copying decoder and a restricted generative decoder. The copying decoder finds the position to be copied based on a typical attention model. The generative decoder produces words restricted to a source-specific vocabulary. To combine the two decoders and determine the final output, we develop a predictor to predict the mode of copying or rewriting. This predictor can be guided by the actual writing mode in the training data. We conduct extensive experiments on two different paraphrase datasets. The results show that our model outperforms the state-of-the-art approaches in terms of both informativeness and language quality.
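One way to picture how a mode predictor could combine a copying decoder with a generative decoder is a simple weighted mixture. The sketch below is an assumption for illustration, not the model from this paper: the function name `mix_copy_and_generate`, the linear mixture, and the scalar `copy_prob` standing in for the predictor's output are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def mix_copy_and_generate(source_tokens: Sequence[str],
                          attention: Sequence[float],
                          gen_probs: Dict[str, float],
                          copy_prob: float) -> Dict[str, float]:
    """Blend a copy distribution with a generation distribution.

    Assumptions: `attention` is a probability distribution over source
    positions, `gen_probs` is a distribution over a restricted
    vocabulary, and `copy_prob` is the predicted probability of the
    copying mode for the current output step.
    """
    mixed: Dict[str, float] = defaultdict(float)
    # Copy mode: attention mass on each position goes to the word there.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += copy_prob * weight
    # Rewrite mode: remaining mass follows the generative decoder.
    for word, prob in gen_probs.items():
        mixed[word] += (1.0 - copy_prob) * prob
    return dict(mixed)


mixed = mix_copy_and_generate(
    source_tokens=["cats", "sit", "cats"],
    attention=[0.5, 0.25, 0.25],
    gen_probs={"cats": 0.5, "rest": 0.5},
    copy_prob=0.5,
)
```

A word that appears several times in the source, or in both distributions, accumulates probability from every place it occurs, which is why the sketch sums into a dictionary keyed by word.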

Computation and Language Information Retrieval

CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models

Abhinav Singh, Patrick Xia, Guanghui Qin, 2020

Copy mechanisms are employed in sequence to sequence models (seq2seq) to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.

Computation and Language Machine Learning

On the Copying Behaviors of Pre-Training for Neural Machine Translation

Xuebo Liu, Longyue Wang, Derek F. Wong, 2021

Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than standard NMT models. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors for pre-training based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
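To make the idea of a copying ratio concrete, the sketch below computes one plausible version of such a metric: the fraction of output tokens that also occur in the source sentence. This is an assumed definition for illustration only; the paper's exact formula may differ, and the function name `copying_ratio` is hypothetical.

```python
from typing import Sequence


def copying_ratio(source_tokens: Sequence[str],
                  output_tokens: Sequence[str]) -> float:
    """Fraction of output tokens that also appear in the source.

    Assumption: a token counts as "copied" if the same string occurs
    anywhere in the source, regardless of position or frequency.
    """
    if not output_tokens:
        return 0.0
    source_vocab = set(source_tokens)
    copied = sum(1 for token in output_tokens if token in source_vocab)
    return copied / len(output_tokens)


ratio = copying_ratio("das ist Berlin".split(), "this is Berlin".split())
```

Under this definition, a translation that leaves most source words untranslated scores close to 1, while a translation that shares only named entities or numbers with the source scores much lower.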

Computation and Language Machine Learning

Attention Boosted Sequential Inference Model

Guanyu Li, Pengfei Zhang, Caiyan Jia, 2018

The attention mechanism has proven effective in natural language processing. This paper proposes an attention-boosted natural language inference model named aESIM, built by adding word attention and adaptive direction-oriented attention mechanisms to the traditional Bi-LSTM layer of natural language inference models such as ESIM. This gives aESIM the ability to effectively learn word representations and to model local subsentential inference between premise and hypothesis pairs. Empirical studies on the SNLI, MultiNLI and Quora benchmarks show that aESIM is superior to the original ESIM model.

Computation and Language