We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network-based encoder-decoder architectures, and we explore its use for abstractive document summarization. Mem2Mem transfers memories via readable/writable external memory modules that augment both the encoder and decoder. Our memory regularization compresses an encoded input article into a more compact set of sentence representations. Most importantly, the memory compression step performs implicit extraction without labels, sidestepping issues with suboptimal ground-truth data and the exposure bias of hybrid extractive-abstractive summarization techniques. By allowing the decoder to read/write over the encoded input memory, the model learns to read salient information about the input article while keeping track of what has been generated. Our Mem2Mem approach yields results that are competitive with state-of-the-art transformer-based summarization methods, but with 16 times fewer parameters.
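The paper itself specifies the full architecture; as a rough illustration only, the NumPy sketch below shows the two ideas the abstract describes: compressing per-sentence encoder states into a smaller memory, and letting a decoder state read from and write back to that memory at each step. The names, dimensions, and the particular attention and gating choices are assumptions made for illustration, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_memory(sentence_states, n_slots, rng):
    """Compress encoder sentence representations (n_sents, d) into a smaller
    memory (n_slots, d) by attending over sentences -- a stand-in for the
    memory-compression / implicit-extraction step (illustrative only)."""
    n_sents, d = sentence_states.shape
    # Hypothetical learned slot queries; here sampled randomly for the sketch.
    queries = rng.standard_normal((n_slots, d)) / np.sqrt(d)
    attn = softmax(queries @ sentence_states.T / np.sqrt(d), axis=-1)  # (n_slots, n_sents)
    return attn @ sentence_states  # each slot is a weighted mix of salient sentences

def decoder_step(dec_state, memory):
    """One decoder step: read from the encoded-input memory with attention,
    then apply a gated write so the memory tracks what has been generated."""
    d = dec_state.shape[0]
    read_weights = softmax(memory @ dec_state / np.sqrt(d))             # (n_slots,)
    read_vec = read_weights @ memory                                    # (d,)
    gate = 1.0 / (1.0 + np.exp(-(memory @ dec_state) / np.sqrt(d)))     # per-slot write gate
    memory = memory + gate[:, None] * (dec_state - memory) * read_weights[:, None]
    context = np.concatenate([dec_state, read_vec])                     # fed to the output layer
    return context, memory

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_sents, n_slots = 8, 20, 4
    sentence_states = rng.standard_normal((n_sents, d))  # encoder sentence representations
    memory = compress_memory(sentence_states, n_slots, rng)
    dec_state = rng.standard_normal(d)
    context, memory = decoder_step(dec_state, memory)
    print(context.shape, memory.shape)  # (16,) (4, 8)
```

In the actual model the slot queries and write gates would be learned jointly with the encoder and decoder; the sketch only fixes the data flow that the abstract describes.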