
Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss

Posted by Carolin Lawrence
Publication date: 2019
Research field: Informatics engineering
Paper language: English





In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation (MLE). In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training (MRT) on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.
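To make the notion of bipolarity concrete, here is a minimal PyTorch-style sketch of a sequence-level bipolar ramp loss, assuming K candidate outputs have already been sampled from the model and scored with weak feedback (e.g. BLEU against a reference, or execution success of a parse). The selection rule and function name are illustrative simplifications, not the authors' released code.

```python
import torch

def bipolar_ramp_loss(log_probs: torch.Tensor, feedback: torch.Tensor) -> torch.Tensor:
    """Sequence-level bipolar ramp loss over K sampled candidate outputs.

    log_probs: shape (K,), model scores log p(y_k | x) for each candidate.
    feedback:  shape (K,), weak metric feedback delta(y_k) (e.g. BLEU or
               parse-execution success); higher is better.
    """
    # 'Hope' output: high model probability AND high feedback -- acts as
    # the surrogate gold structure whose probability is promoted.
    hope = torch.argmax(log_probs.detach() + feedback)
    # 'Fear' output: high model probability BUT low feedback -- the
    # negative output that is actively discouraged (the bipolar part).
    fear = torch.argmax(log_probs.detach() - feedback)
    # Raise log p(hope | x), lower log p(fear | x).
    return -log_probs[hope] + log_probs[fear]
```

A non-bipolar variant would keep only the first term; the token-level objective mentioned above applies the same hope/fear contrast per decoding position rather than per sequence.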




Read also

Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.
Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models.
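Since sparsemax is the $\alpha = 2$ member of the $\alpha$-entmax family named above, a short NumPy sketch shows the kind of transformation involved. This sorting-based routine is the classic sparsemax of Martins and Astudillo (2016); general $\alpha$-entmax requires a root-finding step not shown here.

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Sparsemax: the alpha = 2 member of the alpha-entmax family.
    Like softmax it maps scores onto the probability simplex, but
    low-scoring entries receive exactly zero probability.
    """
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    k_z = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z   # threshold shared by the support
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([2.0, 1.0, -1.0])))  # [1. 0. 0.] -- sparse output
```

Entries below the threshold $\tau$ get exactly zero probability, which is what yields the sparse alignments and short lists of plausible outputs described in the abstract.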
Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of the natural language and knowledge about the world. Rapid progress has been made since the release of several benchmark datasets, and recently the state-of-the-art models even surpass human performance on the well-known SQuAD evaluation. In this paper, we transfer knowledge learned from machine comprehension to the sequence-to-sequence tasks to deepen the understanding of the text. We propose MacNet: a novel encoder-decoder supplementary architecture to the widely used attention-based sequence-to-sequence models. Experiments on neural machine translation (NMT) and abstractive text summarization show that our proposed framework can significantly improve the performance of the baseline models, and our method for the abstractive text summarization achieves the state-of-the-art results on the Gigaword dataset.
Bertrand Higy, Peter Bell (2018)
End-to-end approaches have recently become popular as a means of simplifying the training and deployment of speech recognition systems. However, they often require large amounts of data to perform well on large vocabulary tasks. With the aim of making end-to-end approaches usable by a broader range of researchers, we explore the potential to use end-to-end methods in small vocabulary contexts where smaller datasets may be used. A significant drawback of small-vocabulary systems is the difficulty of expanding the vocabulary beyond the original training samples -- therefore we also study strategies to extend the vocabulary with only few examples per new class (few-shot learning). Our results show that an attention-based encoder-decoder can be competitive against a strong baseline on a small vocabulary keyword classification task, reaching 97.5% accuracy on TensorFlow's Speech Commands dataset. It also shows promising results on the few-shot learning problem, where a simple strategy achieved 68.8% accuracy on new keywords with only 10 examples for each new class. This score goes up to 88.4% with a larger set of 100 examples.
Xisen Jin, Zhongyu Wei, Junyi Du (2019)
The impressive performance of neural networks on natural language processing tasks attributes to their ability to model complicated word and phrase compositions. To explain how the model handles semantic compositions, we study hierarchical explanation of neural network predictions. We identify non-additivity and context independent importance attributions within hierarchies as two desirable properties for highlighting word and phrase compositions. We show some prior efforts on hierarchical explanations, e.g. contextual decomposition, do not satisfy the desired properties mathematically, leading to inconsistent explanation quality in different models. In this paper, we start by proposing a formal and general way to quantify the importance of each word and phrase. Following the formulation, we propose Sampling and Contextual Decomposition (SCD) algorithm and Sampling and Occlusion (SOC) algorithm. Human and metrics evaluation on both LSTM models and BERT Transformer models on multiple datasets show that our algorithms outperform prior hierarchical explanation algorithms. Our algorithms help to visualize semantic composition captured by models, extract classification rules and improve human trust of models. Project page: https://inklab.usc.edu/hiexpl/
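As a rough illustration of the occlusion step underlying SOC, phrase importance can be measured as the drop in a classifier's prediction score when the phrase is masked out. The sketch below covers only that single-context step, with hypothetical `model`, `mask_id`, and `target` arguments (a classifier returning per-class logits, its mask token id, and the class of interest); the full SOC algorithm additionally averages this difference over sampled replacements of the surrounding context to obtain context-independent attributions.

```python
import torch

@torch.no_grad()
def occlusion_score(model, token_ids: torch.Tensor, span: tuple,
                    mask_id: int, target: int) -> float:
    """Importance of the phrase token_ids[span[0]:span[1]] for class
    `target`, measured as the drop in the model's score when the phrase
    is replaced by mask tokens. A simplified sketch, not the authors'
    released implementation.
    """
    start, end = span
    masked = token_ids.clone()
    masked[start:end] = mask_id                       # occlude the phrase
    full_score = model(token_ids.unsqueeze(0))[0, target]
    masked_score = model(masked.unsqueeze(0))[0, target]
    # Importance = how much the prediction score drops without the phrase.
    return (full_score - masked_score).item()
```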
