ﻻ يوجد ملخص باللغة العربية
In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such few-shot cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled from some distribution, Ex2 synthesizes new examples that also belong to the same distribution. The Ex2 model is learned by simulating the example generation procedure on data-rich slices of the data, and it is applied to underrepresented, few-shot slices. We apply Ex2 to a range of language understanding tasks and significantly improve over state-of-the-art methods on multiple few-shot learning benchmarks, including for relation extraction (FewRel) and intent classification + slot filling (SNIPS).
This paper asks whether extrapolating the hidden space distribution of text examples from one class onto another is a valid inductive bias for data augmentation. To operationalize this question, I propose a simple data augmentation protocol called go
Sentiment analysis has been widely used by businesses for social media opinion mining, especially in the financial services industry, where customers feedbacks are critical for companies. Recent progress of neural network models has achieved remarkab
In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these types of tr
Data augmentation is an effective way to improve the performance of many neural text generation models. However, current data augmentation methods need to define or choose proper data mapping functions that map the original samples into the augmented
Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that c