Designing a shared neural architecture plays an important role in multi-task learning. The challenge is that finding an optimal sharing scheme relies heavily on expert knowledge and does not scale to a large number of diverse tasks. Inspired by the promising work on neural architecture search (NAS), we apply reinforcement learning to automatically find a possible shared architecture for multi-task learning. Specifically, we use a controller to select from a set of shareable modules and assemble a task-specific architecture, and we repeat the same procedure for the other tasks. The controller is trained with reinforcement learning to maximize the expected accuracy on all tasks. We conduct extensive experiments on two types of tasks, text classification and sequence labeling, which demonstrate the benefits of our approach.
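The abstract above only outlines the controller mechanism, so here is a minimal runnable sketch of the idea, not the authors' code: a controller holds one categorical distribution per (task, layer) over a pool of shared modules, samples a task-specific architecture for each task, and is updated with REINFORCE so the expected accuracy across tasks increases. The module counts, the stubbed `evaluate` reward, and all hyperparameters are illustrative assumptions.

```python
import torch

NUM_TASKS, NUM_LAYERS, NUM_MODULES = 3, 4, 5  # assumed sizes

# Controller: one categorical distribution per (task, layer) over shared modules.
logits = torch.zeros(NUM_TASKS, NUM_LAYERS, NUM_MODULES, requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.05)

def evaluate(task, architecture):
    """Stand-in for training/validating the assembled network; returns accuracy.

    This fake reward simply prefers module `task % NUM_MODULES` so the sketch
    runs end to end; a real system would train the selected shared modules on
    the task and return held-out accuracy.
    """
    preferred = task % NUM_MODULES
    return (architecture == preferred).float().mean().item()

baseline = 0.0  # moving-average baseline to reduce REINFORCE variance
for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    arch = dist.sample()                        # (task, layer) -> module index
    log_prob = dist.log_prob(arch).sum(dim=1)   # one log-probability per task
    rewards = torch.tensor([evaluate(t, arch[t]) for t in range(NUM_TASKS)])
    baseline = 0.9 * baseline + 0.1 * rewards.mean().item()
    loss = -((rewards - baseline) * log_prob).mean()  # maximize expected accuracy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("selected modules per task:", logits.argmax(dim=-1).tolist())
```

Because the reward is a non-differentiable validation accuracy, the policy-gradient (REINFORCE) update with a moving-average baseline is the standard way to train such a controller; tasks that benefit from the same module will drift toward sharing it.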
Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance?
Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization required by many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models.
Commonsense reasoning simulates the human ability to make presumptions about our physical world, and it is an indispensable cornerstone in building general AI systems. We propose a new commonsense reasoning dataset based on humans' interactive fiction…
Many NLP tasks such as tagging and machine reading comprehension are faced with a severe data-imbalance issue: negative examples significantly outnumber positive examples, and the huge number of background examples (or easy-negative examples) overwhelms the training.
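A tiny numeric illustration (not from the paper) of the claim above, under an assumed 100:1 negative-to-positive ratio: even when the model is already confident on the negatives, their sheer number dominates the total cross-entropy, so the training signal is mostly about easy negatives.

```python
import math

n_neg, n_pos = 10_000, 100           # assumed tagging-style imbalance
p_neg, p_pos = 0.95, 0.60            # model confidence: negatives easy, positives hard
loss_neg = n_neg * -math.log(p_neg)  # total cross-entropy from easy negatives
loss_pos = n_pos * -math.log(p_pos)  # total cross-entropy from hard positives
print(f"negatives contribute {loss_neg / (loss_neg + loss_pos):.1%} of the loss")
# -> roughly 91%: the many easy negatives overwhelm the few informative positives
```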
We introduce a useful and rather simple class of BKP tau functions, which we shall call easy tau functions. We consider the large BKP hierarchy related to $O(2\infty+1)$, which was introduced in \cite{KvdLbispec}.