Exploring Shared Structures and Hierarchies for Multiple NLP Tasks

Added by Xipeng Qiu

Publication date 2018

Field: Informatics Engineering

Research language: English

Authors Junkun Chen - Kaiyu Chen - Xinchi Chen

Artificial Intelligence Computation and Language

Designing shared neural architectures plays an important role in multi-task learning. The challenge is that finding an optimal sharing scheme relies heavily on expert knowledge and does not scale to a large number of diverse tasks. Inspired by promising work on neural architecture search (NAS), we apply reinforcement learning to automatically find possible shared architectures for multi-task learning. Specifically, we use a controller to select from a set of shareable modules and assemble a task-specific architecture, and repeat the same procedure for the other tasks. The controller is trained with reinforcement learning to maximize the expected accuracies for all tasks. We conduct extensive experiments on two types of tasks, text classification and sequence labeling, which demonstrate the benefits of our approach.
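The controller idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-slot softmax policy, the moving-average baseline, the hyperparameters, and in particular `toy_reward` (a made-up stand-in for validation accuracy) are all assumptions, and no shared module weights are actually trained here. The sketch only shows how a REINFORCE-style update can push a controller toward the module choices that earn each task a higher reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(task, choices, n_modules):
    """Hypothetical stand-in for validation accuracy (not from the paper).

    Task `task` scores a hit in slot `s` when module (task + s) % n_modules
    is chosen; the reward is the fraction of slots that hit.
    """
    hits = sum(1 for s, c in enumerate(choices) if c == (task + s) % n_modules)
    return hits / len(choices)


def train_controller(n_tasks=2, n_slots=3, n_modules=4, steps=3000, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline over per-slot categorical policies."""
    rng = random.Random(seed)
    # logits[task][slot][module]: the controller's preference for each choice.
    logits = [[[0.0] * n_modules for _ in range(n_slots)] for _ in range(n_tasks)]
    baseline = [0.0] * n_tasks
    for _ in range(steps):
        for task in range(n_tasks):
            # Sample one module per slot to assemble a task-specific architecture.
            probs = [softmax(slot) for slot in logits[task]]
            choices = [rng.choices(range(n_modules), weights=p)[0] for p in probs]
            advantage = toy_reward(task, choices, n_modules) - baseline[task]
            baseline[task] += 0.1 * advantage  # moving average of past rewards
            for s, c in enumerate(choices):
                for m in range(n_modules):
                    # Gradient of log softmax probability of the sampled choice.
                    grad = (1.0 if m == c else 0.0) - probs[s][m]
                    logits[task][s][m] += lr * advantage * grad
    return logits


def greedy_architecture(logits, task):
    """Most probable module index for each slot of the given task."""
    return [max(range(len(slot)), key=slot.__getitem__) for slot in logits[task]]
```

Calling `greedy_architecture(train_controller(), task)` returns one module index per slot for that task. In a real system the reward would come from training and evaluating the assembled network, which is far more expensive and noisier than this toy function.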

Exploring and Predicting Transferability across NLP Tasks

Tu Vu, Tong Wang, Tsendsuren Munkhdalai (2020)

Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance? In this paper, we conduct an extensive study of the transferability between 33 NLP tasks across three broad classes of problems (text classification, question answering, and sequence labeling). Our results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task (e.g., part-of-speech tagging transfers well to the DROP QA dataset). We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size. Overall, our experiments reveal that factors such as source data size, task and domain similarity, and task complexity all play a role in determining transferability.
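As a rough illustration of how task embeddings could be used to pick a source task, the sketch below ranks candidate source tasks by cosine similarity to a target task's embedding. The use of cosine similarity, the function names, and the example vectors are assumptions for illustration only; the abstract does not specify how the embeddings are computed or compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_sources(target_vec, source_vecs):
    """Return source task names, most similar to the target first."""
    return sorted(
        source_vecs,
        key=lambda name: cosine(target_vec, source_vecs[name]),
        reverse=True,
    )


# Made-up embeddings; real ones would be derived from models or data.
example_sources = {
    "task_a": [0.9, 0.1],
    "task_b": [0.0, 1.0],
    "task_c": [-1.0, 0.0],
}
example_ranking = rank_sources([1.0, 0.0], example_sources)
```

The top-ranked name is then the predicted best source task for transfer under this similarity assumption.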

Computation and Language

Making Transformers Solve Compositional Tasks

Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek (2021)

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper, we explore the design space of Transformer models, showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature on a diverse set of compositional tasks, and that achieve state-of-the-art results on a semantic parsing compositional generalization benchmark (COGS) and a string edit operation composition benchmark (PCFG).

Artificial Intelligence Computation and Language

Deriving Commonsense Inference Tasks from Interactive Fictions

Mo Yu, Xiaoxiao Guo, Yufei Feng (2020)

Commonsense reasoning simulates the human ability to make presumptions about our physical world, and it is an indispensable cornerstone in building general AI systems. We propose a new commonsense reasoning dataset based on human players' interactive fiction gameplay, as human players demonstrate plentiful and diverse commonsense reasoning. The new dataset mitigates several limitations of the prior art. Experiments show that our task is solvable by human experts with sufficient commonsense knowledge but poses challenges to existing machine reading models, with a large performance gap of more than 30%.

Artificial Intelligence Computation and Language

Dice Loss for Data-imbalanced NLP Tasks

Xiaoya Li, Xiaofei Sun, Yuxian Meng (2019)

Many NLP tasks such as tagging and machine reading comprehension face a severe data imbalance issue: negative examples significantly outnumber positive examples, and the huge number of background examples (or easy-negative examples) overwhelms the training. The most commonly used cross-entropy (CE) criterion is actually an accuracy-oriented objective, and thus creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time the F1 score is more concerned with positive examples. In this paper, we propose to use dice loss in place of the standard cross-entropy objective for data-imbalanced NLP tasks. Dice loss is based on the Sørensen-Dice coefficient or Tversky index, which attaches similar importance to false positives and false negatives, and is more immune to the data-imbalance issue. To further alleviate the dominating influence of easy-negative examples in training, we propose to associate training examples with dynamically adjusted weights to deemphasize easy-negative examples. Theoretical analysis shows that this strategy narrows the gap between the F1 score in evaluation and the dice loss in training. With the proposed training objective, we observe a significant performance boost on a wide range of data-imbalanced NLP tasks. Notably, we are able to achieve SOTA results on CTB5, CTB6 and UD1.4 for the part-of-speech tagging task; SOTA results on CoNLL03, OntoNotes5.0, MSRA and OntoNotes4.0 for the named entity recognition task; along with competitive results on the tasks of machine reading comprehension and paraphrase identification.
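To make the contrast with cross-entropy concrete, here is a minimal sketch of a soft dice loss for binary predictions, built on the standard Sørensen-Dice coefficient. The smoothing constant and the optional `(1 - p) ** alpha` scaling are assumptions for illustration (the scaling is a focal-style factor that shrinks the contribution of confidently predicted positives); the abstract does not give the exact weighting the authors use.

```python
import math


def dice_loss(probs, labels, smooth=1.0, alpha=0.0):
    """Soft dice loss: 1 - (2 * overlap + smooth) / (sum(p) + sum(y) + smooth).

    probs:  predicted probabilities of the positive class, one per example.
    labels: gold labels, 0 or 1.
    alpha:  assumed focal-style exponent; 0.0 gives the plain soft dice loss.
    """
    weighted = [((1.0 - p) ** alpha) * p for p in probs]
    overlap = sum(w * y for w, y in zip(weighted, labels))
    denom = sum(weighted) + sum(labels)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy, where every example counts equally."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# One missed positive among 99 easy negatives, all predicted at p = 0.01.
imbalanced_probs = [0.01] * 100
imbalanced_labels = [1] + [0] * 99
imbalanced_dice = dice_loss(imbalanced_probs, imbalanced_labels)
imbalanced_ce = cross_entropy(imbalanced_probs, imbalanced_labels)
```

In this made-up batch the mean cross-entropy stays small because the 99 easy negatives dominate the average, while the dice loss stays large because the single positive is missed. This mirrors the discrepancy between an accuracy-oriented objective and F1 described above.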

Computation and Language

Pfaffian structures and certain solutions to BKP hierarchies II. Multiple integrals

A. Orlov, T. Shiota, K. Takasaki (2016)

We introduce a useful and rather simple class of BKP tau functions which we shall call easy tau functions. We consider the large BKP hierarchy related to $O(2\infty+1)$, which was introduced in \cite{KvdLbispec} (and which is closely related to the DKP $O(2\infty)$ hierarchy introduced in \cite{JM}). Easy tau functions of the small BKP hierarchy were already considered in \cite{HLO}; here we are more interested in the large BKP and also the mixed small-large BKP tau functions \cite{KvdLbispec}. The tau functions under consideration are equal to sums over partitions and to multi-integrals. In this way they may be applicable in models of random partitions and models of random matrices. Here, in part II, we consider multi-integrals and series of $N$-ply integrals in $N$. Relations to matrix models are explained. This paper may be viewed as a development of the paper by J. van de Leur \cite{L1} related to orthogonal and symplectic ensembles of random matrices.

Exactly Solvable and Integrable Systems
