Do you want to publish a course? Click here

Meta-Learning for Domain Generalization in Semantic Parsing

التعلم التعريف لتعميم المجال في التحليل الدلالي

176   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

The importance of building semantic parsers which can be applied to new domains and generate programs unseen at training has long been acknowledged, and datasets testing out-of-domain performance are becoming increasingly available. However, little or no attention has been devoted to learning algorithms or objectives which promote domain generalization, with virtually all existing approaches relying on standard supervised learning. In this work, we use a meta-learning framework which targets zero-shot domain generalization for semantic parsing. We apply a model-agnostic training algorithm that simulates zero-shot parsing by constructing virtual train and test sets from disjoint domains. The learning objective capitalizes on the intuition that gradient steps that improve source-domain performance should also improve target-domain performance, thus encouraging a parser to generalize to unseen target domains. Experimental results on the (English) Spider and Chinese Spider datasets show that the meta-learning objective significantly boosts the performance of a baseline parser.

References used
https://aclanthology.org/
rate research

Read More

Semantic parsing aims at translating natural language (NL) utterances onto machine-interpretable programs, which can be executed against a real-world environment. The expensive annotation of utterance-program pairs has long been acknowledged as a maj or bottleneck for the deployment of contemporary neural models to real-life applications. In this work, we focus on the task of semi-supervised learning where a limited amount of annotated data is available together with many unlabeled NL utterances. Based on the observation that programs which correspond to NL utterances should always be executable, we propose to encourage a parser to generate executable programs for unlabeled utterances. Due to the large search space of executable programs, conventional methods that use beam-search for approximation, such as self-training and top-k marginal likelihood training, do not perform as well. Instead, we propose a set of new training objectives that are derived by approaching the problem of learning from executions from the posterior regularization perspective. Our new objectives outperform conventional methods on Overnight and GeoQuery, bridging the gap between semi-supervised and supervised learning.
Humans are capable of learning novel concepts from very few examples; in contrast, state-of-the-art machine learning algorithms typically need thousands of examples to do so. In this paper, we propose an algorithm for learning novel concepts by repre senting them as programs over existing concepts. This way the concept learning problem is naturally a program synthesis problem and our algorithm learns from a few examples to synthesize a program representing the novel concept. In addition, we perform a theoretical analysis of our approach for the case where the program defining the novel concept over existing ones is context-free. We show that given a learned grammar-based parser and a novel production rule, we can augment the parser with the production rule in a way that provably generalizes. We evaluate our approach by learning concepts in the semantic parsing domain extended to the few-shot novel concept learning setting, showing that our approach significantly outperforms end-to-end neural semantic parsers.
Meta learning aims to optimize the model's capability to generalize to new tasks and domains. Lacking a data-efficient way to create meta training tasks has prevented the application of meta-learning to the real-world few shot learning scenarios. Rec ent studies have proposed unsupervised approaches to create meta-training tasks from unlabeled data for free, e.g., the SMLMT method (Bansal et al., 2020a) constructs unsupervised multi-class classification tasks from the unlabeled text by randomly masking words in the sentence and let the meta learner choose which word to fill in the blank. This study proposes a semi-supervised meta-learning approach that incorporates both the representation power of large pre-trained language models and the generalization capability of prototypical networks enhanced by SMLMT. The semi-supervised meta training approach avoids overfitting prototypical networks on a small number of labeled training examples and quickly learns cross-domain task-specific representation only from a few supporting examples. By incorporating SMLMT with prototypical networks, the meta learner generalizes better to unseen domains and gains higher accuracy on out-of-scope examples without the heavy lifting of pre-training. We observe significant improvement in few-shot generalization after training only a few epochs on the intent classification tasks evaluated in a multi-domain setting.
Meta-learning has recently been proposed to learn models and algorithms that can generalize from a handful of examples. However, applications to structured prediction and textual tasks pose challenges for meta-learning algorithms. In this paper, we a pply two meta-learning algorithms, Prototypical Networks and Reptile, to few-shot Named Entity Recognition (NER), including a method for incorporating language model pre-training and Conditional Random Fields (CRF). We propose a task generation scheme for converting classical NER datasets into the few-shot setting, for both training and evaluation. Using three public datasets, we show these meta-learning algorithms outperform a reasonable fine-tuned BERT baseline. In addition, we propose a novel combination of Prototypical Networks and Reptile.
Recent researches show that pre-trained models (PTMs) are beneficial to Chinese Word Segmentation (CWS). However, PTMs used in previous works usually adopt language modeling as pre-training tasks, lacking task-specific prior segmentation knowledge an d ignoring the discrepancy between pre-training tasks and downstream CWS tasks. In this paper, we propose a CWS-specific pre-trained model MetaSeg, which employs a unified architecture and incorporates meta learning algorithm into a multi-criteria pre-training task. Empirical results show that MetaSeg could utilize common prior segmentation knowledge from different existing criteria and alleviate the discrepancy between pre-trained models and downstream CWS tasks. Besides, MetaSeg can achieve new state-of-the-art performance on twelve widely-used CWS datasets and significantly improve model performance in low-resource settings.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا