
An empirical study of pretrained representations for few-shot classification

Published by Tiago Ramalho
Publication date: 2019
Language: English





Recent algorithms with state-of-the-art few-shot classification results begin by computing features of the data with a large pretrained model. In this paper we systematically investigate which models provide the best representations for a few-shot image classification task when pretrained on the ImageNet dataset. We test their representations as the starting point for different few-shot classification algorithms. We observe that models trained on a supervised classification task perform better than models trained in an unsupervised manner, even when transferred to out-of-distribution datasets. Models trained with adversarial robustness transfer better, while having slightly lower accuracy than supervised models.
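To make the setup concrete, here is a minimal sketch of one simple way to classify from precomputed features: a nearest-centroid rule over the support set. This is an illustration only, not necessarily one of the algorithms evaluated in the paper. The feature vectors are made-up 2-D numbers standing in for the output of a pretrained model, and the function names `fit_centroids` and `predict` are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(support_features, support_labels):
    """Compute one centroid (class mean) per label from the support set."""
    by_label = {}
    for feat, label in zip(support_features, support_labels):
        by_label.setdefault(label, []).append(feat)
    return {label: mean_vector(feats) for label, feats in by_label.items()}

def predict(centroids, query_feature):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(
        centroids,
        key=lambda label: math.dist(centroids[label], query_feature),
    )

# Toy 2-way 2-shot episode. These numbers are assumptions standing in for
# features that a pretrained model would produce for each image.
support_features = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
support_labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(support_features, support_labels)
print(predict(centroids, [0.1, 0.2]))
print(predict(centroids, [0.9, 0.8]))
```

In this framing the pretrained model is frozen and only the lightweight classifier on top depends on the few labeled examples, which is why the quality of the representation matters so much.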




Read also

Prompt-based knowledge probing for 1-hop relations has been used to measure how much world knowledge is stored in pretrained language models. Existing work uses considerable amounts of data to tune the prompts for better performance. In this work, we compare a variety of approaches under a few-shot knowledge probing setting, where only a small number (e.g., 10 or 20) of example triples are available. In addition, we create a new dataset named TREx-2p, which contains 2-hop relations. We report that few-shot examples can strongly boost the probing performance for both 1-hop and 2-hop relations. In particular, we find that a simple-yet-effective approach of finetuning the bias vectors in the model outperforms existing prompt-engineering methods. Our dataset and code are available at https://github.com/cloudygoose/fewshot_lama.
A meta-model is trained on a distribution of similar tasks such that it learns an algorithm that can quickly adapt to a novel task with only a handful of labeled examples. Most current meta-learning methods assume that the meta-training set consists of relevant tasks sampled from a single distribution. In practice, however, a new task is often outside the task distribution, which degrades performance. One way to tackle this problem is to construct an ensemble of meta-learners such that each meta-learner is trained on a different task distribution. In this paper we present a method for constructing a mixture of meta-learners (MxML), where mixing parameters are determined by the weight prediction network (WPN) optimized to improve the few-shot classification performance. Experiments on various datasets demonstrate that MxML significantly outperforms state-of-the-art meta-learners, or their naive ensemble, on out-of-distribution as well as in-distribution tasks.
Recent few-shot learning works focus on training a model with prior meta-knowledge to adapt quickly to new tasks with unseen classes and samples. However, conventional time-series classification algorithms fail to tackle the few-shot scenario. Existing few-shot learning methods are designed for image or text data, and most of them are neural-based models that lack interpretability. This paper proposes an interpretable neural-based framework, namely Dual Prototypical Shapelet Networks (DPSN), for few-shot time-series classification, which not only trains a neural network-based model but also interprets the model at two levels of granularity: 1) a global overview using representative time series samples, and 2) local highlights using discriminative shapelets. In particular, the generated dual prototypical shapelets consist of representative samples that largely capture the overall shapes of all samples in the class, and discriminative partial-length shapelets that can be used to distinguish different classes. We have derived 18 few-shot TSC datasets from public benchmark datasets and evaluated the proposed method against baselines. The DPSN framework outperforms state-of-the-art time-series classification methods, especially when training with limited amounts of data. Several case studies demonstrate the interpretability of our model.
Few-shot classification is a challenging task that aims to model the human ability to learn concepts from limited prior data, and it has drawn considerable attention in machine learning. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained to handle classification tasks over an extremely large or infinite number of episodes representing different classification tasks, each with a small labeled support set and its corresponding query set. In this work, we advance this few-shot classification paradigm by formulating it as a supervised classification learning problem. We further propose multi-episode and cross-way training techniques, which respectively correspond to the minibatch and pretraining in classification problems. Experimental results on a state-of-the-art few-shot classification method (prototypical networks) demonstrate that both proposed training strategies can greatly accelerate the training process without accuracy loss for varying few-shot classification problems on Omniglot and miniImageNet.
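The episode structure mentioned above (a small labeled support set plus a query set) can be sketched as follows. This is a generic N-way K-shot sampler written for illustration, not the training procedure proposed in that work. The function name `sample_episode`, its parameters, and the toy dataset are all assumptions.

```python
import random

def sample_episode(examples_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode.

    Picks n_way classes, then k_shot support examples and n_query query
    examples per class, with no overlap between support and query.
    """
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(examples_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Hypothetical dataset: 4 classes with 10 placeholder examples each.
data = {c: [f"{c}_{i}" for i in range(10)] for c in ["a", "b", "c", "d"]}

support, query = sample_episode(
    data, n_way=3, k_shot=2, n_query=4, rng=random.Random(0)
)
print(len(support), len(query))
```

Passing an explicit `random.Random` instance keeps the sampled episodes reproducible across runs.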
Model-agnostic meta-learning (MAML) is a popular method for few-shot learning but assumes that we have access to the meta-training set. In practice, training on the meta-training set may not always be an option due to data privacy concerns, intellectual property issues, or merely lack of computing resources. In this paper, we consider the novel problem of repurposing pretrained MAML checkpoints to solve new few-shot classification tasks. Because of the potential distribution mismatch, the original MAML steps may no longer be optimal. Therefore, we propose an alternative meta-testing procedure and combine MAML gradient steps with adversarial training and uncertainty-based stepsize adaptation. Our method outperforms vanilla MAML on same-domain and cross-domain benchmarks using both SGD and Adam optimizers and shows improved robustness to the choice of base stepsize.
