How to Train Your MAML to Excel in Few-Shot Classification

Published by: Wei-Lun Chao

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Han-Jia Ye, Wei-Lun Chao

Topics: Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

Abstract

Model-agnostic meta-learning (MAML) is arguably the most popular meta-learning algorithm nowadays, given its flexibility to incorporate various model architectures and to be applied to different problems. Nevertheless, its performance on few-shot classification is far behind many recent algorithms dedicated to the problem. In this paper, we point out several key facets of how to train MAML to excel in few-shot classification. First, we find that a large number of gradient steps are needed for the inner loop update, which contradicts the common usage of MAML for few-shot classification. Second, we find that MAML is sensitive to the permutation of class assignments in meta-testing: for a few-shot task of $N$ classes, there are exponentially many ways to assign the learned initialization of the $N$-way classifier to the $N$ classes, leading to an unavoidably huge variance. Third, we investigate several ways for permutation invariance and find that learning a shared classifier initialization for all the classes performs the best. On benchmark datasets such as MiniImageNet and TieredImageNet, our approach, which we name UNICORN-MAML, performs on a par with or even outperforms state-of-the-art algorithms, while keeping the simplicity of MAML without adding any extra sub-networks.
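The permutation sensitivity and the shared classifier initialization mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes frozen features, a linear N-way head, plain gradient descent on cross-entropy in the inner loop, and synthetic toy data. All function names, the step count (15), and the learning rate are hypothetical choices made only for illustration.

```python
import itertools
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inner_loop(W0, X, y, steps=15, lr=0.5):
    """Adapt a linear head W (N x d) on a support set with several gradient steps."""
    W = W0.copy()
    Y = np.eye(W.shape[0])[y]              # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W.T)
        W -= lr * (P - Y).T @ X / len(X)   # cross-entropy gradient w.r.t. W
    return W

def accuracy(W, X, y):
    return float(np.mean(np.argmax(X @ W.T, axis=1) == y))

def run_task(W0, perm, Xs, ys, Xq, yq):
    """Relabel the classes with `perm`, adapt on the support set, score on the query set."""
    W = inner_loop(W0, Xs, perm[ys])
    return accuracy(W, Xq, perm[yq])

# Toy 5-way task with Gaussian class clusters (assumed data, not a benchmark).
rng = np.random.default_rng(0)
N, d = 5, 8
means = rng.normal(size=(N, d))

def sample(per_class):
    y = np.repeat(np.arange(N), per_class)
    X = means[y] + 0.8 * rng.normal(size=(len(y), d))
    return X, y

Xs, ys = sample(5)    # support set: 5 shots per class
Xq, yq = sample(15)   # query set: 15 examples per class

per_class_init = rng.normal(size=(N, d))              # one initialization per output slot
shared_init = np.tile(rng.normal(size=d), (N, 1))     # the same vector copied to every slot

perms = [np.array(p) for p in itertools.permutations(range(N))]
per_class_accs = [run_task(per_class_init, p, Xs, ys, Xq, yq) for p in perms]
shared_accs = [run_task(shared_init, p, Xs, ys, Xq, yq) for p in perms]

print("permutations tried:", len(perms))
print("per-class init accuracy range:", min(per_class_accs), max(per_class_accs))
print("shared init accuracy range:   ", min(shared_accs), max(shared_accs))
```

With a separate initialization per output slot, the result can depend on which class lands on which slot. With a shared vector, every assignment starts from the same point, so the adapted head is the same up to a relabeling of its rows and the query accuracy does not change across permutations.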

Read also

How to Train Your Differentiable Filter

Alina Kloss, Georg Martius, Jeannette Bohg - 2020

In many robotic applications, it is crucial to maintain a belief about the state of a system, which serves as input for planning and decision making and provides feedback during task execution. Bayesian Filtering algorithms address this state estimation problem, but they require models of process dynamics and sensory observations and the respective noise characteristics of these models. Recently, multiple works have demonstrated that these models can be learned by end-to-end training through differentiab

Topics: Machine Learning, Robotics

How to Train Your Energy-Based Models

Yang Song, Diederik P. Kingma - 2021

Energy-Based Models (EBMs), also known as non-normalized probabilistic models, specify probability density or mass functions up to an unknown normalizing constant. Unlike most other probabilistic models, EBMs do not place a restriction on the tractability of the normalizing constant, and thus are more flexible to parameterize and can model a more expressive family of probability distributions. However, the unknown normalizing constant of EBMs makes training particularly difficult. Our goal is to provide a friendly introduction to modern approaches for EBM training. We start by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceed to elaborate on MCMC-free approaches, including Score Matching (SM) and Noise Contrastive Estimation (NCE). We highlight theoretical connections among these three approaches, and end with a brief survey on alternative training methods, which are still under active research. Our tutorial is targeted at an audience with basic understanding of generative models who want to apply EBMs or start a research project in this direction.

Topics: Machine Learning

How to Train Your Agent to Read and Write

Li Liu, Mengge He, Guanghui Xu - 2021

Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (e.g., students) to fully grasp this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit the potential knowledge clues to write novel papers. Although there have been existing works focusing on summarizing (i.e., reading) the knowledge in a given text or generating (i.e., writing) a text based on the given knowledge, the ability of simultaneously reading and writing is still under development. Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a Reader that can extract knowledge graphs (KGs) from input paragraphs and discover potential knowledge, a graph-to-text Writer that generates a novel paragraph, and a Reviewer that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms considered baselines and several state-of-the-art methods on AGENDA and M-AGENDA datasets. Our code and supplementary material are released at https://github.com/menggehe/DRAW.

Topics: Computation and Language, Artificial Intelligence

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai - 2021

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer's weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation ("AugReg" for short) when training on smaller training datasets. We conduct a systematic empirical study in order to better understand the interplay between the amount of training data, AugReg, model size and compute budget. As one result of this study, we find that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data: we train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available, JFT-300M dataset.

Topics: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

How to Train your DNN: The Network Operator Edition

Michael Alan Chang, Domenic Bottini, Lisa Jian - 2020

Deep Neural Nets have hit quite a crest, But physical networks are where they must rest, And here we put them all to the test, To see which network optimization is best.

Topics: Networking and Internet Architecture, Distributed, Parallel, and Cluster Computing, Machine Learning