ترغب بنشر مسار تعليمي؟ اضغط هنا

How Does the Task Landscape Affect MAML Performance?

121   0   0.0 ( 0 )
 نشر من قبل Liam Collins
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard Empirical Risk Minimization (ERM), and little is understood about how much MAML improves over ERM in terms of the fast adaptability of their solutions in various scenarios. We analytically address this issue in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is related to the condition number of the tasks loss function. Specifically, we prove that in order for MAML to achieve substantial gain over ERM, (i) there must be some discrepancy in hardness among the tasks, and (ii) the optimal solutions of the hard tasks must be closely packed with the center far from the center of the easy tasks optimal solutions. We also give numerical and analytical results suggesting that these insights also apply to two-layer neural networks. Finally, we provide few-shot image classification experiments that support our insights for when MAML should be used and emphasize the importance of training MAML on hard tasks in practice.

قيم البحث

اقرأ أيضاً

138 - Li Zhong , Zhen Fang , Feng Liu 2020
Unsupervised domain adaptation (UDA) aims to train a target classifier with labeled samples from the source domain and unlabeled samples from the target domain. Classical UDA learning bounds show that target risk is upper bounded by three terms: sour ce risk, distribution discrepancy, and combined risk. Based on the assumption that the combined risk is a small fixed value, methods based on this bound train a target classifier by only minimizing estimators of the source risk and the distribution discrepancy. However, the combined risk may increase when minimizing both estimators, which makes the target risk uncontrollable. Hence the target classifier cannot achieve ideal performance if we fail to control the combined risk. To control the combined risk, the key challenge takes root in the unavailability of the labeled samples in the target domain. To address this key challenge, we propose a method named E-MixNet. E-MixNet employs enhanced mixup, a generic vicinal distribution, on the labeled source samples and pseudo-labeled target samples to calculate a proxy of the combined risk. Experiments show that the proxy can effectively curb the increase of the combined risk when minimizing the source risk and distribution discrepancy. Furthermore, we show that if the proxy of the combined risk is added into loss functions of four representative UDA methods, their performance is also improved.
291 - Da Yu , Huishuai Zhang , Wei Chen 2020
It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks to utilize the information of augmented data. MI att ack is widely used to measure the models information leakage of the training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i.e., classifying a set of augmented instances instead of a single data point, and design input permutation invariant features. Empirically, we demonstrate that the proposed approach universally outperforms original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve a 70.1% MI attack success rate on CIFAR10 against a wide residual network while the previous best approach only attains 61.9%. This suggests the privacy risk of models trained with data augmentation could be largely underestimated.
This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between weight vectors of neurons, and for convolutional layers, the weight correlation is defined as the cosine similarity between filter matrices. Theoretically, we show that, weight correlation can, and should, be incorporated into the PAC Bayesian framework for the generalisation of neural networks, and the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC Bayes measure with weight correlation, and experimentally confirm that it is able to rank the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training, and provide extensive experiments that show that the generalisation error can be greatly reduced with our novel approach.
334 - Ruoyu Sun , Dawei Li , Shiyu Liang 2020
One of the major concerns for neural network training is that the non-convexity of the associated loss functions may cause bad landscape. The recent success of neural networks suggests that their loss landscape is not too bad, but what specific resul ts do we know about the landscape? In this article, we review recent findings and results on the global landscape of neural networks. First, we point out that wide neural nets may have sub-optimal local minima under certain assumptions. Second, we discuss a few rigorous results on the geometric properties of wide networks such as no bad basin, and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity. Third, we discuss visualization and empirical explorations of the landscape for practical neural nets. Finally, we briefly discuss some convergence results and their relation to landscape results.
There is a longstanding discrepancy between the observed Galactic classical nova rate of $sim 10$ yr$^{-1}$ and the predicted rate from Galactic models of $sim 30$--50 yr$^{-1}$. One explanation for this discrepancy is that many novae are hidden by i nterstellar extinction, but the degree to which dust can obscure novae is poorly constrained. We use newly available all-sky three-dimensional dust maps to compare the brightness and spatial distribution of known novae to that predicted from relatively simple models in which novae trace Galactic stellar mass. We find that only half ($sim 48$%) of novae are expected to be easily detectable ($g lesssim 15$) with current all-sky optical surveys such as the All-Sky Automated Survey for Supernovae (ASAS-SN). This fraction is much lower than previously estimated, showing that dust does substantially affect nova detection in the optical. By comparing complementary survey results from ASAS-SN, OGLE-IV, and the Palomar Gattini IR-survey in the context of our modeling, we find a tentative Galactic nova rate of $sim 40$ yr$^{-1}$, though this could decrease to as low as $sim 30$ yr$^{-1}$ depending on the assumed distribution of novae within the Galaxy. These preliminary estimates will be improved in future work through more sophisticated modeling of nova detection in ASAS-SN and other surveys.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا