ﻻ يوجد ملخص باللغة العربية
Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard Empirical Risk Minimization (ERM), and little is understood about how much MAML improves over ERM in terms of the fast adaptability of their solutions in various scenarios. We analytically address this issue in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is related to the condition number of the tasks loss function. Specifically, we prove that in order for MAML to achieve substantial gain over ERM, (i) there must be some discrepancy in hardness among the tasks, and (ii) the optimal solutions of the hard tasks must be closely packed with the center far from the center of the easy tasks optimal solutions. We also give numerical and analytical results suggesting that these insights also apply to two-layer neural networks. Finally, we provide few-shot image classification experiments that support our insights for when MAML should be used and emphasize the importance of training MAML on hard tasks in practice.
Unsupervised domain adaptation (UDA) aims to train a target classifier with labeled samples from the source domain and unlabeled samples from the target domain. Classical UDA learning bounds show that target risk is upper bounded by three terms: sour
It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks to utilize the information of augmented data. MI att
This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between
One of the major concerns for neural network training is that the non-convexity of the associated loss functions may cause bad landscape. The recent success of neural networks suggests that their loss landscape is not too bad, but what specific resul
There is a longstanding discrepancy between the observed Galactic classical nova rate of $sim 10$ yr$^{-1}$ and the predicted rate from Galactic models of $sim 30$--50 yr$^{-1}$. One explanation for this discrepancy is that many novae are hidden by i