By transferring knowledge learned from seen/previous tasks, meta-learning aims to generalize well to unseen/future tasks. Existing meta-learning approaches have shown promising empirical performance on various multiclass classification problems, but few provide a theoretical analysis of the classifier's generalization ability on future tasks. In this paper, under the assumption that all classification tasks are sampled from the same meta-distribution, we leverage margin theory and statistical learning theory to establish three margin-based transfer bounds for meta-learning-based multiclass classification (MLMC). These bounds reveal that the expected error of a given classification algorithm on a future task can be estimated from the average empirical error on a finite number of previous tasks, uniformly over a class of preprocessing feature maps/deep neural networks (i.e., deep feature embeddings). To validate these bounds, a multi-margin loss is employed in place of the commonly used cross-entropy loss to train a number of representative MLMC models. Experiments on three benchmarks show that these margin-based models still achieve competitive performance, validating the practical value of our margin-based theoretical analysis.
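To make the multi-margin loss mentioned above concrete, the following is a minimal sketch of the common multiclass hinge formulation (the one used by, for example, PyTorch's `MultiMarginLoss` with `p=1`). Whether the paper uses exactly this variant is an assumption; the function names and the default `margin=1.0` are illustrative and do not come from the abstract.

```python
def multi_margin_loss(scores, label, margin=1.0):
    """Multiclass hinge loss for a single example.

    scores: list of per-class scores produced by a classifier.
    label:  index of the true class.
    Assumed form: sum over wrong classes i of
    max(0, margin - scores[label] + scores[i]), divided by the class count.
    """
    true_score = scores[label]
    total = 0.0
    for i, score in enumerate(scores):
        if i == label:
            continue  # the true class contributes no penalty
        total += max(0.0, margin - true_score + score)
    return total / len(scores)


def mean_multi_margin_loss(batch_scores, labels, margin=1.0):
    """Average the per-example loss over a batch (hypothetical helper)."""
    losses = [multi_margin_loss(s, y, margin) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)
```

With scores `[2.0, 0.5, 1.5]` and true class 0, only class 2 violates the margin (by 0.5), so the loss is 0.5 / 3. Unlike cross-entropy, the loss is exactly zero once every wrong class is separated from the true class by at least the margin.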
Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $\alpha \geq 1$, that are nearly-matching for a range of parameters. Specifically, for the natural setting that $\alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that {\em any} constant factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
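As an illustration of the quantity being approximated, the sketch below computes an empirical $\gamma$-margin error for a fixed halfspace: the fraction of labeled points that are misclassified or classified correctly with margin below $\gamma$. The boundary convention (strict inequality) and the function name are assumptions, not taken from the abstract; $\mathrm{OPT}_{\gamma}$ would be the minimum of the population version of this quantity over all unit-norm halfspaces.

```python
import math


def margin_error(w, points, labels, gamma):
    """Fraction of examples whose signed margin y * <w, x> falls below gamma.

    w:      weight vector of the halfspace (normalized to unit length here).
    points: list of feature vectors, assumed to lie in the unit ball.
    labels: list of +1 / -1 labels.
    gamma:  margin parameter; gamma = 0 gives the ordinary 0-1 error.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    unit_w = [wi / norm for wi in w]
    violations = 0
    for x, y in zip(points, labels):
        signed_margin = y * sum(wi * xi for wi, xi in zip(unit_w, x))
        if signed_margin < gamma:  # assumed convention: strict inequality
            violations += 1
    return violations / len(points)
```

For example, with `w = [1, 0]`, a point `[0.05, 0.5]` labeled +1 is classified correctly but counts as an error at `gamma = 0.1`, because its margin is only 0.05.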
By leveraging experience from previous tasks, meta-learning algorithms can adapt quickly and effectively when encountering new tasks. However, it is unclear how well this generalization carries over to new tasks. Probably approximately correct (PAC) Bayes bound theory provides a theoretical framework for analyzing the generalization performance of meta-learning. We derive three novel generalization error bounds for meta-learning based on the PAC-Bayes relative entropy bound. Furthermore, using the empirical risk minimization (ERM) method, a PAC-Bayes bound for meta-learning with a data-dependent prior is developed. Experiments illustrate that the three proposed PAC-Bayes bounds for meta-learning provide competitive generalization guarantees, and that the extended PAC-Bayes bound with a data-dependent prior converges rapidly.
Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning is still a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.
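One plausible reading of "transferring fractions of knowledge" is to start from a fresh random initialization and add a scaled copy of the source parameters. The sketch below illustrates that reading only; the additive rule, the uniform initializer, and the names `fraction`, `init_scale`, and `seed` are assumptions, not details given in the abstract.

```python
import random


def fractional_transfer(source_weights, fraction, init_scale=0.1, seed=0):
    """Initialize target weights as random_init + fraction * source_weights.

    fraction = 0.0 reduces to plain random initialization;
    fraction = 1.0 adds the full source weights to the random initialization.
    The additive rule is an assumed interpretation, not the paper's stated method.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    return [
        rng.uniform(-init_scale, init_scale) + fraction * w
        for w in source_weights
    ]
```

Under this reading, the all-or-nothing choice described above corresponds to the two extremes of `fraction`, and intermediate values retain part of the source signal while leaving the target free to diverge during training.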
We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor. We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity or the empirical $\ell_\infty$ covering number of the hypothesis set used. Furthermore, using our relative deviation margin bounds, we derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment. We also briefly highlight several applications of these bounds and discuss their connection with existing results.