
A Generalization Theory based on Independent and Task-Identically Distributed Assumption

 Added by Guanhua Zheng
Publication date: 2019
Research language: English





Existing generalization theories analyze generalization performance mainly in terms of model complexity and the training process. Because they rest on the widely used IID assumption, these theories ignore task properties and therefore fail to explain many generalization phenomena or to guide practical learning tasks. In this paper, we propose a new Independent and Task-Identically Distributed (ITID) assumption that incorporates task properties into the data-generating process. The generalization bound derived under the ITID assumption identifies hypothesis invariance as significant for guaranteeing generalization performance. Based on the new bound, we introduce a practical invariance enhancement algorithm that works by modifying data distributions. Finally, we verify the algorithm and theorems on image classification tasks using both toy and real-world datasets. The experimental results demonstrate the reasonableness of the ITID assumption and the effectiveness of the new generalization theory in improving practical generalization performance.


Read More

Meta-learning has proven to be a powerful paradigm for transferring knowledge from previous tasks to facilitate the learning of a novel task. Current dominant algorithms train a well-generalized model initialization, which is adapted to each task via the support set. The crux lies in optimizing the generalization capability of the initialization, which is measured by the performance of the adapted model on the query set of each task. Unfortunately, empirical results show that this generalization measure pushes the initialization to overfit the meta-training tasks, which significantly impairs generalization and adaptation to novel tasks. To address this issue, we actively augment a meta-training task with more data when evaluating the generalization. Concretely, we propose two task augmentation methods, MetaMix and Channel Shuffle. MetaMix linearly combines features and labels of samples from both the support and query sets. For each class of samples, Channel Shuffle randomly replaces a subset of their channels with the corresponding ones from a different class. Theoretical studies show how task augmentation improves the generalization of meta-learning. Moreover, both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets and are compatible with existing meta-learning algorithms.
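The two augmentations described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the mixing coefficient is chosen or at which layer features are mixed, so the function names mix_pair and channel_shuffle, the fixed coefficient lam, and the use of plain Python lists as feature vectors are all assumptions made for illustration.

```python
def mix_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly combine two (features, label) pairs with weight lam in [0, 1].

    In the setting of the abstract, one pair could come from the support set
    and the other from the query set (assumed usage, not the paper's code).
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def channel_shuffle(sample, donor, channels):
    """Replace the listed channel indices of `sample` with those of `donor`.

    `donor` stands for a sample from a different class; `channels` is the
    subset of channel indices to swap (how it is drawn is an assumption).
    """
    return [donor[c] if c in channels else sample[c] for c in range(len(sample))]


if __name__ == "__main__":
    # One-hot labels for two classes; features are two-dimensional toy vectors.
    x, y = mix_pair([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0], lam=0.25)
    print(x, y)
    print(channel_shuffle(["a0", "a1", "a2"], ["b0", "b1", "b2"], {1}))
```

In practice the coefficient would typically be drawn at random per pair (for example from a Beta distribution, as in mixup-style methods), which is again an assumption rather than something the abstract states.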
In this paper, the joint limit distributions of maxima and minima of independent and non-identically distributed bivariate Gaussian triangular arrays are derived when the correlation coefficient of the $i$th vector of the $n$th row is a function of $i/n$. Furthermore, second-order expansions of the joint distributions of maxima and minima are established if the correlation function satisfies some regularity conditions.
We aim to estimate the probability that the sum of nonnegative independent and identically distributed random variables falls below a given threshold, i.e., $\mathbb{P}(\sum_{i=1}^{N} X_i \leq \gamma)$, via importance sampling (IS). We are particularly interested in the rare event regime when $N$ is large and/or $\gamma$ is small. The exponential twisting is a popular technique that, in most of the cases, compares favorably to existing estimators. However, it has several limitations: i) it assumes the knowledge of the moment generating function of $X_i$ and ii) sampling under the new measure is not straightforward and might be expensive. The aim of this work is to propose an alternative change of measure that yields, in the rare event regime corresponding to large $N$ and/or small $\gamma$, at least the same performance as the exponential twisting technique and, at the same time, does not introduce serious limitations. For distributions whose probability density functions (PDFs) are $\mathcal{O}(x^{d})$, as $x \rightarrow 0$ and $d>-1$, we prove that the Gamma IS PDF with appropriately chosen parameters retrieves asymptotically, in the rare event regime, the same performance of the estimator based on the use of the exponential twisting technique. Moreover, in the Log-normal setting, where the PDF at zero vanishes faster than any polynomial, we numerically show that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting change of measure. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large $N$ and/or small $\gamma$.
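A minimal sketch of importance sampling with a Gamma proposal may make the idea concrete. It assumes a toy setting that the abstract does not specify: the X_i are Exp(1) variables (so the exact answer is available from the Erlang distribution for comparison), and the proposal is a Gamma density with shape 1 and scale gamma/N, chosen here only so that the proposal's mean sum equals the threshold. The function names and these parameter choices are illustrative assumptions, not the paper's optimized parameters.

```python
import math
import random


def exact_prob(n, gamma, terms=60):
    """Exact P(X_1 + ... + X_n <= gamma) for i.i.d. Exp(1) variables.

    Uses the Erlang/Poisson identity P(S <= gamma) = sum_{k >= n} e^{-gamma} gamma^k / k!,
    truncated after `terms` terms (ample for the small gamma used here).
    """
    return sum(math.exp(-gamma) * gamma**k / math.factorial(k)
               for k in range(n, n + terms))


def is_estimate(n, gamma, num_samples, rng):
    """Importance sampling estimate of P(S <= gamma) with a Gamma(1, theta) proposal."""
    theta = gamma / n  # proposal scale: an assumed, illustrative choice
    total = 0.0
    for _ in range(num_samples):
        # Draw each X_i from the proposal; gammavariate takes (shape, scale).
        s = sum(rng.gammavariate(1.0, theta) for _ in range(n))
        if s <= gamma:
            # Likelihood ratio prod_i f(x_i) / g(x_i), with f(x) = e^{-x}
            # and g(x) = (1 / theta) e^{-x / theta}.
            log_w = n * math.log(theta) - (1.0 - 1.0 / theta) * s
            total += math.exp(log_w)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", exact_prob(10, 1.0))
    print("IS   :", is_estimate(10, 1.0, 20000, rng))
```

With N = 10 and threshold 1, the target probability is on the order of 1e-7, so naive Monte Carlo with the same sample budget would almost never observe the event, whereas under the proposal roughly half of the draws fall below the threshold.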
Generalization is a central problem in Machine Learning. Most prediction methods require careful calibration of hyperparameters carried out on a hold-out validation dataset to achieve generalization. The main goal of this paper is to present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization. We illustrate the pertinence of this new framework in the regression problem. The main advantages of this new approach are: (i) it can simultaneously train the model and perform regularization in a single run of a gradient-based optimizer on all available data without any previous hyperparameter tuning; (ii) this framework can tackle several additional objectives simultaneously (correlation, sparsity, ...) via the introduction of regularization parameters. Noticeably, our approach transforms hyperparameter tuning as well as feature selection (a combinatorial discrete optimization problem) into a continuous optimization problem that is solvable via classical gradient-based methods; (iii) the computational complexity of our methods is $O(npK)$ where $n,p,K$ denote respectively the number of observations, features and iterations of the gradient descent algorithm. We observe in our experiments a significantly smaller runtime for our methods as compared to benchmark methods for equivalent prediction score. Our procedures are implemented in PyTorch (code is available for replication).
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods to alleviate catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker-factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurate approximation of the Hessian matrix. The experimental results on several benchmarks provide empirical validation of our theoretical findings.
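The second-order Taylor viewpoint can be sketched on a toy problem. Around a minimizer theta* of an old task's loss, the gradient vanishes, so the loss is approximated by L(theta*) + 0.5 (theta - theta*)^T H (theta - theta*), and the quadratic term acts as the regularizer when training on a new task. The loss old_task_loss, its minimizer, and the Hessian values below are hypothetical and chosen so that the expansion is exact; the diagonal-only variant is included only to illustrate why the accuracy of the Hessian approximation matters.

```python
def quadratic_penalty(theta, theta_star, hessian):
    """Return 0.5 * (theta - theta*)^T H (theta - theta*), with H given as a list of rows."""
    d = [t - s for t, s in zip(theta, theta_star)]
    hd = [sum(h_ij * d_j for h_ij, d_j in zip(row, d)) for row in hessian]
    return 0.5 * sum(d_i * hd_i for d_i, hd_i in zip(d, hd))


def old_task_loss(theta):
    """Hypothetical old-task loss: a quadratic with minimum value 0 at (1, -2)."""
    x, y = theta
    return 2.0 * (x - 1.0) ** 2 + 0.5 * (y + 2.0) ** 2 + (x - 1.0) * (y + 2.0)


THETA_STAR = [1.0, -2.0]
FULL_HESSIAN = [[4.0, 1.0], [1.0, 1.0]]   # exact Hessian of old_task_loss
DIAG_HESSIAN = [[4.0, 0.0], [0.0, 1.0]]   # keeps only the diagonal curvature

if __name__ == "__main__":
    theta = [0.5, 0.0]
    print("true old-task loss :", old_task_loss(theta))                              # 1.5
    print("full-Hessian proxy :", quadratic_penalty(theta, THETA_STAR, FULL_HESSIAN))  # 1.5
    print("diagonal proxy     :", quadratic_penalty(theta, THETA_STAR, DIAG_HESSIAN))  # 2.5
```

Because the toy loss is itself quadratic, the full-Hessian penalty reproduces it exactly, while dropping the off-diagonal term misestimates how much the old task degrades at the new parameters.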
