The vicinal risk minimization (VRM) principle, first proposed by \citet{vapnik1999nature}, is a variant of empirical risk minimization (ERM) that replaces Dirac masses with vicinal functions. Although there is strong numerical evidence that VRM outperforms ERM when appropriate vicinal functions are chosen, a comprehensive theoretical understanding of VRM is still lacking. In this paper, we study generalization bounds for VRM. Our results support Vapnik's original arguments and additionally provide deeper insights into VRM. First, we prove that the complexity of function classes convolved with vicinal functions can be controlled by that of the original function classes, under the assumption that the function class is composed of Lipschitz-continuous functions. Then, the resulting generalization bounds for VRM suggest that the generalization performance of VRM is also affected by the choice of vicinal function and the quality of the function classes. These findings can be used to examine whether the choice of vicinal function is appropriate for a given VRM-based learning setting. Finally, we provide a theoretical explanation for existing VRM models, e.g., uniform distribution-based models, Gaussian distribution-based models, and mixup models.
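To make the contrast between ERM and VRM concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Gaussian vicinity with a hypothetical bandwidth `sigma`, estimates the vicinal risk by Monte Carlo sampling, and draws mixup examples with a mixing weight from Beta(`alpha`, `alpha`). All function and parameter names are illustrative assumptions.

```python
import random


def empirical_risk(loss, f, data):
    """ERM: average the loss at the training points (Dirac masses)."""
    return sum(loss(f(x), y) for x, y in data) / len(data)


def gaussian_vicinal_sample(x, y, sigma, rng):
    """Draw one virtual example from a Gaussian vicinity centred at x.

    Assumption: isotropic noise on the input, label left unchanged.
    """
    return [xi + rng.gauss(0.0, sigma) for xi in x], y


def vicinal_risk(loss, f, data, sigma, n_draws, rng):
    """VRM: Monte Carlo estimate of the loss averaged over each vicinity."""
    total = 0.0
    for x, y in data:
        for _ in range(n_draws):
            xv, yv = gaussian_vicinal_sample(x, y, sigma, rng)
            total += loss(f(xv), yv)
    return total / (len(data) * n_draws)


def mixup_sample(x1, y1, x2, y2, alpha, rng):
    """Draw one mixup example: a convex combination of two training pairs."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

In this sketch, setting `sigma` to zero collapses every vicinity back to a Dirac mass, so `vicinal_risk` coincides with `empirical_risk`; a positive `sigma` smooths the objective over a neighbourhood of each training point.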
The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and find
We analyze the practices of reservoir computing in the framework of statistical learning theory. In particular, we derive finite sample upper bounds for the generalization error committed by specific families of reservoir computing systems when proce
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a
Inferring causal individual treatment effect (ITE) from observational data is a challenging problem whose difficulty is exacerbated by the presence of treatment assignment bias. In this work, we propose a new way to estimate the ITE using the domain
We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization er