
Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote

Published by Yi-Shan Wu
Publication date: 2021
Research language: English





We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev-Cantelli inequality (a.k.a. one-sided Chebyshev's inequality), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second-order Markov's inequality introduced by Masegosa et al. [2020]. We also derive the PAC-Bayes-Bennett inequality, which we use for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality by Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work by Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains.
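For intuition, here is a minimal LaTeX sketch of the classical Cantelli (one-sided Chebyshev) inequality and of a parametric relaxation of the kind described above; the majority-vote specialization at the end is an illustrative paraphrase of the oracle bound, not the paper's verbatim statement.

% Cantelli's inequality: for a random variable Z with finite variance and any t > 0,
\[
\Pr\left[ Z \ge \mathbb{E}[Z] + t \right] \;\le\; \frac{\operatorname{Var}[Z]}{\operatorname{Var}[Z] + t^2} .
\]
% A parametric family, obtained by applying Markov's inequality to (Z - \mu)^2:
% for any \mu < \varepsilon,
\[
\Pr\left[ Z \ge \varepsilon \right] \;\le\; \frac{\mathbb{E}\left[ (Z - \mu)^2 \right]}{(\varepsilon - \mu)^2} .
\]
% Minimizing over \mu recovers Cantelli's bound, while for a fixed \mu the right-hand
% side is linear in the first and second moments of Z and is therefore convenient to
% minimize over the posterior \rho. Taking Z = \mathbb{E}_{h \sim \rho}[\mathbb{1}(h(X) \ne Y)]
% and \varepsilon = 1/2 gives a second-order oracle bound for the weighted majority vote:
\[
L(\mathrm{MV}_\rho) \;\le\; \frac{\mathbb{E}_{\rho^2}[L(h, h')] \;-\; 2\mu\, \mathbb{E}_{\rho}[L(h)] \;+\; \mu^2}{\left( \tfrac{1}{2} - \mu \right)^2},
\qquad \mu < \tfrac{1}{2},
\]
% where L(h, h') is the tandem loss, i.e., the probability that h and h' err simultaneously.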




Read also

We present a novel analysis of the expected risk of the weighted majority vote in multiclass classification. The analysis takes the correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which makes it possible to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first-order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.
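A rough LaTeX sketch of the two oracle bounds being contrasted here, as we recall them (exact conditions and constants are in the paper): writing $Z = \mathbb{E}_{h \sim \rho}[\mathbb{1}(h(X) \ne Y)]$, the $\rho$-weighted majority vote can only err when $Z \ge 1/2$, so first- and second-order Markov arguments give the following.

% First-order oracle bound (Markov's inequality applied to Z):
\[
L(\mathrm{MV}_\rho) \;\le\; \Pr\left[ Z \ge \tfrac{1}{2} \right] \;\le\; 2\, \mathbb{E}_{\rho}[L(h)] .
\]
% Second-order oracle bound (Markov's inequality applied to Z^2), where the second
% moment of Z equals the expected tandem loss:
\[
L(\mathrm{MV}_\rho) \;\le\; \Pr\left[ Z^2 \ge \tfrac{1}{4} \right] \;\le\; 4\, \mathbb{E}_{\rho^2}[L(h, h')] .
\]
% The second-order bound accounts for correlations between ensemble members through
% the tandem loss L(h, h') and can be much tighter when their errors are weakly correlated.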
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
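To make the complexity term concrete, here is a standard (Maurer-style) PAC-Bayes-kl statement and the usual relaxation that produces a $\sqrt{L_n \cdot \mathrm{KL}/n}$ term; log factors and constants differ between references, so this is a generic sketch rather than the specific bound discussed in this abstract.

% PAC-Bayes-kl: with probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q,
\[
\operatorname{kl}\!\left( \hat{L}_n(Q) \,\middle\|\, L(Q) \right) \;\le\; \frac{\mathrm{KL}(Q \| P) + \ln \frac{2\sqrt{n}}{\delta}}{n},
\]
% where kl(p || q) is the binary relative entropy, \hat{L}_n(Q) the empirical and L(Q)
% the expected error of the randomized predictor. Relaxing via kl(p || q) >= (q - p)^2 / (2q)
% for p <= q and solving for L(Q) yields
\[
L(Q) \;\le\; \hat{L}_n(Q)
+ \sqrt{\frac{2\, \hat{L}_n(Q) \left( \mathrm{KL}(Q \| P) + \ln \frac{2\sqrt{n}}{\delta} \right)}{n}}
+ \frac{2 \left( \mathrm{KL}(Q \| P) + \ln \frac{2\sqrt{n}}{\delta} \right)}{n},
\]
% whose middle term is the \sqrt{L_n \cdot KL / n} complexity term that dominates
% unless the empirical error \hat{L}_n(Q) is small.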
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. In a series of numerical experiments, the resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds when compared to competing algorithms that also minimize PAC-Bayes objectives, both with uninformed (data-independent) and informed (data-dependent) priors.
The dominant term in PAC-Bayes bounds is often the Kullback-Leibler divergence between the posterior and the prior. For so-called linear PAC-Bayes risk bounds based on the empirical risk of a fixed posterior kernel, it is possible to minimize the expected value of the bound by choosing the prior to be the expected posterior, which we call the oracle prior on account of the fact that it is distribution-dependent. In this work, we show that the bound based on the oracle prior can be suboptimal: in some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term. While using data to learn a prior is a known heuristic, its essential role in optimal bounds is new. In fact, we show that using data can mean the difference between vacuous and nonvacuous bounds. We apply this new principle in the setting of nonconvex learning, simulating data-dependent oracle priors on MNIST and Fashion MNIST with and without held-out data, and demonstrating new nonvacuous bounds in both cases.
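The optimality of the expected posterior as a prior (in expectation over the data) follows from a standard KL decomposition; a brief LaTeX sketch, not taken verbatim from the paper:

% Writing Q_S for the posterior produced from sample S and \bar{Q} = E_S[Q_S] for the
% expected posterior, for any fixed prior P (with the usual absolute-continuity caveats):
\[
\mathbb{E}_{S}\left[ \mathrm{KL}\left( Q_S \,\|\, P \right) \right]
\;=\; \mathbb{E}_{S}\left[ \mathrm{KL}\left( Q_S \,\|\, \bar{Q} \right) \right]
\;+\; \mathrm{KL}\left( \bar{Q} \,\|\, P \right),
\qquad \bar{Q} := \mathbb{E}_{S}[Q_S],
\]
% so the expected KL term, and hence the expected value of a linear PAC-Bayes bound,
% is minimized by choosing the prior P = \bar{Q}, i.e., the distribution-dependent
% oracle prior; conditioning Q_S on part of the data gives its data-dependent analogue.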
Tianyu Liu, Jie Lu, Zheng Yan (2021)
By leveraging experience from previous tasks, meta-learning algorithms can achieve effective fast adaptation when encountering new tasks. However, it is unclear how the generalization property applies to new tasks. Probably approximately correct (PAC) Bayes bound theory provides a theoretical framework for analyzing the generalization performance of meta-learning. We derive three novel generalization error bounds for meta-learning based on the PAC-Bayes relative entropy bound. Furthermore, using the empirical risk minimization (ERM) method, a PAC-Bayes bound for meta-learning with a data-dependent prior is developed. Experiments illustrate that the three proposed PAC-Bayes bounds for meta-learning provide a competitive generalization performance guarantee, and that the extended PAC-Bayes bound with a data-dependent prior achieves rapid convergence.

