ترغب بنشر مسار تعليمي؟ اضغط هنا

Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

73   0   0.0 ( 0 )
 نشر من قبل Puja Sahu
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We investigate optimal posteriors for recently introduced cite{begin2016pac} chi-squared divergence based PAC-Bayesian bounds in terms of nature of their distribution, scalability of computations, and test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear and squared distances. Optimal posterior weights are proportional to deviations of empirical risks, usually with subset support. For uniform prior, it is sufficient to search among posteriors on classifier subsets ordered by these risks. We show the bound minimization for linear distance as a convex program and obtain a closed-form expression for its optimal posterior. Whereas that for squared distance is a quasi-convex program under a specific condition, and the one for KL-divergence is non-convex optimization (a difference of convex functions). To compute such optimal posteriors, we derive fast converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances in risk values, etc. Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors. Our study highlights the impact of divergence function on the performance of PAC-Bayesian classifiers. We compare our stochastic classifiers with cross-validation based deterministic classifier. The latter has better test errors, but ours is more sample robust, has quantifiable generalization guarantees, and is computationally much faster.



قيم البحث

اقرأ أيضاً

Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that place prio r distributions over the parameters of the transition kernel of a Markov model, and seeks to characterize the resulting, typically intractable, posterior distributions. We present a PAC-Bayesian analysis of variational Bayes (VB) approximations to tempered Bayesian posterior distributions, bounding the model risk of the VB approximations. Tempered posteriors are known to be robust to model misspecification, and their variational approximations do not suffer the usual problems of over confident approximations. Our results tie the risk bounds to the mixing and ergodic properties of the Markov data generating model. We illustrate the PAC-Bayes bounds through a number of example Markov models, and also consider the situation where the Markov model is misspecified.
PAC-Bayesian set up involves a stochastic classifier characterized by a posterior distribution on a classifier set, offers a high probability bound on its averaged true risk and is robust to the training sample used. For a given posterior, this bound captures the trade off between averaged empirical risk and KL-divergence based model complexity term. Our goal is to identify an optimal posterior with the least PAC-Bayesian bound. We consider a finite classifier set and 5 distance functions: KL-divergence, its Pinskers and a sixth degree polynomial approximations; linear and squared distances. Linear distance based model results in a convex optimization problem. We obtain closed form expression for its optimal posterior. For uniform prior, this posterior has full support with weights negative-exponentially proportional to number of misclassifications. Squared distance and Pinskers approximation bounds are possibly quasi-convex and are observed to have single local minimum. We derive fixed point equations (FPEs) using partial KKT system with strict positivity constraints. This obviates the combinatorial search for subset support of the optimal posterior. For uniform prior, exponential search on a full-dimensional simplex can be limited to an ordered subset of classifiers with increasing empirical risk values. These FPEs converge rapidly to a stationary point, even for a large classifier set when a solver fails. We apply these approaches to SVMs generated using a finite set of SVM regularization parameter values on 9 UCI datasets. These posteriors yield stochastic SVM classifiers with tight bounds. KL-divergence based bound is the tightest, but is computationally expensive due to non-convexity and multiple calls to a root finding algorithm. Optimal posteriors for all 5 distance functions have lowest 10% test error values on most datasets, with linear distance being the easiest to obtain.
We define a new method to estimate centroid for text classification based on the symmetric KL-divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the ne w method achieves substantial improvements over the traditional classifiers.
Online machine learning (OML) algorithms do not need any training phase and can be deployed directly in an unknown environment. OML includes multi-armed bandit (MAB) algorithms that can identify the best arm among several arms by achieving a balance between exploration of all arms and exploitation of optimal arm. The Kullback-Leibler divergence based upper confidence bound (KLUCB) is the state-of-the-art MAB algorithm that optimizes exploration-exploitation trade-off but it is complex due to underlining optimization routine. This limits its usefulness for robotics and radio applications which demand integration of KLUCB with the PHY on the system on chip (SoC). In this paper, we efficiently map the KLUCB algorithm on SoC by realizing optimization routine via alternative synthesizable computation without compromising on the performance. The proposed architecture is dynamically reconfigurable such that the number of arms, as well as type of algorithm, can be changed on-the-fly. Specifically, after initial learning, on-the-fly switch to light-weight UCB offers around 10-factor improvement in latency and throughput. Since learning duration depends on the unknown arm statistics, we offer intelligence embedded in architecture to decide the switching instant. We validate the functional correctness and usefulness of the proposed architecture via a realistic wireless application and detailed complexity analysis demonstrates its feasibility in realizing intelligent radios.
161 - Salim Bouzebda 2011
We introduce a new test procedure of independence in the framework of parametric copulas with unknown marginals. The method is based essentially on the dual representation of $chi^2$-divergence on signed finite measures. The asymptotic properties of the proposed estimate and the test statistic are studied under the null and alternative hypotheses, with simple and standard limit distributions both when the parameter is an interior point or not.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا