Interpolating Classifiers Make Few Mistakes

71 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tengyuan Liang

تاريخ النشر 2021

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Tengyuan Liang - Benjamin Recht

التعلم الالي التعلم الآلي التحليل العددي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a finite data set. We derive a mistake bound for MNIC and a regularized variant that holds for all data sets. This bound follows from elementary properties of matrix inverses. Under the assumption that the data is independently and identically distributed, the mistake bound implies that MNIC generalizes at a rate proportional to the norm of the interpolating solution and inversely proportional to the number of data points. This rate matches similar rates derived for margin classifiers and perceptrons. We derive several plausible generative models where the norm of the interpolating classifier is bounded or grows at a rate sublinear in $n$. We also show that as long as the population class conditional distributions are sufficiently separable in total variation, then MNIC generalizes with a fast rate.

قيم البحث

93 - Barinder Thind , Kevin Multani , Jiguo Cao 2020

In recent years, there has been considerable innovation in the world of predictive methodologies. This is evident by the relative domination of machine learning approaches in various classification competitions. While these algorithms have excelled a t multivariate problems, they have remained dormant in the realm of functional data analysis. We extend notable deep learning methodologies to the domain of functional data for the purpose of classification problems. We highlight the effectiveness of our method in a number of classification applications such as classification of spectrographic data. Moreover, we demonstrate the performance of our classifier through simulation studies in which we compare our approach to the functional linear model and other conventional classification methods.

التعلم الالي التعلم الآلي

Identifying Significant Predictive Bias in Classifiers

102 - Zhe Zhang , Daniel B. Neill 2016

We present a novel subset scan method to detect if a probabilistic binary classifier has statistically significant bias -- over or under predicting the risk -- for some subgroup, and identify the characteristics of this subgroup. This form of model c hecking and goodness-of-fit test provides a way to interpretably detect the presence of classifier bias or regions of poor classifier fit. This allows consideration of not just subgroups of a priori interest or small dimensions, but the space of all possible subgroups of features. To address the difficulty of considering these exponentially many possible subgroups, we use subset scan and parametric bootstrap-based methods. Extending this method, we can penalize the complexity of the detected subgroup and also identify subgroups with high classification errors. We demonstrate these methods and find interesting results on the COMPAS crime recidivism and credit delinquency data.

التعلم الالي التعلم الآلي

$(f,Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics

66 - Jeremiah Birrell , Paul Dupuis , Markos A. Katsoulakis 2020

We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these div ergences, hereafter referred to as $(f,Gamma)$-divergences, provide a notion of `distance between probability measures and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,Gamma)$-divergences inherit features from IPMs, such as the ability to compare distributions which are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. When combined, these features establish a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions. We also show improved performance and stability over gradient-penalized Wasserstein GAN in image generation.

التعلم الالي التعلم الآلي

Majority Vote of Diverse Classifiers for Late Fusion

260 - Emilie Morvant 2014

In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal informations. Two kinds of fusion schemes are generally considered: The early fusion and the late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the machine learning PAC-Bayesian theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while maximizing the voters diversity. We propose an extension of MinCq tailored to multimedia indexing. Our method is based on an order-preserving pairwise loss adapted to ranking that allows us to improve Mean Averaged Precision measure while taking into account the diversity of the voters that we want to fuse. We provide evidence that this method is naturally adapted to late fusion procedures and confirm the good behavior of our approach on the challenging PASCAL VOC07 benchmark.

التعلم الالي التعلم الآلي الوسائط المتعددة

For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

77 - Akshay Rangamani , Lorenzo Rosasco , Tomaso Poggio 2020

We study the average $mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $mbox{CV}_{loo}$ stability, which in turn is controll ed by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and cardinality of the data go to infinity. Under the assumption of random kernel matrices, the corresponding test error should be expected to follow a double descent curve.

التعلم الالي التعلم الآلي