Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Interpolating Classifiers Make Few Mistakes

71 0 0.0 ( 0 )

Download Cite

Added by Tengyuan Liang

Publication date 2021

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Tengyuan Liang - Benjamin Recht

Machine Learning Machine Learning Numerical Analysis

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a finite data set. We derive a mistake bound for MNIC and a regularized variant that holds for all data sets. This bound follows from elementary properties of matrix inverses. Under the assumption that the data is independently and identically distributed, the mistake bound implies that MNIC generalizes at a rate proportional to the norm of the interpolating solution and inversely proportional to the number of data points. This rate matches similar rates derived for margin classifiers and perceptrons. We derive several plausible generative models where the norm of the interpolating classifier is bounded or grows at a rate sublinear in $n$. We also show that as long as the population class conditional distributions are sufficiently separable in total variation, then MNIC generalizes with a fast rate.

rate research

Neural Networks as Functional Classifiers

93 - Barinder Thind , Kevin Multani , Jiguo Cao 2020

In recent years, there has been considerable innovation in the world of predictive methodologies. This is evident by the relative domination of machine learning approaches in various classification competitions. While these algorithms have excelled at multivariate problems, they have remained dormant in the realm of functional data analysis. We extend notable deep learning methodologies to the domain of functional data for the purpose of classification problems. We highlight the effectiveness of our method in a number of classification applications such as classification of spectrographic data. Moreover, we demonstrate the performance of our classifier through simulation studies in which we compare our approach to the functional linear model and other conventional classification methods.

Machine Learning Machine Learning

Identifying Significant Predictive Bias in Classifiers

102 - Zhe Zhang , Daniel B. Neill 2016

We present a novel subset scan method to detect if a probabilistic binary classifier has statistically significant bias -- over or under predicting the risk -- for some subgroup, and identify the characteristics of this subgroup. This form of model checking and goodness-of-fit test provides a way to interpretably detect the presence of classifier bias or regions of poor classifier fit. This allows consideration of not just subgroups of a priori interest or small dimensions, but the space of all possible subgroups of features. To address the difficulty of considering these exponentially many possible subgroups, we use subset scan and parametric bootstrap-based methods. Extending this method, we can penalize the complexity of the detected subgroup and also identify subgroups with high classification errors. We demonstrate these methods and find interesting results on the COMPAS crime recidivism and credit delinquency data.

Machine Learning Machine Learning

$(f,Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics

66 - Jeremiah Birrell , Paul Dupuis , Markos A. Katsoulakis 2020

We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these divergences, hereafter referred to as $(f,Gamma)$-divergences, provide a notion of `distance between probability measures and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,Gamma)$-divergences inherit features from IPMs, such as the ability to compare distributions which are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. When combined, these features establish a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions. We also show improved performance and stability over gradient-penalized Wasserstein GAN in image generation.

Machine Learning Machine Learning

Majority Vote of Diverse Classifiers for Late Fusion

448 - Emilie Morvant 2014

In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal informations. Two kinds of fusion schemes are generally considered: The early fusion and the late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the machine learning PAC-Bayesian theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while maximizing the voters diversity. We propose an extension of MinCq tailored to multimedia indexing. Our method is based on an order-preserving pairwise loss adapted to ranking that allows us to improve Mean Averaged Precision measure while taking into account the diversity of the voters that we want to fuse. We provide evidence that this method is naturally adapted to late fusion procedures and confirm the good behavior of our approach on the challenging PASCAL VOC07 benchmark.

Machine Learning Machine Learning Multimedia

For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

77 - Akshay Rangamani , Lorenzo Rosasco , Tomaso Poggio 2020

We study the average $mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $mbox{CV}_{loo}$ stability, which in turn is controlled by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and cardinality of the data go to infinity. Under the assumption of random kernel matrices, the corresponding test error should be expected to follow a double descent curve.

Machine Learning Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Interpolating Classifiers Make Few Mistakes

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions