Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Robust Validation: Confident Predictions Even When Distributions Shift

565 0 0.0 ( 0 )

Download Cite

Added by Suyash Gupta

Publication date 2020

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Maxime Cauchois - Suyash Gupta - Alnur Ali

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy---coming from robust statistics and optimization---is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.

rate research

Interpolation can hurt robust generalization even when there is no noise

80 - Konstantin Donhauser , Alexandru c{T}ifrea , Michael Aerni 2021

Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.

Machine Learning Machine Learning

Counterfactual Predictions under Runtime Confounding

65 - Amanda Coston , Edward H. Kennedy , Alexandra Chouldechova 2020

Algorithms are commonly used to predict outcomes under a particular decision or intervention, such as predicting whether an offender will succeed on parole if placed under minimal supervision. Generally, to learn such counterfactual prediction models from observational data on historical decisions and corresponding outcomes, one must measure all factors that jointly affect the outcomes and the decision taken. Motivated by decision support applications, we study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data, but it is either undesirable or impermissible to use some such factors in the prediction model. We refer to this setting as runtime confounding. We propose a doubly-robust procedure for learning counterfactual prediction models in this setting. Our theoretical analysis and experimental results suggest that our method often outperforms competing approaches. We also present a validation procedure for evaluating the performance of counterfactual prediction methods.

Machine Learning Machine Learning Methodology

Thresholded Adaptive Validation: Tuning the Graphical Lasso for Graph Recovery

104 - Mike Laszkiewicz , Asja Fischer , Johannes Lederer 2020

Many Machine Learning algorithms are formulated as regularized optimization problems, but their performance hinges on a regularization parameter that needs to be calibrated to each application at hand. In this paper, we propose a general calibration scheme for regularized optimization problems and apply it to the graphical lasso, which is a method for Gaussian graphical modeling. The scheme is equipped with theoretical guarantees and motivates a thresholding pipeline that can improve graph recovery. Moreover, requiring at most one line search over the regularization path, the calibration scheme is computationally more efficient than competing schemes that are based on resampling. Finally, we show in simulations that our approach can improve on the graph recovery of other approaches considerably.

Machine Learning Machine Learning Methodology

Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

132 - William T. Stephenson , Zachary Frangella , Madeleine Udell andn Tamara Broderick 2021

Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.

Machine Learning Machine Learning Methodology

A partition-based similarity for classification distributions

90 - Hayden S. Helm , Ronak D. Mehta , Brandon Duderstadt 2020

Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a source distribution performs when applied to inference related to a target distribution. The definition of task similarity allows for natural definitions of adversarial and orthogonal distributions. We highlight limiting properties of representations induced by (universally) consistent decision rules and demonstrate in simulation that an empirical estimate of task similarity is a function of the decision rule deployed for inference. We demonstrate that for a given target distribution, both transfer efficiency and semantic similarity of candidate source distributions correlate with empirical task similarity.

Machine Learning Machine Learning Methodology

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Robust Validation: Confident Predictions Even When Distributions Shift

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions