
Improving Uncertainty Calibration via Prior Augmented Data

Published by Jeffrey Willette
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. The problem of overconfidence becomes especially apparent in cases where the test-time data distribution differs from that which was seen during training. We propose a solution to this problem by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels. Our method results in a better calibrated network and is agnostic to the underlying model structure, so it can be applied to any neural network which produces a probability density as an output. We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems, applying it to recent probabilistic neural network models.
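A minimal PyTorch sketch of the core idea, assuming Gaussian-perturbed training inputs as a stand-in for the overconfident regions the paper identifies; `noise_scale`, `beta`, and the perturbation scheme are illustrative choices, not the paper's actual procedure:

```python
import torch
import torch.nn.functional as F

def prior_augmented_loss(model, x, y, label_prior, noise_scale=0.5, beta=1.0):
    """Cross-entropy on the real batch plus a KL penalty that pulls the
    model's predictions on perturbed inputs toward the label prior.
    `noise_scale` and `beta` are illustrative hyperparameters."""
    nll = F.cross_entropy(model(x), y)

    # Stand-in for "regions where the model is unjustifiably overconfident":
    # here, simply Gaussian-perturbed copies of the training inputs.
    x_aug = x + noise_scale * torch.randn_like(x)
    log_p_aug = F.log_softmax(model(x_aug), dim=-1)

    # KL(prior || p_aug): minimizing this pulls the predictions on x_aug
    # toward the label prior, raising their entropy toward that of the prior.
    kl = F.kl_div(log_p_aug, label_prior.expand_as(log_p_aug),
                  reduction="batchmean")
    return nll + beta * kl
```

With a uniform label prior, the KL term reduces to an entropy-maximization penalty on the perturbed points, which matches the entropy-raising behaviour described above.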




Read also

Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. In order to improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation, or OMADA, which specifically attempts to generate the most challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder-based generative model that closely approximates decision boundaries between two or more classes. On a variety of datasets as well as on multiple diverse network architectures, OMADA consistently yields more accurate and better-calibrated classifiers than baseline models, outperforms competing approaches such as Mixup, and achieves performance similar to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.
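For intuition, the on-manifold attack can be sketched as gradient ascent in an autoencoder's latent space, with the perturbed codes decoded back to data space. This is a simplified reading of OMADA; the module names and step sizes below are placeholders:

```python
import torch
import torch.nn.functional as F

def on_manifold_adversarial(encoder, decoder, classifier, x, y,
                            step_size=0.05, n_steps=10):
    """Gradient ascent in the autoencoder's latent space: perturbed codes
    are decoded back to data space, so samples stay near the manifold
    while the classifier's loss increases."""
    z = encoder(x).detach().requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(classifier(decoder(z)), y)
        grad, = torch.autograd.grad(loss, z)
        # Step along the latent manifold toward a decision boundary.
        z = (z + step_size * grad.sign()).detach().requires_grad_(True)
    return decoder(z).detach()
```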
Post-hoc multi-class calibration is a common approach for providing high-quality confidence estimates of deep neural network predictions. Recent work has shown that widely used scaling methods underestimate their calibration error, while alternative Histogram Binning (HB) methods often fail to preserve classification accuracy. When classes have small prior probabilities, HB also faces the issue of severe sample-inefficiency after the conversion into K one-vs-rest class-wise calibration problems. The goal of this paper is to resolve the identified issues of HB in order to provide calibrated confidence estimates using only a small holdout calibration dataset for bin optimization while preserving multi-class ranking accuracy. From an information-theoretic perspective, we derive the I-Max concept for binning, which maximizes the mutual information between labels and quantized logits. This concept mitigates potential loss in ranking performance due to lossy quantization and, by disentangling the optimization of bin edges and representatives, allows simultaneous improvement of ranking and calibration performance. To improve the sample efficiency and estimates from a small calibration set, we propose a shared class-wise (sCW) calibration strategy, sharing one calibrator among similar classes (e.g., with similar class priors) so that the training sets of their class-wise calibration problems can be merged to train the single calibrator. The combination of sCW and I-Max binning outperforms state-of-the-art calibration methods on various evaluation metrics across different benchmark datasets and models, using a small calibration set (e.g., 1k samples for ImageNet).
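As a point of reference, plain one-vs-rest histogram binning (the baseline HB that I-Max improves on) can be sketched as follows; I-Max replaces the equal-mass bin edges below with edges chosen to maximize the label/logit mutual information, and the sCW strategy would fit one such calibrator on the merged calibration sets of similar classes:

```python
import numpy as np

def fit_histogram_binning(logits, labels, n_bins=15):
    """One-vs-rest histogram binning for a single class (baseline HB).
    I-Max would instead choose `edges` to maximize the mutual
    information between labels and the quantized logits."""
    # Equal-mass edges: every bin sees roughly the same sample count.
    edges = np.quantile(logits, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, logits, side="right") - 1,
                   0, n_bins - 1)
    # Each bin's representative is the empirical positive-label frequency.
    reps = np.array([labels[bins == b].mean() if np.any(bins == b) else 0.0
                     for b in range(n_bins)])
    return edges, reps

def calibrate(edges, reps, logits):
    bins = np.clip(np.searchsorted(edges, logits, side="right") - 1,
                   0, len(reps) - 1)
    return reps[bins]
```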
Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks. At the heart of our approach is a data augmentation strategy based on Gibbs sampling from a self-attention pseudolikelihood estimator. Across 30 datasets spanning regression and binary/multiclass classification tasks, FAST-DAD distillation produces significantly better individual models than one obtains through standard training on the original data. Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
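The Gibbs-sampling augmentation step can be sketched as follows, with `conditional_sampler` standing in for FAST-DAD's self-attention pseudolikelihood model (any model that samples one feature conditioned on the others fits the template):

```python
import numpy as np

def gibbs_augment(X, conditional_sampler, n_rounds=5, rng=None):
    """Synthesize tabular rows by Gibbs sampling: repeatedly resample one
    feature column conditioned on all the others. `conditional_sampler`
    is a placeholder for the pseudolikelihood model."""
    rng = rng or np.random.default_rng(0)
    X_syn = X.copy()
    for _ in range(n_rounds):
        for j in rng.permutation(X_syn.shape[1]):
            # Resample column j from its conditional given the other columns.
            X_syn[:, j] = conditional_sampler(X_syn, j)
    return X_syn

# Distillation then labels X_syn with the slow teacher ensemble and fits
# a fast student on (X_syn, teacher.predict_proba(X_syn)).
```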
Data augmentation by incorporating cheap unlabeled data from multiple domains is a powerful way to improve prediction, especially when there is limited labeled data. In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data. We demonstrate that for broad classes of distributions and classifiers, there exists a sample complexity gap between standard and robust classification. We quantify to what degree this gap can be bridged by leveraging unlabeled samples from a shifted domain, providing both upper and lower bounds. Moreover, we show settings where we achieve better adversarial robustness when the unlabeled data come from a shifted domain rather than the same domain as the labeled data. We also investigate how to leverage out-of-domain data when some structural information, such as sparsity, is shared between labeled and unlabeled domains. Experimentally, we augment two object recognition datasets (CIFAR-10 and SVHN) with easy-to-obtain unlabeled out-of-domain data and demonstrate substantial improvement in the models' robustness against $\ell_\infty$ adversarial attacks on the original domain.
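The attacks evaluated here are $\ell_\infty$-bounded; a standard PGD attack of that kind, which is one common way to realize the adversarial training in question (not necessarily this paper's exact setup), looks like:

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard ell-infinity PGD; hyperparameters are common defaults,
    not necessarily the paper's."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1)

# One common recipe for using the unlabeled pool (again, not necessarily
# this paper's): pseudo-label it with a standard classifier, then run
# adversarial training on labeled + pseudo-labeled data using pgd_linf.
```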
Stock trend forecasting has become a popular research direction that attracts widespread attention in the financial field. Though deep learning methods have achieved promising results, there are still many limitations, for example, how to extract clean features from the raw stock data. In this paper, we introduce an Augmented Disentanglement Distillation (ADD) approach to remove interferential features from the noised raw data. Specifically, we present 1) a disentanglement structure to separate excess and market information from the stock data so that the two factors do not disturb each other's predictions. Besides, by applying 2) a dynamic self-distillation method over the disentanglement framework, other implicit interference factors can also be removed. Further, thanks to the decoder module in our framework, 3) a novel strategy is proposed to augment the training samples based on the different excess and market features to improve performance. We conduct experiments on Chinese stock market data. Results show that our method significantly improves stock trend forecasting performance, as well as the actual investment income through backtesting, which strongly demonstrates the effectiveness of our approach.
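A toy version of the disentanglement structure in 1), with separate excess and market branches feeding their own predictors, plus the decoder from 3) for reconstruction and feature-swap augmentation; all dimensions and layer choices are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DisentangleNet(nn.Module):
    """Toy sketch: a shared encoding is split into 'excess' and 'market'
    branches, each with its own predictor, plus a decoder used for
    reconstruction and feature-swap augmentation."""
    def __init__(self, d_in=16, d_hidden=64, d_feat=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.to_excess = nn.Linear(d_hidden, d_feat)
        self.to_market = nn.Linear(d_hidden, d_feat)
        self.head_excess = nn.Linear(d_feat, 1)  # excess-return prediction
        self.head_market = nn.Linear(d_feat, 1)  # market-trend prediction
        self.decoder = nn.Linear(2 * d_feat, d_in)

    def forward(self, x):
        h = self.encoder(x)
        e, m = self.to_excess(h), self.to_market(h)
        recon = self.decoder(torch.cat([e, m], dim=-1))
        return self.head_excess(e), self.head_market(m), recon
```

Decoding one sample's excess features together with another sample's market features then yields the kind of augmented training rows step 3) describes.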

