How Good is the Bayes Posterior in Deep Neural Networks Really?

81 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sebastian Nowozin

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Florian Wenzel - Kevin Roth - Bastiaan S. Veeling

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a cold posterior that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.

قيم البحث

181 - Seth Nabarro , Stoil Ganev , Adri`a Garriga-Alonso 2021

Data augmentation is a highly effective approach for improving performance in deep neural networks. The standard view is that it creates an enlarged dataset by adding synthetic data, which raises a problem when combining it with Bayesian inference: h ow much data are we really conditioning on? This question is particularly relevant to recent observations linking data augmentation to the cold posterior effect. We investigate various principled ways of finding a log-likelihood for augmented datasets. Our approach prescribes augmenting the same underlying image multiple times, both at test and train-time, and averaging either the logits or the predictive probabilities. Empirically, we observe the best performance with averaging probabilities. While there are interactions with the cold posterior effect, neither averaging logits or averaging probabilities eliminates it.

التعلم الالي التعلم الآلي

Exact posterior distributions of wide Bayesian neural networks

105 - Jiri Hron , Yasaman Bahri , Roman Novak 2020

Recent work has shown that the prior over functions induced by a deep Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width of all layers becomes large. However, many BNN applications are concerned with the BNN function space posterior. While some empirical evidence of the posterior convergence was provided in the original works of Neal (1996) and Matthews et al. (2018), it is limited to small datasets or architectures due to the notorious difficulty of obtaining and verifying exactness of BNN posterior approximations. We provide the missing theoretical proof that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior. For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.

التعلم الالي التعلم الآلي

FuncNN: An R Package to Fit Deep Neural Networks Using Generalized Input Spaces

96 - Barinder Thind , Sidi Wu , Richard Groenewald 2020

Neural networks have excelled at regression and classification problems when the input space consists of scalar variables. As a result of this proficiency, several popular packages have been developed that allow users to easily fit these kinds of mod els. However, the methodology has excluded the use of functional covariates and to date, there exists no software that allows users to build deep learning models with this generalized input space. To the best of our knowledge, the functional neural network (FuncNN) library is the first such package in any programming language; the library has been developed for R and is built on top of the keras architecture. Throughout this paper, several functions are introduced that provide users an avenue to easily build models, generate predictions, and run cross-validations. A summary of the underlying methodology is also presented. The ultimate contribution is a package that provides a set of general modelling and diagnostic tools for data problems in which there exist both functional and scalar covariates.

التعلم الالي التعلم الآلي حساب

Variable Selection with Rigorous Uncertainty Quantification using Deep Bayesian Neural Networks: Posterior Concentration and Bernstein-von Mises Phenomenon

109 - Jeremiah Zhe Liu 2019

This work develops rigorous theoretical basis for the fact that deep Bayesian neural network (BNN) is an effective tool for high-dimensional variable selection with rigorous uncertainty quantification. We develop new Bayesian non-parametric theorems to show that a properly configured deep BNN (1) learns the variable importance effectively in high dimensions, and its learning rate can sometimes break the curse of dimensionality. (2) BNNs uncertainty quantification for variable importance is rigorous, in the sense that its 95% credible intervals for variable importance indeed covers the truth 95% of the time (i.e., the Bernstein-von Mises (BvM) phenomenon). The theoretical results suggest a simple variable selection algorithm based on the BNNs credible intervals. Extensive simulation confirms the theoretical findings and shows that the proposed algorithm outperforms existing classic and neural-network-based variable selection methods, particularly in high dimensions.

التعلم الالي التعلم الآلي

Sequential Neural Posterior and Likelihood Approximation

169 - Samuel Wiqvist , Jes Frellsen , Umberto Picchini 2021

We introduce the sequential neural posterior and likelihood approximation (SNPLA) algorithm. SNPLA is a normalizing flows-based algorithm for inference in implicit models, and therefore is a simulation-based inference method that only requires simula tions from a generative model. SNPLA avoids Markov chain Monte Carlo sampling and correction-steps of the parameter proposal function that are introduced in similar methods, but that can be numerically unstable or restrictive. By utilizing the reverse KL divergence, SNPLA manages to learn both the likelihood and the posterior in a sequential manner. Over four experiments, we show that SNPLA performs competitively when utilizing the same number of model simulations as used in other methods, even though the inference problem for SNPLA is more complex due to the joint learning of posterior and likelihood function. Due to utilizing normalizing flows SNPLA generates posterior draws much faster (4 orders of magnitude) than MCMC-based methods.

التعلم الالي التعلم الآلي المنهجية