BNNpriors: A library for Bayesian neural network inference with different prior distributions

177 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Vincent Fortuin

تاريخ النشر 2021

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Vincent Fortuin - Adri`a Garriga-Alonso - Mark van der Wilk

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs well and can lead to suboptimal performance. Our new library, BNNpriors, enables state-of-the-art Markov Chain Monte Carlo inference on Bayesian neural networks with a wide range of predefined priors, including heavy-tailed ones, hierarchical ones, and mixture priors. Moreover, it follows a modular approach that eases the design and implementation of new custom priors. It has facilitated foundational discoveries on the nature of the cold posterior effect in Bayesian neural networks and will hopefully catalyze future research as well as practical applications in this area.

قيم البحث

152 - Qingming Tang , Lifu Tu , Weiran Wang 2016

We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by cross validati on. Then we formulate network inference as a matrix completion problem regularized by our degree prior. Our theoretical analysis indicates that this prior favors a network following the learned degree distribution, and may lead to improved network recovery error bound than previous work. Experimental results on both simulated and real biological networks demonstrate the superior performance of our method in various settings.

التعلم الالي التعلم الآلي

Bayesian Neural Network Priors Revisited

158 - Vincent Fortuin , Adri`a Garriga-Alonso , Florian Wenzel 2021

Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely to either accurately reflect our true beliefs about the weight distributions, or to give optimal performanc e. We study summary statistics of neural network weights in different networks trained using SGD. We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations. Building these observations into the respective priors leads to improved performance on a variety of image classification datasets. Moreover, we find that these priors also mitigate the cold posterior effect in FCNNs, while in CNNs we see strong improvements at all temperatures, and hence no reduction in the cold posterior effect.

التعلم الالي التعلم الآلي

Informative Bayesian Neural Network Priors for Weak Signals

97 - Tianyu Cui , Aki Havulinna , Pekka Marttinen 2020

Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific ap plications: 1. feature sparsity (fraction of features deemed relevant); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show how to encode both types of domain knowledge into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters that encodes knowledge about feature sparsity, and a Stein gradient optimization to tune the hyperparameters in such a way that the distribution induced on the models PVE matches the prior distribution. We show empirically that the new prior improves prediction accuracy, compared to existing neural network priors, on several publicly available datasets and in a genetics application where signals are weak and sparse, often outperforming even computationally intensive cross-validation for hyperparameter tuning.

التعلم الالي التعلم الآلي

Loss-Calibrated Approximate Inference in Bayesian Neural Networks

74 - Adam D. Cobb , Stephen J. Roberts , Yarin Gal 2018

Current approaches in approximate inference for Bayesian neural networks minimise the Kullback-Leibler divergence to approximate the true posterior over the weights. However, this approximation is without knowledge of the final application, and there fore cannot guarantee optimal predictions for a given task. To make more suitable task-specific approximations, we introduce a new loss-calibrated evidence lower bound for Bayesian neural networks in the context of supervised learning, informed by Bayesian decision theory. By introducing a lower bound that depends on a utility function, we ensure that our approximation achieves higher utility than traditional methods for applications that have asymmetric utility functions. Furthermore, in using dropout inference, we highlight that our new objective is identical to that of standard dropout neural networks, with an additional utility-dependent penalty term. We demonstrate our new loss-calibrated model with an illustrative medical example and a restricted model capacity experiment, and highlight failure modes of the comparable weighted cross entropy approach. Lastly, we demonstrate the scalability of our method to real world applications with per-pixel semantic segmentation on an autonomous driving data set.

التعلم الالي التعلم الآلي

Bayesian Learning of Neural Network Architectures

155 - Georgi Dikov , Patrick van der Smagt , Justin Bayer 2019

In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks wi th a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead at minimum.

التعلم الالي التعلم الآلي