Bayesian Learning of Neural Network Architectures

156 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Georgi Dikov

تاريخ النشر 2019

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Georgi Dikov - Patrick van der Smagt - Justin Bayer

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead at minimum.

قيم البحث

158 - Vincent Fortuin , Adri`a Garriga-Alonso , Florian Wenzel 2021

Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely to either accurately reflect our true beliefs about the weight distributions, or to give optimal performanc e. We study summary statistics of neural network weights in different networks trained using SGD. We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations. Building these observations into the respective priors leads to improved performance on a variety of image classification datasets. Moreover, we find that these priors also mitigate the cold posterior effect in FCNNs, while in CNNs we see strong improvements at all temperatures, and hence no reduction in the cold posterior effect.

التعلم الالي التعلم الآلي

Partitioned hybrid learning of Bayesian network structures

125 - Jireh Huang , Qing Zhou 2021

We develop a novel hybrid method for Bayesian network structure learning called partitioned hybrid greedy search (pHGS), composed of three distinct yet compatible new algorithms: Partitioned PC (pPC) accelerates skeleton learning via a divide-and-con quer strategy, $p$-value adjacency thresholding (PATH) effectively accomplishes parameter tuning with a single execution, and hybrid greedy initialization (HGI) maximally utilizes constraint-based information to obtain a high-scoring and well-performing initial graph for greedy search. We establish structure learning consistency of our algorithms in the large-sample limit, and empirically validate our methods individually and collectively through extensive numerical comparisons. The combined merits of pPC and PATH achieve significant computational reductions compared to the PC algorithm without sacrificing the accuracy of estimated structures, and our generally applicable HGI strategy reliably improves the estimation structural accuracy of popular hybrid algorithms with negligible additional computational expense. Our empirical results demonstrate the superior empirical performance of pHGS against many state-of-the-art structure learning algorithms.

التعلم الالي التعلم الآلي

Bayesian Nonparametric Federated Learning of Neural Networks

165 - Mikhail Yurochkin , Mayank Agarwal , Soumya Ghosh 2019

In federated learning problems, data is scattered across different servers and exchanging or pooling it is often impractical or prohibited. We develop a Bayesian nonparametric framework for federated learning with neural networks. Each data server is assumed to provide local neural network weights, which are modeled through our framework. We then develop an inference approach that allows us to synthesize a more expressive global network without additional supervision, data pooling and with as few as a single communication round. We then demonstrate the efficacy of our approach on federated learning problems simulated from two popular image classification datasets.

التعلم الالي التعلم الآلي

Informative Bayesian Neural Network Priors for Weak Signals

97 - Tianyu Cui , Aki Havulinna , Pekka Marttinen 2020

Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific ap plications: 1. feature sparsity (fraction of features deemed relevant); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show how to encode both types of domain knowledge into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters that encodes knowledge about feature sparsity, and a Stein gradient optimization to tune the hyperparameters in such a way that the distribution induced on the models PVE matches the prior distribution. We show empirically that the new prior improves prediction accuracy, compared to existing neural network priors, on several publicly available datasets and in a genetics application where signals are weak and sparse, often outperforming even computationally intensive cross-validation for hyperparameter tuning.

التعلم الالي التعلم الآلي

Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation

136 - Samuel Wiqvist , Pierre-Alexandre Mattei , Umberto Picchini 2019

We present a novel family of deep neural architectures, named partially exchangeable networks (PENs) that leverage probabilistic symmetries. By design, PENs are invariant to block-switch transformations, which characterize the partial exchangeability properties of conditionally Markovian processes. Moreover, we show that any block-switch invariant function has a PEN-like representation. The DeepSets architecture is a special case of PEN and we can therefore also target fully exchangeable data. We employ PENs to learn summary statistics in approximate Bayesian computation (ABC). When comparing PENs to previous deep learning methods for learning summary statistics, our results are highly competitive, both considering time series and static models. Indeed, PENs provide more reliable posterior samples even when using less training data.

التعلم الالي التعلم الآلي حساب