Bayesian Nonparametric Federated Learning of Neural Networks

Published by: Mikhail Yurochkin

Publication date: 2019

Research field: Mathematical Statistics, Informatics Engineering

Paper language: English

Authors: Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh

Machine Learning

Abstract

In federated learning problems, data is scattered across different servers, and exchanging or pooling it is often impractical or prohibited. We develop a Bayesian nonparametric framework for federated learning with neural networks. Each data server is assumed to provide local neural network weights, which are modeled through our framework. We then develop an inference approach that allows us to synthesize a more expressive global network without additional supervision or data pooling, and with as few as a single communication round. Finally, we demonstrate the efficacy of our approach on federated learning problems simulated from two popular image classification datasets.
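To make the idea of synthesizing a global network from local weights more concrete, here is a minimal sketch. It is not the Bayesian nonparametric inference the abstract describes: it swaps in a simple greedy nearest-neighbour merge. The function names, the `threshold` parameter, and the toy weight vectors are all assumptions made for illustration. Each local network is reduced to a list of hidden-neuron weight vectors. Neurons that lie close together are averaged into one global neuron, and neurons with no close match become new global neurons, so the global layer may end up wider than any local one.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def merge_local_neurons(local_networks, threshold):
    """Greedily merge hidden-neuron weight vectors from several local networks.

    local_networks: list of networks, each a list of neuron weight vectors.
    threshold: hypothetical squared-distance cutoff below which a local neuron
               is treated as a copy of an existing global neuron.
    Returns the list of global neuron weight vectors.
    """
    global_neurons = []  # running mean weight vector of each global neuron
    counts = []          # number of local neurons merged into each global neuron
    for network in local_networks:
        used = set()     # a global neuron absorbs at most one neuron per network
        for neuron in network:
            best, best_dist = None, math.inf
            for idx, candidate in enumerate(global_neurons):
                if idx in used:
                    continue
                dist = squared_distance(neuron, candidate)
                if dist < best_dist:
                    best, best_dist = idx, dist
            if best is not None and best_dist < threshold:
                # Update the running mean of the matched global neuron.
                counts[best] += 1
                n = counts[best]
                global_neurons[best] = [
                    g + (x - g) / n for g, x in zip(global_neurons[best], neuron)
                ]
                used.add(best)
            else:
                # No close match: this neuron starts a new global neuron.
                global_neurons.append(list(neuron))
                counts.append(1)
                used.add(len(global_neurons) - 1)
    return global_neurons


# Toy example: server B holds a permuted, slightly perturbed copy of one of
# server A's neurons plus one neuron that server A never learned.
server_a = [[1.0, 0.0], [0.0, 1.0]]
server_b = [[0.0, 1.1], [5.0, 5.0]]
merged = merge_local_neurons([server_a, server_b], threshold=0.5)
```

In this toy run the merged layer has three neurons: two shared with server A and one contributed only by server B. A greedy pass depends on the order in which servers are visited, which is one reason a principled matching procedure is preferable in practice.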

Soumya Ghosh, Jiayu Yao, Finale Doshi-Velez (2018)

Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection, even choosing the number of nodes, remains an open question. Recent work has proposed the use of a horseshoe prior over node pre-activations of a Bayesian neural network, which effectively turns off nodes that do not help explain the data. In this work, we propose several modeling and inference advances that consistently improve the compactness of the model learned while maintaining predictive performance, especially in smaller-sample settings including reinforcement learning.

Machine Learning

Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods

Samuel Kessler, Arnold Salas, Vincent W. C. Tan (2018)

We introduce a novel framework for the estimation of the posterior distribution over the weights of a neural network, based on a new probabilistic interpretation of adaptive optimisation algorithms such as AdaGrad and Adam. We demonstrate the effectiveness of our Bayesian Adam method, Badam, by experimentally showing, via weight pruning, that the learnt uncertainties correctly relate to the weights' predictive capabilities. We also demonstrate the quality of the derived uncertainty measures by comparing the performance of Badam to standard methods in a Thompson sampling setting for multi-armed bandits, where good uncertainty measures are required for an agent to balance exploration and exploitation.

Machine Learning
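The Thompson sampling setting mentioned in the abstract above can be illustrated with the textbook Beta-Bernoulli bandit. This is a minimal sketch under stated assumptions, not the Badam method: the posterior over each arm's reward probability is an exact Beta distribution rather than a neural-network weight posterior, and the function name, arm probabilities, and seed are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the pull count per arm.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (a uniform Beta(1, 1) prior updated with the observed outcomes).
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: draws[a])
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.1, 0.9], n_rounds=2000)
```

Arms with wide posteriors occasionally produce large draws and get explored, while arms with confidently low estimates are abandoned. This is why the quality of the uncertainty estimates directly affects performance in this setting.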

Hierarchical Indian Buffet Neural Networks for Bayesian Continual Learning

Samuel Kessler, Vu Nguyen, Stefan Zohren (2019)

We place an Indian Buffet process (IBP) prior over the structure of a Bayesian Neural Network (BNN), thus allowing the complexity of the BNN to increase and decrease automatically. We further extend this model such that the prior on the structure of each hidden layer is shared globally across all layers, using a Hierarchical-IBP (H-IBP). We apply this model to the problem of resource allocation in Continual Learning (CL) where new tasks occur and the network requires extra resources. Our model uses online variational inference with reparameterisation of the Bernoulli and Beta distributions, which constitute the IBP and H-IBP priors. As we automatically learn the number of weights in each layer of the BNN, overfitting and underfitting problems are largely overcome. We show empirically that our approach offers a competitive edge over existing methods in CL.

Machine Learning
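For readers unfamiliar with the Indian Buffet process prior named in the abstract above, the following sketch draws a binary matrix from the standard IBP generative process ("customers" choosing "dishes"). It is only an illustration of the prior itself: the paper's variational inference and hierarchical extension are not reproduced, and the function names, `alpha` value, and seed are assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_rows, alpha, seed=0):
    """Draw a binary matrix from an Indian Buffet process with concentration alpha.

    Row i (1-based) reuses existing column k with probability m_k / i, where m_k
    is the number of earlier rows using that column, and then opens
    Poisson(alpha / i) new columns.
    """
    rng = random.Random(seed)
    column_counts = []
    rows = []
    for i in range(1, n_rows + 1):
        row = []
        for k, m in enumerate(column_counts):
            take = rng.random() < m / i
            row.append(1 if take else 0)
            if take:
                column_counts[k] += 1
        new_columns = sample_poisson(alpha / i, rng)
        row.extend([1] * new_columns)
        column_counts.extend([1] * new_columns)
        rows.append(row)
    # Pad earlier rows with zeros for columns that were opened later.
    width = len(column_counts)
    return [row + [0] * (width - len(row)) for row in rows]


Z = sample_ibp(n_rows=5, alpha=2.0)
```

The number of columns is not fixed in advance. It grows with the number of rows and with `alpha`, which is the property that lets a model built on this prior adjust its size to the data.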

Compromise-free Bayesian neural networks

Kamran Javid, Will Handley, Mike Hobson (2020)

We conduct a thorough analysis of the relationship between the out-of-sample performance and the Bayesian evidence (marginal likelihood) of Bayesian neural networks (BNNs), as well as looking at the performance of ensembles of BNNs, both using the Boston housing dataset. Using the state-of-the-art in nested sampling, we numerically sample the full (non-Gaussian and multimodal) network posterior and obtain numerical estimates of the Bayesian evidence, considering network models with up to 156 trainable parameters. The networks have between zero and four hidden layers, either $\tanh$ or $\mathrm{ReLU}$ activation functions, and with and without hierarchical priors. The ensembles of BNNs are obtained by determining the posterior distribution over networks, from the posterior samples of individual BNNs re-weighted by the associated Bayesian evidence values. There is good correlation between out-of-sample performance and evidence, as well as a remarkable symmetry between the evidence versus model size and out-of-sample performance versus model size planes. Networks with $\mathrm{ReLU}$ activation functions have consistently higher evidences than those with $\tanh$ functions, and this is reflected in their out-of-sample performance. Ensembling over architectures acts to further improve performance relative to the individual BNNs.

Machine Learning, Applications of Statistics
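The evidence-based re-weighting described in the abstract above can be sketched as follows. The sketch assumes equal prior probability for every model, so each model's weight is proportional to its evidence, and it combines one point prediction per model rather than re-weighting posterior samples. The function names and all numbers are hypothetical.

```python
import math


def evidence_weights(log_evidences):
    """Turn log-evidences into normalised model weights (equal model priors assumed)."""
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_evidences)
    unnormalised = [math.exp(value - top) for value in log_evidences]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def ensemble_prediction(predictions, log_evidences):
    """Evidence-weighted average of one scalar prediction per model."""
    weights = evidence_weights(log_evidences)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical example: the second model has three times the evidence of the first.
log_z = [-1000.0, -1000.0 + math.log(3.0)]
weights = evidence_weights(log_z)
combined = ensemble_prediction([0.0, 4.0], log_z)
```

Working in log space matters because evidences for realistic datasets are far too small to represent directly as floating-point numbers.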

Bayesian Learning of Neural Network Architectures

Georgi Dikov, Patrick van der Smagt, Justin Bayer (2019)

In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require retraining the model, thus keeping the computational overhead to a minimum.

Machine Learning