Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

98 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Winnie Xu

تاريخ النشر 2021

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Winnie Xu - Ricky T.Q. Chen - Xuechen Li

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We perform scalable approximate inference in a continuous-depth Bayesian neural network family. In this model class, uncertainty about separate weights in each layer gives hidden units that follow a stochastic differential equation. We demonstrate gradient-based stochastic variational inference in this infinite-parameter setting, producing arbitrarily-flexible approximate posteriors. We also derive a novel gradient estimator that approaches zero variance as the approximate posterior over weights approaches the true posterior. This approach brings continuous-depth Bayesian neural nets to a competitive comparison against discrete-depth alternatives, while inheriting the memory-efficient training and tunable precision of Neural ODEs.

قيم البحث

112 - Julius Ruseckas 2019

In this work we systematically analyze general properties of differential equations used as machine learning models. We demonstrate that the gradient of the loss function with respect to to the hidden state can be considered as a generalized momentum conjugate to the hidden state, allowing application of the tools of classical mechanics. In addition, we show that not only residual networks, but also feedforward neural networks with small nonlinearities and the weights matrices deviating only slightly from identity matrices can be related to the differential equations. We propose a differential equation describing such networks and investigate its properties.

التعلم الالي التعلم الآلي

A Bayesian Approach to Invariant Deep Neural Networks

188 - Nikolaos Mourdoukoutas , Marco Federici , Georges Pantalos 2021

We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures, when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

التعلم الالي التعلم الآلي

Finite Difference Neural Networks: Fast Prediction of Partial Differential Equations

120 - Zheng Shi , Nur Sila Gulgec , Albert S. Berahas 2020

Discovering the underlying behavior of complex systems is an important topic in many science and engineering disciplines. In this paper, we propose a novel neural network framework, finite difference neural networks (FDNet), to learn partial differen tial equations from data. Specifically, our proposed finite difference inspired network is designed to learn the underlying governing partial differential equations from trajectory data, and to iteratively estimate the future dynamical behavior using only a few trainable parameters. We illustrate the performance (predictive power) of our framework on the heat equation, with and without noise and/or forcing, and compare our results to the Forward Euler method. Moreover, we show the advantages of using a Hessian-Free Trust Region method to train the network.

التعلم الالي التعلم الآلي النظم الديناميكية

Identifying Latent Stochastic Differential Equations with Variational Auto-Encoders

95 - Ali Hasan , Jo~ao M. Pereira , Sina Farsiu 2020

We present a method for learning latent stochastic differential equations (SDEs) from high dimensional time series data. Given a time series generated from a lower dimensional It^{o} process, the proposed method uncovers the relevant parameters of th e SDE through a self-supervised learning approach. Using the framework of variational autoencoders (VAEs), we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of semi-supervised learning to show that our model can recover not only the underlying SDE parameters, but also the original latent space, up to an isometry, in the limit of infinite data. We validate the model through a series of different simulated video processing tasks where the underlying SDE is known. Our results suggest that the proposed method effectively learns the underlying SDE, as predicted by the theory.

التعلم الالي التعلم الآلي

Compromise-free Bayesian neural networks

126 - Kamran Javid , Will Handley , Mike Hobson 2020

We conduct a thorough analysis of the relationship between the out-of-sample performance and the Bayesian evidence (marginal likelihood) of Bayesian neural networks (BNNs), as well as looking at the performance of ensembles of BNNs, both using the Bo ston housing dataset. Using the state-of-the-art in nested sampling, we numerically sample the full (non-Gaussian and multimodal) network posterior and obtain numerical estimates of the Bayesian evidence, considering network models with up to 156 trainable parameters. The networks have between zero and four hidden layers, either $tanh$ or $ReLU$ activation functions, and with and without hierarchical priors. The ensembles of BNNs are obtained by determining the posterior distribution over networks, from the posterior samples of individual BNNs re-weighted by the associated Bayesian evidence values. There is good correlation between out-of-sample performance and evidence, as well as a remarkable symmetry between the evidence versus model size and out-of-sample performance versus model size planes. Networks with $ReLU$ activation functions have consistently higher evidences than those with $tanh$ functions, and this is reflected in their out-of-sample performance. Ensembling over architectures acts to further improve performance relative to the individual BNNs.

التعلم الالي التعلم الآلي تطبيقات الإحصاء