No Arabic abstract
With few exceptions, neural networks have been relying on backpropagation and gradient descent as the inference engine in order to learn the model parameters, because the closed-form Bayesian inference for neural networks has been considered to be intractable. In this paper, we show how we can leverage the tractable approximate Gaussian inferences (TAGI) capabilities to infer hidden states, rather than only using it for inferring the networks parameters. One novel aspect it allows is to infer hidden states through the imposition of constraints designed to achieve specific objectives, as illustrated through three examples: (1) the generation of adversarial-attack examples, (2) the usage of a neural network as a black-box optimization method, and (3) the application of inference on continuous-action reinforcement learning. These applications showcase how tasks that were previously reserved to gradient-based optimization approaches can now be approached with analytically tractable inference
Since its inception, deep learning has been overwhelmingly reliant on backpropagation and gradient-based optimization algorithms in order to learn weight and bias parameter values. Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks. In this paper, we are demonstrating how TAGI matches or exceeds the performance of backpropagation, for training classic deep neural network architectures. Although TAGIs computational efficiency is still below that of deterministic approaches relying on backpropagation, it outperforms them on classification tasks and matches their performance for information maximizing generative adversarial networks while using smaller architectures trained with fewer epochs.
In this paper, we propose an analytical method for performing tractable approximate Gaussian inference (TAGI) in Bayesian neural networks. The method enables the analytical Gaussian inference of the posterior mean vector and diagonal covariance matrix for weights and biases. The method proposed has a computational complexity of $mathcal{O}(n)$ with respect to the number of parameters $n$, and the tests performed on regression and classification benchmarks confirm that, for a same network architecture, it matches the performance of existing methods relying on gradient backpropagation.
Reinforcement learning (RL) has gained increasing interest since the demonstration it was able to reach human performance on video game benchmarks using deep Q-learning (DQN). The current consensus for training neural networks on such complex environments is to rely on gradient-based optimization. Although alternative Bayesian deep learning methods exist, most of them still rely on gradient-based optimization, and they typically do not scale on benchmarks such as the Atari game environment. Moreover none of these approaches allow performing the analytical inference for the weights and biases defining the neural network. In this paper, we present how we can adapt the temporal difference Q-learning framework to make it compatible with the tractable approximate Gaussian inference (TAGI), which allows learning the parameters of a neural network using a closed-form analytical method. Throughout the experiments with on- and off-policy reinforcement learning approaches, we demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks while using fewer hyperparameters, and without relying on gradient-based optimization.
Approximate inference in deep Bayesian networks exhibits a dilemma of how to yield high fidelity posterior approximations while maintaining computational efficiency and scalability. We tackle this challenge by introducing a novel variational structured approximation inspired by the Bayesian interpretation of Dropout regularization. Concretely, we focus on the inflexibility of the factorized structure in Dropout posterior and then propose an improved method called Variational Structured Dropout (VSD). VSD employs an orthogonal transformation to learn a structured representation on the variational noise and consequently induces statistical dependencies in the approximate posterior. Theoretically, VSD successfully addresses the pathologies of previous Variational Dropout methods and thus offers a standard Bayesian justification. We further show that VSD induces an adaptive regularization term with several desirable properties which contribute to better generalization. Finally, we conduct extensive experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art variational methods on predictive accuracy, uncertainty estimation, and out-of-distribution detection.
While deep learning methods continue to improve in predictive accuracy on a wide range of application domains, significant issues remain with other aspects of their performance including their ability to quantify uncertainty and their robustness. Recent advances in approximate Bayesian inference hold significant promise for addressing these concerns, but the computational scalability of these methods can be problematic when applied to large-scale models. In this paper, we describe initial work on the development ofURSABench(the Uncertainty, Robustness, Scalability, and Accu-racy Benchmark), an open-source suite of bench-marking tools for comprehensive assessment of approximate Bayesian inference methods with a focus on deep learning-based classification tasks