Assessing the Robustness of Bayesian Dark Knowledge to Posterior Uncertainty


Posted by Meet Vadera

Publication date: 2019

Research field: Information Engineering, Mathematical Statistics

Paper language: English

Authors: Meet P. Vadera, Benjamin M. Marlin

Machine learning


Abstract

Bayesian Dark Knowledge is a method for compressing the posterior predictive distribution of a neural network model into a more compact form. Specifically, the method attempts to compress a Monte Carlo approximation to the parameter posterior into a single network that represents the posterior predictive distribution. The original authors further showed that this approach succeeds in the classification setting using a student network whose architecture matches that of a single network in the teacher ensemble. In this work, we examine the robustness of Bayesian Dark Knowledge to higher levels of posterior uncertainty. We show that a student network that matches the teacher architecture may fail to yield acceptable performance, and we study an approach to closing the resulting performance gap by increasing student model capacity.
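
As a rough illustration of the compression target described above, the sketch below computes the Monte Carlo posterior predictive for one input by averaging the softmax outputs of S posterior samples (the teacher ensemble), then scores a student's output against it with a cross-entropy loss. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the toy logits, and the use of cross-entropy as the distillation objective are assumptions made for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_posterior_predictive(sample_logits: List[List[float]]) -> List[float]:
    """Teacher target for one input: p(y|x,D) ~= (1/S) * sum_s p(y|x,theta_s).

    `sample_logits[s]` holds the logits of (hypothetical) posterior sample s.
    """
    probs = [softmax(l) for l in sample_logits]
    s = len(probs)
    return [sum(p[k] for p in probs) / s for k in range(len(probs[0]))]


def distillation_loss(teacher_probs: List[float], student_logits: List[float]) -> float:
    """Cross-entropy between the teacher's MC predictive and the student's softmax.

    Assumed objective: it differs from KL(teacher || student) only by the
    teacher's entropy, which does not depend on the student.
    """
    student_probs = softmax(student_logits)
    return -sum(t * math.log(q) for t, q in zip(teacher_probs, student_probs) if t > 0)


if __name__ == "__main__":
    # Three hypothetical posterior samples, three classes, one input.
    teacher = mc_posterior_predictive([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]])
    print([round(p, 3) for p in teacher])
    print(round(distillation_loss(teacher, [0.5, 0.5, -1.0]), 3))
```

Averaged over training inputs and minimized over the student's parameters, this loss pushes a single network toward the ensemble's averaged prediction, so the ensemble itself no longer needs to be stored at test time.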


Related papers

Bayesian Dark Knowledge

Anoop Korattikara, Vivek Rathod, Kevin Murphy (2015)

We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many …
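
To make the SGLD procedure mentioned above concrete, here is a minimal sketch for a toy model: inferring the mean of Gaussian data with known noise variance under a Gaussian prior. Each step adds half the step size times a minibatch estimate of the log-posterior gradient, plus Gaussian noise whose variance equals the step size, which is the standard SGLD update. The model, the hyperparameters, and the function name `sgld_gaussian_mean` are hypothetical choices for illustration, not taken from the paper.

```python
import random
from typing import List


def sgld_gaussian_mean(data: List[float], step: float = 1e-3, n_iters: int = 5000,
                       batch_size: int = 10, prior_var: float = 10.0,
                       noise_var: float = 1.0, burn_in: int = 1000,
                       seed: int = 0) -> List[float]:
    """SGLD for theta in x ~ N(theta, noise_var), with prior theta ~ N(0, prior_var).

    Update: theta += (step / 2) * (grad log prior + (N / n) * sum grad log lik)
                     + Normal(0, step) noise.
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples: List[float] = []
    for t in range(n_iters):
        batch = rng.sample(data, batch_size)  # minibatch without replacement
        grad_prior = -theta / prior_var
        # Rescale the minibatch likelihood gradient to estimate the full-data gradient.
        grad_lik = (n_total / batch_size) * sum((x - theta) / noise_var for x in batch)
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.gauss(0.0, step ** 0.5)
        if t >= burn_in:
            # Every retained sample is one more stored copy of the parameters.
            samples.append(theta)
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(3.0, 1.0) for _ in range(100)]
    samples = sgld_gaussian_mean(data)
    print(len(samples), sum(samples) / len(samples))
```

With a deep network in place of the scalar `theta`, each retained sample is a full parameter vector and each prediction requires one forward pass per sample; this is the memory and compute cost the abstract points to.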

Machine learning

Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?

Chaoqi Wang, Shengyang Sun, Roger Grosse (2020)

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e. the predictive mean and variance at individual input locations. But it is often more useful to estimate predictive correlations between the function values at different input locations. In this paper, we consider the problem of benchmarking how accurately Bayesian models can estimate predictive correlations. We first consider a downstream task which depends on posterior predictive correlations: transductive active learning (TAL). We find that TAL makes better use of models' uncertainty estimates than ordinary active learning, and recommend this as a benchmark for evaluating Bayesian models. Since TAL is too expensive and indirect to guide development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations and which can be computed efficiently: meta-correlations (i.e. the correlations between the models' correlation estimates and the true values), and cross-normalized likelihoods (XLL). We validate these metrics by demonstrating their consistency with TAL performance and obtain insights about the relative performance of current Bayesian neural net and Gaussian process models.

Machine learning

Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification

Meet P. Vadera, Satya Narayan Shukla, Brian Jalaian (2020)

In this paper, we consider the problem of assessing the adversarial robustness of deep neural network models under both Markov chain Monte Carlo (MCMC) and Bayesian Dark Knowledge (BDK) inference approximations. We characterize the robustness of each method to two types of adversarial attacks: the fast gradient sign method (FGSM) and projected gradient descent (PGD). We show that full MCMC-based inference has excellent robustness, significantly outperforming standard point estimation-based learning. On the other hand, BDK provides marginal improvements. As an additional contribution, we present a storage-efficient approach to computing adversarial examples for large Monte Carlo ensembles using both the FGSM and PGD attacks.
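
The FGSM attack named above perturbs an input by a step of size epsilon in the direction of the sign of the loss gradient. Against a Monte Carlo ensemble, one natural loss is the negative log of the ensemble-averaged predictive probability of the true label. The sketch below does this for a toy ensemble of logistic-regression posterior samples, so that the gradient can be written by hand. The model, the names `ensemble_prob` and `fgsm_ensemble`, and the numbers are assumptions for illustration; this is not the paper's storage-efficient method, whose details the abstract does not give.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    # Returns +1, -1, or 0 (for an exactly zero gradient component).
    return (v > 0) - (v < 0)


def ensemble_prob(weights: Sequence[Sequence[float]], biases: Sequence[float],
                  x: Sequence[float]) -> float:
    """Monte Carlo predictive p(y=1|x): mean sigmoid output over posterior samples."""
    total = 0.0
    for w, b in zip(weights, biases):
        total += sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return total / len(weights)


def fgsm_ensemble(weights: Sequence[Sequence[float]], biases: Sequence[float],
                  x: Sequence[float], y: int, eps: float) -> List[float]:
    """One FGSM step x + eps * sign(grad_x L), with L = -log p(y|x) under the ensemble."""
    n_samples = len(weights)
    p = ensemble_prob(weights, biases, x)
    # Gradient of the averaged probability: mean of sigma * (1 - sigma) * w.
    grad_p = [0.0] * len(x)
    for w, b in zip(weights, biases):
        s = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for j, wj in enumerate(w):
            grad_p[j] += s * (1.0 - s) * wj / n_samples
    # Chain rule for the negative log-likelihood of a label y in {0, 1}.
    scale = -1.0 / p if y == 1 else 1.0 / (1.0 - p)
    return [xj + eps * sign(scale * g) for xj, g in zip(x, grad_p)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [0.5, 1.5]]  # two hypothetical posterior samples
    B = [0.0, 0.1]
    x = [0.2, 0.1]
    x_adv = fgsm_ensemble(W, B, x, y=1, eps=0.1)
    print(x_adv, ensemble_prob(W, B, x), ensemble_prob(W, B, x_adv))
```

Because the gradient is taken through the averaged predictive, the perturbation targets the ensemble as a whole rather than any single posterior sample.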

Machine learning

Robustness of Bayesian Neural Networks to Gradient-Based Attacks

Ginevra Carbone, Matthew Wicker, Luca Laurenti (2020)

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in the limit BNN posteriors are robust to gradient-based adversarial attacks. Experimental results on the MNIST and Fashion MNIST datasets with BNNs trained with Hamiltonian Monte Carlo and Variational Inference support this line of argument, showing that BNNs can display both high accuracy and robustness to gradient-based adversarial attacks.

Machine learning

Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

Meet P. Vadera, Brian Jalaian, Benjamin M. Marlin (2020)

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input teacher and student model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework, including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate downstream tasks leveraging entropy distillation, including uncertainty ranking and out-of-distribution detection.
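
The two distillation targets named above can both be estimated from the same set of Monte Carlo samples. The sketch below assumes each posterior sample has already produced a class-probability vector for one input, and computes the posterior predictive (the average of those vectors) and the expected entropy (the average of their entropies). The gap between the entropy of the predictive and the expected entropy is the familiar mutual-information measure of disagreement between samples. The function names and inputs are hypothetical; this is not the framework's implementation.

```python
import math
from typing import List, Tuple


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def mc_targets(sample_probs: List[List[float]]) -> Tuple[List[float], float]:
    """Return (posterior predictive, expected entropy) estimated from S samples."""
    s = len(sample_probs)
    k = len(sample_probs[0])
    predictive = [sum(p[j] for p in sample_probs) / s for j in range(k)]
    expected_entropy = sum(entropy(p) for p in sample_probs) / s
    return predictive, expected_entropy


def mutual_information(sample_probs: List[List[float]]) -> float:
    """Entropy of the predictive minus the expected entropy (sample disagreement)."""
    predictive, expected_entropy = mc_targets(sample_probs)
    return entropy(predictive) - expected_entropy


if __name__ == "__main__":
    # Two confident samples that disagree vs. two uncertain samples that agree.
    print(mc_targets([[1.0, 0.0], [0.0, 1.0]]), mutual_information([[1.0, 0.0], [0.0, 1.0]]))
    print(mc_targets([[0.5, 0.5], [0.5, 0.5]]), mutual_information([[0.5, 0.5], [0.5, 0.5]]))
```

Both cases yield the same uniform predictive, but only the expected entropy separates them: it is zero when confident samples disagree and maximal when every sample is itself uncertain. This distinction is the kind of signal that tasks such as uncertainty ranking and out-of-distribution detection can exploit.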

Machine learning