Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions

298 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zheng Wen

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Xiuyuan Lu - Ian Osband - Benjamin Van Roy

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A fundamental challenge for any intelligent system is prediction: given some inputs $X_1,..,X_tau$ can you predict outcomes $Y_1,.., Y_tau$. The KL divergence $mathbf{d}_{mathrm{KL}}$ provides a natural measure of prediction quality, but the majority of deep learning research looks only at the marginal predictions per input $X_t$. In this technical report we propose a scoring rule $mathbf{d}_{mathrm{KL}}^tau$, parameterized by $tau in mathcal{N}$ that evaluates the joint predictions at $tau$ inputs simultaneously. We show that the commonly-used $tau=1$ can be insufficient to drive good decisions in many settings of interest. We also show that, as $tau$ grows, performing well according to $mathbf{d}_{mathrm{KL}}^tau$ recovers universal guarantees for any possible decision. Finally, we provide problem-dependent guidance on the scale of $tau$ for which our score provides sufficient guarantees for good performance.

قيم البحث

69 - Kelly Kochanski , Divya Mohan , Jenna Horrall 2019

A dry decade in the Navajo Nation has killed vegetation, dessicated soils, and released once-stable sand into the wind. This sand now covers one-third of the Nations land, threatening roads, gardens and hundreds of homes. Many arid regions have simil ar problems: global warming has increased dune movement across farmland in Namibia and Angola, and the southwestern US. Current dune models, unfortunately, do not scale well enough to provide useful forecasts for the $sim$5% of land surfaces covered by mobile sand. We test the ability of two deep learning algorithms, a GAN and a CNN, to model the motion of sand dunes. The models are trained on simulated data from community-standard cellular automaton model of sand dunes. Preliminary results show the GAN producing reasonable forward predictions of dune migration at ten million times the speed of the existing model.

التعلم الآلي التعلم الالي

Bayesian Deep Learning via Subnetwork Inference

133 - Erik Daxberger , Eric Nalisnick , James Urquhart Allingham 2020

The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we s how that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace: We first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork. We propose a subnetwork selection strategy that aims to maximally preserve the models predictive uncertainty. Empirically, our approach is effective compared to ensembles and less expressive posterior approximations over full networks.

التعلم الآلي التعلم الالي

Evaluating Modules in Graph Contrastive Learning

158 - Ganqu Cui , Yufeng Du , Cheng Yang 2021

The recent emergence of contrastive learning approaches facilitates the research on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar samp le pairs to encode the semantics into node or graph embeddings. However, most existing works only performed model-level evaluation, and did not explore the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive and negative data samples (nodes or graphs); (2) an encoder and a readout function to get sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we manage to quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust; an MLP encoder associated with Sum readout could achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension.

التعلم الآلي التعلم الالي

Learning Invariances using the Marginal Likelihood

86 - Mark van der Wilk , Matthias Bauer , ST John 2018

Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations on the input that are known to be irrelevant (e.g. translation). Commonly, this is done through data augmentation, where the training set is enlarged by applying hand-crafted transformations to the inputs. We argue that invariances should instead be incorporated in the model structure, and learned using the marginal likelihood, which correctly rewards the reduced complexity of invariant models. We demonstrate this for Gaussian process models, due to the ease with which their marginal likelihood can be estimated. Our main contribution is a variational inference scheme for Gaussian processes containing invariances described by a sampling procedure. We learn the sampling procedure by back-propagating through it to maximise the marginal likelihood.

التعلم الآلي التعلم الالي

Self-supervised self-supervision by combining deep learning and probabilistic logic

92 - Hunter Lang , Hoifung Poon 2020

Labeling training examples at scale is a perennial challenge in machine learning. Self-supervision methods compensate for the lack of direct supervision by leveraging prior knowledge to automatically generate noisy labeled examples. Deep probabilisti c logic (DPL) is a unifying framework for self-supervised learning that represents unknown labels as latent variables and incorporates diverse self-supervision using probabilistic logic to train a deep neural network end-to-end using variational EM. While DPL is successful at combining pre-specified self-supervision, manually crafting self-supervision to attain high accuracy may still be tedious and challenging. In this paper, we propose Self-Supervised Self-Supervision (S4), which adds to DPL the capability to learn new self-supervision automatically. Starting from an initial seed, S4 iteratively uses the deep neural network to propose new self supervision. These are either added directly (a form of structured self-training) or verified by a human expert (as in feature-based active learning). Experiments show that S4 is able to automatically propose accurate self-supervision and can often nearly match the accuracy of supervised methods with a tiny fraction of the human effort.

التعلم الآلي التعلم الالي