A Neural Network MCMC sampler that maximizes Proposal Entropy

75 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zengyi Li

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Zengyi Li - Yubei Chen - Friedrich T. Sommer

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Markov Chain Monte Carlo (MCMC) methods sample from unnormalized probability distributions and offer guarantees of exact sampling. However, in the continuous case, unfavorable geometry of the target distribution can greatly limit the efficiency of MCMC methods. Augmenting samplers with neural networks can potentially improve their efficiency. Previous neural network based samplers were trained with objectives that either did not explicitly encourage exploration, or used a L2 jump objective which could only be applied to well structured distributions. Thus it seems promising to instead maximize the proposal entropy for adapting the proposal to distributions of any shape. To allow direct optimization of the proposal entropy, we propose a neural network MCMC sampler that has a flexible and tractable proposal distribution. Specifically, our network architecture utilizes the gradient of the target distribution for generating proposals. Our model achieves significantly higher efficiency than previous neural network MCMC techniques in a variety of sampling tasks. Further, the sampler is applied on training of a convergent energy-based model of natural images. The adaptive sampler achieves unbiased sampling with significantly higher proposal entropy than Langevin dynamics sampler.

قيم البحث

اقرأ أيضاً

Stein Neural Sampler

66 - Tianyang Hu , Zixiang Chen , Hanxi Sun 2018

We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density. Motivated by the success of generative adversarial networks, we construct our samplers using deep neural networks that transform a refere nce distribution to the target distribution. Training schemes are developed to minimize two variations of the Stein discrepancy, which is designed to work with un-normalized densities. Once trained, our samplers are able to generate samples instantaneously. We show that the proposed methods are theoretically sound and experience fewer convergence issues compared with traditional sampling approaches according to our empirical studies.

التعلم الالي التعلم الآلي

The Coordinate Sampler: A Non-Reversible Gibbs-like MCMC Sampler

74 - Changye Wu , Christian P.n Robert (CEREMADE 2018

In this article, we derive a novel non-reversible, continuous-time Markov chain Monte Carlo (MCMC) sampler, called Coordinate Sampler, based on a piecewise deterministic Markov process (PDMP), which can be seen as a variant of the Zigzag sampler. In addition to proving a theoretical validation for this new sampling algorithm, we show that the Markov chain it induces exhibits geometrical ergodicity convergence, for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our coordinate sampler is more efficient than the Zigzag sampler, in terms of effective sample size.

حساب

Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification

248 - Chihao Zhang , Yiling Elaine Chen , Shihua Zhang 2021

Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels in an ad hoc way to improve the accuracy of multi-class classification, there lacks a principled approach to guide label combination by any optimality criterion. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion of outcome information conditional on outcome prediction, to guide practitioners on how to combine ambiguous outcome labels. ITCA indicates a balance in the trade-off between prediction accuracy (how well do predicted labels agree with actual labels) and prediction resolution (how many labels are predictable). To find the optimal label combination indicated by ITCA, we develop two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies are adaptive to all machine-learning classification algorithms. Coupled with a classification algorithm and a search strategy, ITCA has two uses: to improve prediction accuracy and to identify ambiguous labels. We first verify that ITCA achieves high accuracy with both search strategies in finding the correct label combinations on synthetic and real data. Then we demonstrate the effectiveness of ITCA in diverse applications including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification.

التعلم الالي التعلم الآلي المنهجية

A Neural Network model with Bidirectional Whitening

93 - Yuki Fujimoto , Toru Ohira 2017

We present here a new model and algorithm which performs an efficient Natural gradient descent for Multilayer Perceptrons. Natural gradient descent was originally proposed from a point of view of information geometry, and it performs the steepest des cent updates on manifolds in a Riemannian space. In particular, we extend an approach taken by the Whitened neural networks model. We make the whitening process not only in feed-forward direction as in the original model, but also in the back-propagation phase. Its efficacy is shown by an application of this Bidirectional whitened neural networks model to a handwritten character recognition data (MNIST data).

التعلم الالي التعلم الآلي

Sequential Neural Posterior and Likelihood Approximation

169 - Samuel Wiqvist , Jes Frellsen , Umberto Picchini 2021

We introduce the sequential neural posterior and likelihood approximation (SNPLA) algorithm. SNPLA is a normalizing flows-based algorithm for inference in implicit models, and therefore is a simulation-based inference method that only requires simula tions from a generative model. SNPLA avoids Markov chain Monte Carlo sampling and correction-steps of the parameter proposal function that are introduced in similar methods, but that can be numerically unstable or restrictive. By utilizing the reverse KL divergence, SNPLA manages to learn both the likelihood and the posterior in a sequential manner. Over four experiments, we show that SNPLA performs competitively when utilizing the same number of model simulations as used in other methods, even though the inference problem for SNPLA is more complex due to the joint learning of posterior and likelihood function. Due to utilizing normalizing flows SNPLA generates posterior draws much faster (4 orders of magnitude) than MCMC-based methods.

التعلم الالي التعلم الآلي المنهجية