
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables


Added by Chris J. Maddison

Publication date: 2016

Fields: Informatics Engineering, Mathematical Statistics

Research language: English

Authors: Chris J. Maddison, Andriy Mnih, Yee Whye Teh

Machine Learning


Abstract

The reparameterization trick enables optimizing large-scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with a fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low-variance, unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables: continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed-form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance, biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.
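A minimal sketch of drawing one Concrete sample in plain Python, assuming the parameterization with location parameters alpha (passed here as log alpha) and temperature lambda > 0: X_k = exp((log alpha_k + G_k) / lambda) / sum_i exp((log alpha_i + G_i) / lambda), with each G_k drawn independently from Gumbel(0, 1). The function names `sample_gumbel` and `sample_concrete` are hypothetical; a real model would run this in an automatic differentiation framework so that gradients flow back into log alpha.

```python
import math
import random


def sample_gumbel(rng: random.Random) -> float:
    """Draw G ~ Gumbel(0, 1) by inverting the CDF: G = -log(-log U)."""
    u = rng.random()  # U ~ Uniform[0, 1)
    while u == 0.0:   # log(0) is undefined, so redraw in that rare case
        u = rng.random()
    return -math.log(-math.log(u))


def sample_concrete(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector from Concrete(alpha, temperature).

    X_k = exp((log alpha_k + G_k) / lambda) / sum_i exp((log alpha_i + G_i) / lambda)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Perturb each log-location with independent Gumbel noise, then scale by 1/lambda.
    logits = [(la + sample_gumbel(rng)) / temperature for la in log_alpha]
    # Softmax, subtracting the max logit for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
x = sample_concrete([0.0, 1.0, -1.0], temperature=0.5, rng=rng)
print([round(v, 3) for v in x])  # nonnegative entries that sum to 1
```

Because dividing by a positive temperature and applying the softmax both preserve order, the argmax of a Concrete sample is the argmax of the Gumbel-perturbed log-locations (the Gumbel-Max trick). Lower temperatures push samples toward one-hot vertices; higher temperatures give smoother points inside the simplex.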


Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables

Weonyoung Joo, Dongjun Kim, Seungjae Shin (2020)

Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community, because it enables gradient descent optimization of neural network parameters. This estimation problem becomes more complex when the stochastic nodes are discrete, because pathwise derivative techniques cannot be applied. Hence, stochastic gradient estimation for discrete distributions requires either a score function method or a continuous relaxation of the discrete random variables. This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation, and this estimator is able to relax the discreteness of more diverse types of probability distributions beyond categorical and Bernoulli. In detail, we utilize the truncation of discrete random variables and the Gumbel-Softmax trick with a linear transformation for the relaxed reparameterization. The proposed approach enables the relaxed discrete random variable to be reparameterized and backpropagated through a large-scale stochastic computational graph. Our experiments consist of (1) synthetic data analyses, which show the efficacy of our methods; and (2) applications to VAEs and topic models, which demonstrate the value of the proposed estimator in practice.
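A minimal sketch of the truncate-then-relax idea, under two assumptions: the discrete variable is a Poisson truncated to {0, ..., K} and renormalized, and the "linear transformation" maps the relaxed one-hot vector to a scalar by taking its dot product with the support values [0, 1, ..., K]. The Poisson choice and the function names are illustrative, not taken from the paper's code.

```python
import math
import random


def truncated_poisson_logprobs(rate, k_max):
    """log P(X = k | X <= k_max) for X ~ Poisson(rate), k = 0..k_max."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    logp = [k * math.log(rate) - rate - math.lgamma(k + 1) for k in range(k_max + 1)]
    # Renormalize over the truncated support with a stable log-sum-exp.
    top = max(logp)
    log_z = top + math.log(sum(math.exp(v - top) for v in logp))
    return [v - log_z for v in logp]


def relaxed_truncated_poisson(rate, k_max, temperature, rng):
    """Gumbel-Softmax over the truncated categories, then a linear map to a count."""
    logits = []
    for lp in truncated_poisson_logprobs(rate, k_max):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # G ~ Gumbel(0, 1)
        logits.append((lp + gumbel) / temperature)
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    soft_one_hot = [w / total for w in weights]
    # Linear transformation (assumed): weight each category by the count it stands for.
    relaxed_count = sum(k * p for k, p in enumerate(soft_one_hot))
    return relaxed_count, soft_one_hot


count, probs = relaxed_truncated_poisson(rate=3.0, k_max=10, temperature=0.3, rng=random.Random(0))
print(round(count, 3), len(probs))
```

The relaxed count lies in [0, K] and is a smooth function of the rate through the log-probabilities; as the temperature decreases it concentrates near integer values. The truncation level K trades the bias from discarded tail mass against the cost of a wider softmax.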

Machine Learning

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Wouter Kool, Herke van Hoof, Max Welling (2020)

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
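The estimator relies on samples drawn without replacement. One standard way to draw k distinct categories is the Gumbel-top-k trick: perturb each log-probability with independent Gumbel(0, 1) noise and keep the indices of the k largest perturbed values. The sketch below shows only that sampling step for a single categorical variable; it is not the Rao-Blackwellized estimator itself, and the function name `gumbel_top_k` is hypothetical.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng):
    """Return k distinct category indices sampled without replacement.

    Each index i gets the key log_probs[i] + G_i with G_i ~ Gumbel(0, 1);
    the indices of the k largest keys form the sample, largest key first.
    """
    if not 0 <= k <= len(log_probs):
        raise ValueError("k must be between 0 and the number of categories")
    keys = []
    for i, lp in enumerate(log_probs):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        keys.append((lp - math.log(-math.log(u)), i))
    keys.sort(reverse=True)  # largest perturbed log-probability first
    return [i for _, i in keys[:k]]


probs = [0.5, 0.2, 0.2, 0.1]
sample = gumbel_top_k([math.log(p) for p in probs], k=3, rng=random.Random(0))
print(sample)  # three distinct indices from {0, 1, 2, 3}
```

Because no index can appear twice, each of the k model evaluations covers a different outcome, which is where the variance reduction from avoiding duplicate samples comes from.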

Machine Learning

The exact distribution of the sample variance from bounded continuous random variables

T. Royen (2008)

For a sample of absolutely bounded i.i.d. random variables with a continuous density, the cumulative distribution function of the sample variance is represented by a univariate integral over a Fourier series. If the density is a polynomial or a trigonometric polynomial, the coefficients of this series are simple finite terms containing only the error function, the exponential function, and powers. In more general cases (e.g., for all beta densities), the coefficients are given by series expansions. The method is generalized to positive semi-definite quadratic forms of bounded, independent, but not necessarily identically distributed random variables if the form matrix differs from a diagonal matrix D > 0 only by a matrix of rank 1.

Statistics Theory

Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

Gonçalo M. Correia, Vlad Niculae, Wilker Aziz (2020)

Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations.
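A minimal sketch of sparsemax, the Euclidean projection of a score vector onto the probability simplex (Martins and Astudillo, 2016), together with a helper that computes an expectation by evaluating only the categories that receive nonzero probability. The helper name `sparse_expectation` and the toy loss in the usage are hypothetical illustrations of why a small support makes exact marginalization cheap; they are not the paper's training code.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex: argmin_p ||p - z||^2."""
    if not z:
        raise ValueError("z must be non-empty")
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    support_size = 0
    support_sum = 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        # The k-th largest score is in the support while 1 + k * z_(k) > sum of the top k.
        if 1.0 + k * v > cumsum:
            support_size = k
            support_sum = cumsum
    tau = (support_sum - 1.0) / support_size  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]


def sparse_expectation(probs, loss_fn):
    """Exact E[loss] over the categorical; loss_fn is called only where probs[i] > 0."""
    return sum(p * loss_fn(i) for i, p in enumerate(probs) if p > 0.0)


p = sparsemax([1.0, 0.5, -1.0])
print(p)  # [0.75, 0.25, 0.0]
print(sparse_expectation(p, lambda i: float(i)))  # 0.75 * 0 + 0.25 * 1
```

Unlike softmax, sparsemax can assign exactly zero probability, so the sum over latent assignments only visits the support; here the third category is never passed to the loss.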

Machine Learning

On Learning Continuous Pairwise Markov Random Fields

Abhin Shah, Devavrat Shah, Gregory W. Wornell (2020)

We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide a finite-sample analysis revealing sample complexity that scales logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach is applicable to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.

Machine Learning
