Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

An Efficient Sampling Algorithm for Non-smooth Composite Potentials

89 0 0.0 ( 0 )

Download Cite

Added by Wenlong Mou

Publication date 2019

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Wenlong Mou - Nicolas Flammarion - Martin J. Wainwright

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We consider the problem of sampling from a density of the form $p(x) propto exp(-f(x)- g(x))$, where $f: mathbb{R}^d rightarrow mathbb{R}$ is a smooth and strongly convex function and $g: mathbb{R}^d rightarrow mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $varepsilon$ of the target density in at most $O(d log (d/varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.

rate research

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

178 - Wenlong Mou , Yi-An Ma , Martin J. Wainwright 2019

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $varepsilon > 0$ in Wasserstein distance from the target distribution in $Oleft(frac{d^{1/4}}{ varepsilon^{1/2}} right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $alpha$-th order smoothness, we prove that the mixing time scales as $O left(frac{d^{1/4}}{varepsilon^{1/2}} + frac{d^{1/2}}{varepsilon^{1/(alpha - 1)}} right)$.

Machine Learning Data Structures and Algorithms Machine Learning

Sub-sampling for Efficient Non-Parametric Bandit Exploration

186 - Dorian Baudry 2020

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling which requires to specify a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA) which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.

Machine Learning Machine Learning

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

360 - Wenlong Mou , Nhat Ho , Martin J. Wainwright 2019

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + Vert theta_{0} Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + Vert theta_{0} Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincar{e} inequalities for conditional and marginal densities.

Machine Learning Data Structures and Algorithms Machine Learning

An efficient sampling algorithm for difficult tree pairs

91 - Sean Cleary , Roland Maio 2020

It is an open question whether there exists a polynomial-time algorithm for computing the rotation distances between pairs of extended ordered binary trees. The problem of computing the rotation distance between an arbitrary pair of trees, (S, T), can be efficiently reduced to the problem of computing the rotation distance between a difficult pair of trees (S, T), where there is no known first step which is guaranteed to be the beginning of a minimal length path. Of interest, therefore, is how to sample such difficult pairs of trees of a fixed size. We show that it is possible to do so efficiently, and present such an algorithm that runs in time $O(n^4)$.

Data Structures and Algorithms

Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models

58 - Benjamin Lengerich , Sarah Tan , Chun-Hao Chang 2019

Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits contradictory models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome which cannot be represented by any smaller subset of features. This definition has an equivalence with the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparity, including contradictions, between the effects before and after purification. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.

Machine Learning Artificial Intelligence Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

An Efficient Sampling Algorithm for Non-smooth Composite Potentials

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions