Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

Published by: Wenlong Mou
Publication date: 2019
Paper language: English

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \|\theta_0\|^2)^{4.5}$ as long as the sample size $n$ is of the order $d(d + \|\theta_0\|^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities.
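For concreteness, here is a minimal Python sketch of a reflected Metropolis-Hastings random walk in the spirit of RMRW, targeting the power posterior of a symmetric two-component mixture $0.5\,N(\theta, I) + 0.5\,N(-\theta, I)$. The single-reflection rule at the boundary of a Euclidean ball, the tempering exponent beta, the Gaussian prior, and the radius are illustrative assumptions, not the paper's exact construction.

import numpy as np

def log_power_posterior(theta, X, beta=0.5, prior_var=1.0):
    # Unnormalized log power posterior: each mixture likelihood term
    # 0.5*N(x | theta, I) + 0.5*N(x | -theta, I) is raised to the power beta.
    a = -0.5 * np.sum((X - theta) ** 2, axis=1)
    b = -0.5 * np.sum((X + theta) ** 2, axis=1)
    m = np.maximum(a, b)  # log-sum-exp trick for numerical stability
    loglik = m + np.log(0.5 * np.exp(a - m) + 0.5 * np.exp(b - m))
    log_prior = -0.5 * np.dot(theta, theta) / prior_var
    return beta * loglik.sum() + log_prior

def reflect_into_ball(theta, radius):
    # Reflect a point falling outside the ball back across the boundary
    # (a single reflection; assumes the overshoot is less than the radius).
    r = np.linalg.norm(theta)
    return theta if r <= radius else theta * (2.0 * radius - r) / r

def rmrw(X, n_steps=10000, step=0.05, radius=10.0, beta=0.5, seed=0):
    # Random-walk Metropolis with proposals reflected into a ball.
    # The reflected proposal is only approximately symmetric near the
    # boundary, so this is a sketch rather than the paper's exact algorithm.
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    logp = log_power_posterior(theta, X, beta)
    samples = np.empty((n_steps, X.shape[1]))
    for t in range(n_steps):
        prop = reflect_into_ball(theta + step * rng.standard_normal(theta.size), radius)
        logp_prop = log_power_posterior(prop, X, beta)
        if np.log(rng.uniform()) < logp_prop - logp:  # Metropolis accept/reject
            theta, logp = prop, logp_prop
        samples[t] = theta
    return samples

On synthetic data drawn from the mixture, the chain should visit both modes $\pm\theta_0$, which is exactly the multi-modal regime where some standard, unconstrained MCMC algorithms mix exponentially slowly.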

Read also

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $\varepsilon > 0$ in Wasserstein distance from the target distribution in $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}}\right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $\alpha$-th order smoothness, we prove that the mixing time scales as $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}} + \frac{d^{1/2}}{\varepsilon^{1/(\alpha - 1)}}\right)$.
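The paper's third-order splitting integrator is not reproduced here; as a point of reference, the following is a minimal sketch of the first-order unadjusted Langevin algorithm (ULA), the classical baseline that higher-order Langevin schemes accelerate. The step size eta and function names are illustrative.

import numpy as np

def ula(grad_f, x0, n_steps=5000, eta=1e-3, seed=0):
    # Unadjusted Langevin algorithm targeting p(x) ∝ exp(-f(x)):
    #   x_{k+1} = x_k - eta * grad_f(x_k) + sqrt(2 * eta) * xi_k,  xi_k ~ N(0, I).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

First-order discretizations need substantially more iterations to reach the same Wasserstein accuracy under comparable assumptions, which is the gap the $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}}\right)$ rate above addresses.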
This paper concerns the use of Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors. Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. The proposed marginal samplers are generalizations of Neal's well-regarded Algorithm 8 for Dirichlet process mixture models, whereas the conditional sampler is a variation of those recently introduced in the literature. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacy of the proposed methods.
We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x) - g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of the target density in at most $O(d \log(d/\varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.
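A minimal Python sketch of a proximal-drift Metropolis-Hastings sampler in this spirit: the drift replaces $\nabla g$ (which need not exist) with the Moreau-Yosida gradient $(x - \mathrm{prox}_{\lambda g}(x))/\lambda$, while a standard Metropolis correction keeps the exact target $\exp(-f - g)$ invariant. The particular drift, step size delta, and smoothing parameter lam are illustrative assumptions; the paper's proposal may differ in its details.

import numpy as np

def prox_l1(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_mh(f, grad_f, g, prox_g, x0, n_steps=5000, delta=0.01, lam=0.1, seed=0):
    # Metropolis-Hastings for p(x) ∝ exp(-f(x) - g(x)) with a Langevin-style
    # proposal whose drift uses prox_g in place of the gradient of g.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)

    def drift(z):
        return z - 0.5 * delta * (grad_f(z) + (z - prox_g(z, lam)) / lam)

    def log_q(to, frm):
        # Log density (up to constants) of the proposal N(drift(frm), delta * I).
        return -np.sum((to - drift(frm)) ** 2) / (2.0 * delta)

    logp = -(f(x) + g(x))
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        y = drift(x) + np.sqrt(delta) * rng.standard_normal(x.size)
        logp_y = -(f(y) + g(y))
        # The proposal is asymmetric, so include the forward/backward ratio.
        if np.log(rng.uniform()) < logp_y - logp + log_q(x, y) - log_q(y, x):
            x, logp = y, logp_y
        samples[t] = x
    return samples

For example, with $f(x) = \|x\|^2/2$ and $g(x) = \|x\|_1$, one would call proximal_mh(f=lambda x: 0.5 * x @ x, grad_f=lambda x: x, g=lambda x: np.abs(x).sum(), prox_g=prox_l1, x0=np.zeros(10)).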
The problem of multimodal clustering arises whenever the data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or compare them in a common space. A solution may consist in considering multiple clustering tasks independently for each modality. The main difficulty with such an approach is to guarantee that the unimodal clusterings are mutually consistent. In this paper we show that multimodal clustering can be addressed within a novel framework, namely conjugate mixture models. These models exploit the explicit transformations that are often available between an unobserved parameter space (objects) and each one of the observation spaces (sensors). We formulate the problem as a likelihood maximization task and derive the associated conjugate expectation-maximization algorithm. The convergence properties of the proposed algorithm are thoroughly investigated. Several local/global optimization techniques are proposed in order to increase its convergence speed. Two initialization strategies are proposed and compared. A consistent model-selection criterion is proposed. The algorithm and its variants are tested and evaluated within the task of 3D localization of several speakers using both auditory and visual data.
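As a reference point, the following sketch gives the classical EM recursion for a single-modality spherical Gaussian mixture, i.e., the special case that conjugate mixture models generalize by tying each sensor's component parameters to a shared object space. The function names and the spherical-covariance assumption are illustrative.

import numpy as np

def em_gmm(X, K=2, n_iters=50, seed=0):
    # Classical EM for a K-component spherical Gaussian mixture.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]  # means initialized at data points
    pi = np.full(K, 1.0 / K)                      # mixing weights
    var = np.ones(K)                              # per-component spherical variances
    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] ∝ pi_k * N(x_i | mu_k, var_k * I).
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
        logr = np.log(pi) - 0.5 * (sq / var + d * np.log(var))
        logr -= logr.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = (r * sq).sum(axis=0) / (d * nk)
    return pi, mu, var

Per the abstract, the conjugate EM follows the same E/M alternation, but each modality's component parameters are constrained through the known transformations from the shared object space rather than being free parameters.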
We study a recent inferential framework, named posterior regularisation, on the Bayesian hierarchical mixture clustering (BHMC) model. This framework provides a simple way to impose extra constraints on a Bayesian model in order to overcome some weaknesses of the original model. It narrows the search space of the parameters of the Bayesian model through a formalism that imposes certain constraints on the features of the found solutions. In this paper, in order to enhance the separation of clusters, we apply posterior regularisation to impose max-margin constraints on the nodes at every level of the hierarchy. This paper shows how the framework integrates with BHMC and achieves the expected improvements over the original Bayesian model.
