
Identifying Mixtures of Mixtures Using Bayesian Estimation

Published by Gertraud Malsiner-Walli
Publication date: 2015
Research field: Mathematical statistics
Language: English





The use of a finite mixture of normal distributions in model-based clustering makes it possible to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and is generally achieved either by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior in which the hyperparameters are carefully selected to reflect the intended cluster structure. In addition, this prior allows the model to be estimated using standard MCMC sampling methods. In combination with a post-processing approach that resolves the label switching issue and results in an identified model, our approach makes it possible to simultaneously (1) determine the number of clusters, (2) flexibly approximate the cluster distributions in a semi-parametric way using finite mixtures of normals, and (3) identify cluster-specific parameters and classify observations. The proposed approach is illustrated in two simulation studies and on benchmark data sets.




Read also

Faicel Chamroukhi, 2015
This work concerns the framework of model-based clustering for spatial functional data where the data are surfaces. We first introduce a Bayesian spatial spline regression model with mixed-effects (BSSR) for modeling spatial functional data. The BSSR model is based on Nodal basis functions for spatial regression and accommodates both common mean behavior for the data through a fixed-effects part and inter-individual variability through a random-effects part. Then, in order to model populations of spatial functional data arising from heterogeneous groups, we integrate the BSSR model into a mixture framework. The resulting model is a Bayesian mixture of spatial spline regressions with mixed-effects (BMSSR) used for density estimation and model-based surface clustering. The models, through their Bayesian formulation, allow the integration of possible prior knowledge on the data structure and constitute a good alternative to the recent mixture of spatial spline regressions model estimated in a maximum likelihood framework via the expectation-maximization (EM) algorithm. The Bayesian model inference is performed by Markov Chain Monte Carlo (MCMC) sampling. We derive two Gibbs samplers to infer the BSSR and the BMSSR models and apply them to simulated surfaces and a real problem of handwritten digit recognition using the MNIST data set. The obtained results highlight the potential benefit of the proposed Bayesian approaches for modeling surfaces that may be dispersed, in particular in clusters.
An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixtures analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analyzing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.
For two vast families of mixture distributions and a given prior, we provide unified representations of posterior and predictive distributions. Model applications presented include bivariate mixtures of Gamma distributions labelled as Kibble-type, non-central Chi-square and F distributions, the distribution of $R^2$ in multiple regression, variance mixture of normal distributions, and mixtures of location-scale exponential distributions including the multivariate Lomax distribution. An emphasis is also placed on analytical representations and the relationships with a host of existing distributions and several hypergeometric functions of one or two variables.
Weakly stationary Gaussian processes (GPs) are the principal tool in the statistical approaches to the design and analysis of computer experiments (or Uncertainty Quantification). Such processes are fitted to computer model output using a set of training runs to learn the parameters of the process covariance kernel. The stationarity assumption is often adequate, yet can lead to poor predictive performance when the model response exhibits nonstationarity, for example, if its smoothness varies across the input space. In this paper, we introduce a diagnostic-led approach to fitting nonstationary GP emulators by specifying finite mixtures of region-specific covariance kernels. Our method first fits a stationary GP and, if traditional diagnostics exhibit nonstationarity, those diagnostics are used to fit appropriate mixing functions for a covariance kernel mixture designed to capture the nonstationarity, ensuring an emulator that is continuous in parameter space and readily interpretable. We compare our approach to the principal nonstationary GP models in the literature and illustrate its performance on a number of idealised test cases and in an application to modelling the cloud parameterization of the French climate model.
Mixtures-of-Experts (MoE) are conditional mixture models that have shown their performance in modeling heterogeneity in data in many statistical learning approaches for prediction, including regression and classification, as well as for clustering. Their estimation in high-dimensional problems is, however, still challenging. We consider the problem of parameter estimation and feature selection in MoE models with different generalized linear expert models, and propose a regularized maximum likelihood estimation that efficiently encourages sparse solutions for heterogeneous data with high-dimensional predictors. The developed proximal-Newton EM algorithm includes proximal Newton-type procedures to update the model parameter by monotonically maximizing the objective function and allows efficient estimation and feature selection. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data, compared to the main state-of-the-art competitors.