ترغب بنشر مسار تعليمي؟ اضغط هنا

Bayesian Mixture Models With Focused Clustering for Mixed Ordinal and Nominal Data

64   0   0.0 ( 0 )
 نشر من قبل Maria DeYoreo
 تاريخ النشر 2015
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

In some contexts, mixture models can fit certain variables well at the expense of others in ways beyond the analysts control. For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the marginal distributions of the nearly and fully complete variables at the expense of the variables with high fractions of missing data. Motivated by this setting, we present a mixture model for mixed ordinal and nominal data that splits variables into two groups, focus variables and remainder variables. The model allows the analyst to specify a rich sub-model for the focus variables and a simpler sub-model for remainder variables, yet still capture associations among the variables. Using simulations, we illustrate advantages and limitations of focused clustering compared to mixture models that do not distinguish variables. We apply the model to handle missing values in an analysis of the 2012 American National Election Study, estimating relationships among voting behavior, ideology, and political party affiliation.

قيم البحث

اقرأ أيضاً

85 - Emily Roberts , Lili Zhao 2020
In regression models, predictor variables with inherent ordering, such as tumor staging ranging and ECOG performance status, are commonly seen in medical settings. Statistically, it may be difficult to determine the functional form of an ordinal pred ictor variable. Often, such a variable is dichotomized based on whether it is above or below a certain cutoff. Other methods conveniently treat the ordinal predictor as a continuous variable and assume a linear relationship with the outcome. However, arbitrarily choosing a method may lead to inaccurate inference and treatment. In this paper, we propose a Bayesian mixture model to simultaneously assess the appropriate form of the predictor in regression models by considering the presence of a changepoint through the lens of a threshold detection problem. By using a mixture model framework to consider both dichotomous and linear forms for the variable, the estimate is a weighted average of linear and binary parameterizations. This method is applicable to continuous, binary, and survival outcomes, and easily amenable to penalized regression. We evaluated the proposed method using simulation studies and apply it to two real datasets. We provide JAGS code for easy implementation.
We develop Bayesian nonparametric models for spatially indexed data of mixed type. Our work is motivated by challenges that occur in environmental epidemiology, where the usual presence of several confounding variables that exhibit complex interactio ns and high correlations makes it difficult to estimate and understand the effects of risk factors on health outcomes of interest. The modeling approach we adopt assumes that responses and confounding variables are manifestations of continuous latent variables, and uses multivariate Gaussians to jointly model these. Responses and confounding variables are not treated equally as relevant parameters of the distributions of the responses only are modeled in terms of explanatory variables or risk factors. Spatial dependence is introduced by allowing the weights of the nonparametric process priors to be location specific, obtained as probit transformations of Gaussian Markov random fields. Confounding variables and spatial configuration have a similar role in the model, in that they only influence, along with the responses, the allocation probabilities of the areas into the mixture components, thereby allowing for flexible adjustment of the effects of observed confounders, while allowing for the possibility of residual spatial structure, possibly occurring due to unmeasured or undiscovered spatially varying factors. Aspects of the model are illustrated in simulation studies and an application to a real data set.
The problem of multimodal clustering arises whenever the data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned in the sense there there is no obvious way to associate or to com pare them in some common space. A solution may consist in considering multiple clustering tasks independently for each modality. The main difficulty with such an approach is to guarantee that the unimodal clusterings are mutually consistent. In this paper we show that multimodal clustering can be addressed within a novel framework, namely conjugate mixture models. These models exploit the explicit transformations that are often available between an unobserved parameter space (objects) and each one of the observation spaces (sensors). We formulate the problem as a likelihood maximization task and we derive the associated conjugate expectation-maximization algorithm. The convergence properties of the proposed algorithm are thoroughly investigated. Several local/global optimization techniques are proposed in order to increase its convergence speed. Two initialization strategies are proposed and compared. A consistent model-selection criterion is proposed. The algorithm and its variants are tested and evaluated within the task of 3D localization of several speakers using both auditory and visual data.
We develop clustering procedures for longitudinal trajectories based on a continuous-time hidden Markov model (CTHMM) and a generalized linear observation model. Specifically in this paper, we carry out finite and infinite mixture model-based cluster ing for a CTHMM and achieve inference using Markov chain Monte Carlo (MCMC). For a finite mixture model with prior on the number of components, we implement reversible-jump MCMC to facilitate the trans-dimensional move between different number of clusters. For a Dirichlet process mixture model, we utilize restricted Gibbs sampling split-merge proposals to expedite the MCMC algorithm. We employ proposed algorithms to the simulated data as well as a real data example, and the results demonstrate the desired performance of the new sampler.
Clustering task of mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate variables. Indeed, considering a mixing of continuous, integer and ordinal variables (thus all having a cumulative distribution function), this copula mixture model defines intra-component dependencies similar to a Gaussian mixture, so with classical correlation meaning. Simultaneously, it preserves standard margins associated to continuous, integer and ordered features, namely the Gaussian, the Poisson and the ordered multinomial distributions. As an interesting by-product, the proposed mixture model generalizes many well-known ones and also provides tools of visualization based on the parameters. At a practical level, the Bayesian inference is retained and it is achieved with a Metropolis-within-Gibbs sampler. Experiments on simulated and real data sets finally illustrate the expected advantages of the proposed model for mixed data: flexible and meaningful parametrization combined with visualization features.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا