Bayesian Effect Fusion for Categorical Predictors

Added by Dr. Helga Wagner

Publication date: 2017

Field: Mathematical Statistics

Language: English

Authors: Daniela Pauger, Helga Wagner

Computation

Abstract

In this paper, we propose a Bayesian approach to obtain a sparse representation of the effect of a categorical predictor in regression-type models. As the effect of a categorical predictor is captured by a group of level effects, sparsity can be achieved not only by excluding single irrelevant level effects but also by excluding the whole group of effects associated with a predictor or by fusing levels that have essentially the same effect on the response. To achieve this goal, we propose a prior that allows for almost perfect as well as almost zero dependence between level effects a priori. We show how this prior can be obtained by specifying spike and slab prior distributions on all effect differences associated with one categorical predictor, and how restricted fusion can be implemented. An efficient MCMC method for posterior computation is developed. The performance of the proposed method is investigated on simulated data. Finally, we illustrate its application to real data from EU-SILC.
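To make the idea of a prior on all effect differences concrete, here is a minimal Python/NumPy sketch. It rests on assumptions of ours, not necessarily the authors' exact parameterization: level 0 is the baseline with its effect fixed at zero; each difference β_k - β_j gets a Gaussian penalty with variance γψ_kj, where ψ_kj = 1 under the slab and a small ψ_kj = r under the spike; and the indicators δ_kj are held fixed instead of being sampled by MCMC. The helper names `fusion_precision` and `prior_correlation` are hypothetical. A spike on β_2 - β_1 pushes the prior correlation of the two effects towards one (fusion), while a spike on β_3 - β_0 shrinks β_3 towards zero (exclusion).

```python
import numpy as np

def fusion_precision(delta, r=1e-4, gamma=1.0):
    """Hypothetical prior precision matrix of the non-baseline level effects.

    delta[k, j] (k > j) is 1 if the difference beta_k - beta_j comes from the
    slab and 0 if it comes from the spike. Level 0 is the baseline with
    beta_0 = 0, so the difference beta_k - beta_0 is just beta_k.
    """
    c = delta.shape[0]                      # number of levels incl. baseline
    Q = np.zeros((c - 1, c - 1))
    for k in range(1, c):
        for j in range(k):
            psi = 1.0 if delta[k, j] == 1 else r   # assumed spike variance r << 1
            a = np.zeros(c)
            a[k], a[j] = 1.0, -1.0          # contrast for beta_k - beta_j
            a = a[1:]                       # drop the fixed baseline coordinate
            Q += np.outer(a, a) / (gamma * psi)
    return Q

def prior_correlation(delta, **kw):
    """Prior covariance and correlation matrix of beta_1, ..., beta_{c-1}."""
    cov = np.linalg.inv(fusion_precision(delta, **kw))
    sd = np.sqrt(np.diag(cov))
    return cov, cov / np.outer(sd, sd)

# Four levels (baseline 0 plus levels 1-3); every difference in the slab.
delta = np.ones((4, 4), dtype=int)
_, corr_slab = prior_correlation(delta)

# Spike on beta_2 - beta_1 (fuse levels 1 and 2) and on beta_3 - beta_0 (exclude level 3).
delta[2, 1] = 0
delta[3, 0] = 0
cov_sparse, corr_sparse = prior_correlation(delta)

print(round(corr_slab[0, 1], 3))    # 0.5: moderate dependence under the slab
print(round(corr_sparse[0, 1], 3))  # close to 1: levels 1 and 2 nearly fused
print(cov_sparse[2, 2])             # tiny prior variance: beta_3 nearly zero

# Prior draws: columns 1 and 2 nearly equal, column 3 close to zero.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(3), cov_sparse, size=5)
```

With all differences in the slab the effects are only moderately dependent; switching single differences to the spike produces the two kinds of sparsity the abstract describes without changing the form of the prior.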

Related research

Bayesian Fusion of Data Partitioned Particle Estimates

Caleb Miller, Michael D. Schneider, Jem N. Corcoran (2020)

We present a Bayesian data fusion method to approximate a posterior distribution from an ensemble of particle estimates, each of which only has access to a subset of the data. Our approach relies on approximate probabilistic inference of model parameters through Monte Carlo methods, followed by an update-and-resample scheme, related to multiple importance sampling, that combines information from the initial estimates. We show that the method converges in the particle limit, and we demonstrate that it is directly suited to multi-sensor data fusion problems by applying it to a multi-sensor Keplerian orbit determination problem and a bearings-only tracking problem.

Computation

Efficient Bayesian Modeling of Binary and Categorical Data in R: The UPG Package

Gregor Zens, Sylvia Fruhwirth-Schnatter, Helga Wagner (2021)

We introduce the UPG package for highly efficient Bayesian inference in probit, logit, multinomial logit and binomial logit models. UPG offers a convenient estimation framework for balanced and imbalanced data settings where sampling efficiency is ensured through Markov chain Monte Carlo boosting methods. All sampling algorithms are implemented in C++, allowing for rapid parameter estimation. In addition, UPG provides several methods for fast production of output tables and summary plots that are easily accessible to a broad range of users.

Computation Methodology

Model-based clustering for conditionally correlated categorical data

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle (2014)

An extension of the latent class model is presented for clustering categorical data by relaxing the classical assumption that the variables are independent given the class. The model groups the variables into blocks that are independent of each other but dependent within, in order to capture the main intra-class correlations. The dependency between variables grouped in the same block of a class is taken into account by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, and it yields a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. In addition, a Gibbs sampler is used for model selection to overcome the computational intractability of the combinatorial search over block structures. Two applications to medical and biological data sets show the relevance of this new model. The results support the view that the model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.

Computation

Ultimate Polya Gamma Samplers -- Efficient MCMC for possibly imbalanced binary and categorical data

Sylvia Fruhwirth-Schnatter, Gregor Zens, Helga Wagner (2020)

Modeling binary and categorical data is one of the most common tasks of applied statisticians and econometricians. While Bayesian methods in this context have been available for decades, they often require a high level of familiarity with Bayesian statistics or suffer from issues such as low sampling efficiency. To make Bayesian models for binary and categorical data more accessible, we introduce novel latent variable representations based on Polya Gamma random variables for a range of commonly encountered discrete choice models. From these latent variable representations, new Gibbs sampling algorithms for binary, binomial and multinomial logistic regression models are derived. All models allow for a conditionally Gaussian likelihood representation, which makes extensions to more complex modeling frameworks, such as state space models, straightforward. However, sampling efficiency may still be an issue in these data augmentation based estimation frameworks. To counteract this, MCMC boosting strategies are developed and discussed in detail. The merits of our approach are illustrated through extensive simulations and a real data application.
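As a rough illustration of the Polya Gamma idea this abstract builds on (and only of that base idea: neither the new latent variable representations nor the MCMC boosting strategies are reproduced), the sketch below is the standard Polya Gamma Gibbs sampler for binary logistic regression. Given β, each ω_i is drawn from PG(1, x_i'β); given ω, the likelihood is conditionally Gaussian, so β | ω, y ~ N(m, V) with V = (X'ΩX + B0^{-1})^{-1} and m = V X'(y - 1/2). Assumptions: a N(0, prior_var · I) prior on β, and PG draws obtained by truncating their infinite sum-of-gammas representation after K terms, which is an approximation (an exact PG sampler would be preferable in practice). The function names are hypothetical.

```python
import numpy as np

def sample_pg1(c, rng, K=200):
    """Approximate PG(1, c) draws, one per entry of c.

    Uses omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2))
    with g_k ~ Gamma(1, 1) = Exp(1), truncated after K terms (an approximation).
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, K + 1)
    g = rng.standard_exponential(size=c.shape + (K,))
    denom = (k - 0.5) ** 2 + (c[..., None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)

def pg_gibbs_logit(X, y, n_iter=2000, prior_var=100.0, K=200, seed=0):
    """Gibbs sampler for logistic regression via Polya Gamma augmentation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prior_prec = np.eye(p) / prior_var        # assumed N(0, prior_var * I) prior
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        omega = sample_pg1(X @ beta, rng, K)  # omega_i | beta ~ PG(1, x_i' beta)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
        V = (V + V.T) / 2                     # guard against round-off asymmetry
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)  # beta | omega, y is Gaussian
        draws[t] = beta
    return draws

# Simulated data with intercept -0.5 and slope 1.0.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0]))))).astype(float)
draws = pg_gibbs_logit(X, y, n_iter=1000, seed=2)
print(draws[200:].mean(axis=0))   # posterior means; should be near (-0.5, 1.0)
```

Both full conditionals are standard distributions, which is what makes the sampler a plain Gibbs scheme; the efficiency issues the abstract mentions (for example with imbalanced data) are not addressed here.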

Computation Methodology

Adjusted Bayesian inference for selected parameters

Daniel Yekutieli (2011)

We address the problem of providing inference from a Bayesian perspective for parameters selected after viewing the data. We present a Bayesian framework for providing inference for selected parameters, based on the observation that providing Bayesian inference for selected parameters is a truncated data problem. We show that if the prior for the parameter is non-informative, or if the parameter is a fixed unknown constant, then it is necessary to adjust the Bayesian inference for selection. Our second contribution is a Bayesian False Discovery Rate (FDR) controlling methodology, which generalizes existing Bayesian FDR methods that are only defined in the two-group mixture model. We illustrate our results by applying them to simulated data and to data from a microarray experiment.
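For orientation, here is a minimal sketch of the kind of existing two-group Bayesian FDR rule the abstract says is being generalized (this is not the paper's own methodology). Given each hypothesis's posterior probability of being null, it rejects the hypotheses with the smallest posterior null probabilities, taking as many as possible while the average posterior null probability of the rejected set, i.e. the posterior expected FDR, stays at or below q. The posterior null probabilities are assumed to be already computed from a fitted two-group model, and the helper name `bayes_fdr_select` is hypothetical.

```python
import numpy as np

def bayes_fdr_select(post_null, q=0.05):
    """Indices rejected by a two-group Bayesian FDR rule at level q."""
    post_null = np.asarray(post_null, dtype=float)
    order = np.argsort(post_null)          # strongest evidence against the null first
    # Posterior expected FDR of rejecting the first k hypotheses in this order;
    # it is non-decreasing in k because the values are sorted ascending.
    expected_fdr = np.cumsum(post_null[order]) / np.arange(1, post_null.size + 1)
    passing = np.nonzero(expected_fdr <= q)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passing[-1] + 1])

print(bayes_fdr_select([0.01, 0.20, 0.02, 0.90, 0.04], q=0.05))   # [0 2 4]
```

Here the first three sorted values (0.01, 0.02, 0.04) have mean of about 0.023, while adding 0.20 raises the mean to 0.0675, above q = 0.05, so exactly three hypotheses are rejected.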

Computation Methodology