
Spike-and-Slab Group Lasso for Consistent Estimation and Variable Selection in Non-Gaussian Generalized Additive Models

Published by: Ray Bai
Publication date: 2020
Research field: Mathematical Statistics
Language: English
Author: Ray Bai





We study estimation and variable selection in non-Gaussian Bayesian generalized additive models (GAMs) under a spike-and-slab prior for grouped variables. Our framework subsumes GAMs for logistic regression, Poisson regression, negative binomial regression, and gamma regression, and encompasses both canonical and non-canonical link functions. Under mild conditions, we establish posterior contraction rates and model selection consistency when $p \gg n$. For computation, we propose an EM algorithm for obtaining MAP estimates in our model, which is available in the R package sparseGAM. We illustrate our method on both synthetic and real data sets.




Read also

We introduce the spike-and-slab group lasso (SSGL) for Bayesian estimation and variable selection in linear regression with grouped variables. We further extend the SSGL to sparse generalized additive models (GAMs), thereby introducing the first nonparametric variant of the spike-and-slab lasso methodology. Our model simultaneously performs group selection and estimation, while our fully Bayes treatment of the mixture proportion allows for model complexity control and automatic self-adaptivity to different levels of sparsity. We develop theory to uniquely characterize the global posterior mode under the SSGL and introduce a highly efficient block coordinate ascent algorithm for maximum a posteriori (MAP) estimation. We further employ de-biasing methods to provide uncertainty quantification of our estimates. Thus, implementation of our model avoids the computational intensiveness of Markov chain Monte Carlo (MCMC) in high dimensions. We derive posterior concentration rates for both grouped linear regression and sparse GAMs when the number of covariates grows at nearly exponential rate with sample size. Finally, we illustrate our methodology through extensive simulations and data analysis.
We propose a Bayesian procedure for simultaneous variable and covariance selection using continuous spike-and-slab priors in multivariate linear regression models where q possibly correlated responses are regressed onto p predictors. Rather than relying on a stochastic search through the high-dimensional model space, we develop an ECM algorithm similar to the EMVS procedure of Rockova & George (2014) targeting modal estimates of the matrix of regression coefficients and residual precision matrix. Varying the scale of the continuous spike densities facilitates dynamic posterior exploration and allows us to filter out negligible regression coefficients and partial covariances gradually. Our method is seen to substantially outperform regularization competitors on simulated data. We demonstrate our method with a re-examination of data from a recent observational study of the effect of playing high school football on several later-life cognition, psychological, and socio-economic outcomes.
High-dimensional data sets have become ubiquitous in the past few decades, often with many more covariates than observations. In the frequentist setting, penalized likelihood methods are the most popular approach for variable selection and estimation in high-dimensional data. In the Bayesian framework, spike-and-slab methods are commonly used as probabilistic constructs for high-dimensional modeling. Within the context of linear regression, Rockova and George (2018) introduced the spike-and-slab LASSO (SSL), an approach based on a prior which provides a continuum between the penalized likelihood LASSO and the Bayesian point-mass spike-and-slab formulations. Since its inception, the spike-and-slab LASSO has been extended to a variety of contexts, including generalized linear models, factor analysis, graphical models, and nonparametric regression. The goal of this paper is to survey the landscape surrounding spike-and-slab LASSO methodology. First we elucidate the attractive properties and the computational tractability of SSL priors in high dimensions. We then review methodological developments of the SSL and outline several theoretical developments. We illustrate the methodology on both simulated and real datasets.
An important task in building regression models is to decide which regressors should be included in the final model. In a Bayesian approach, variable selection can be performed using mixture priors with a spike and a slab component for the effects subject to selection. As the spike is concentrated at zero, variable selection is based on the probability of assigning the corresponding regression effect to the slab component. These posterior inclusion probabilities can be determined by MCMC sampling. In this paper we compare the MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and their sampling efficiency for simulated data. Further, we investigate posterior inclusion probabilities analytically for different slabs in two simple settings. Application of variable selection with spike and slab priors is illustrated on a data set of psychiatric patients where the goal is to identify covariates affecting metabolism.
The impracticality of posterior sampling has prevented the widespread adoption of spike-and-slab priors in high-dimensional applications. To alleviate the computational burden, optimization strategies have been proposed that quickly find local posterior modes. Trading off uncertainty quantification for computational speed, these strategies have enabled spike-and-slab deployments at scales that would previously have been infeasible. We build on one recent development in this strand of work: the Spike-and-Slab LASSO procedure of Ročková and George (2018). Instead of optimization, however, we explore multiple avenues for posterior sampling, some traditional and some new. Intrigued by the speed of Spike-and-Slab LASSO mode detection, we explore the possibility of sampling from an approximate posterior by performing MAP optimization on many independently perturbed datasets. To this end, we explore Bayesian bootstrap ideas and introduce a new class of jittered Spike-and-Slab LASSO priors with random shrinkage targets. These priors are a key constituent of the Bayesian Bootstrap Spike-and-Slab LASSO (BB-SSL) method proposed here. BB-SSL turns fast optimization into approximate posterior sampling. Beyond its scalability, we show that BB-SSL has strong theoretical support. Indeed, we find that the induced pseudo-posteriors contract around the truth at a near-optimal rate in sparse normal-means and in high-dimensional regression. We compare our algorithm to the traditional Stochastic Search Variable Selection (under Laplace priors) as well as many state-of-the-art methods for shrinkage priors. We show, both in simulations and on real data, that our method fares superbly in these comparisons, often providing substantial computational gains.
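
Several of the abstracts above build on the spike-and-slab LASSO prior of Ročková and George (2018), which mixes two Laplace densities: a sharply peaked spike and a diffuse slab. The following is a minimal sketch, not code from any of the papers or from the sparseGAM package, of how that scalar prior produces a coefficient-specific penalty. The function names and the values of `theta`, `lam0`, and `lam1` are illustrative assumptions.

```python
import math


def slab_probability(beta, theta, lam0, lam1):
    """Conditional probability that a coefficient `beta` came from the slab.

    Assumed prior (scalar spike-and-slab LASSO form):
        pi(beta) = theta * Laplace(beta; lam1) + (1 - theta) * Laplace(beta; lam0)
    with lam0 > lam1 > 0 (spike is more peaked) and 0 < theta < 1.
    """
    if not (0.0 < theta < 1.0):
        raise ValueError("theta must lie strictly between 0 and 1")
    if not (lam0 > lam1 > 0.0):
        raise ValueError("expected lam0 > lam1 > 0")
    # Ratio spike/slab, written so the exponent is never positive
    # (avoids overflow and 0/0 for large |beta|).
    ratio = ((1.0 - theta) / theta) * (lam0 / lam1) * math.exp(
        -(lam0 - lam1) * abs(beta)
    )
    return 1.0 / (1.0 + ratio)


def adaptive_penalty(beta, theta, lam0, lam1):
    """Weighted average of the slab and spike penalties at `beta`.

    Small coefficients receive a penalty close to lam0 (strong shrinkage),
    large coefficients a penalty close to lam1 (weak shrinkage).
    """
    p = slab_probability(beta, theta, lam0, lam1)
    return lam1 * p + lam0 * (1.0 - p)


if __name__ == "__main__":
    # Illustrative hyperparameters, chosen only for this demonstration.
    for b in (0.0, 0.1, 0.5, 2.0):
        print(b, adaptive_penalty(b, theta=0.5, lam0=20.0, lam1=1.0))
```

The penalty interpolates between the two Laplace rates according to how plausible it is that the coefficient belongs to the slab, which is the sense in which the prior sits between a LASSO penalty and a point-mass spike-and-slab formulation. The grouped priors in the abstracts above apply the same idea to whole blocks of coefficients; that extension is not shown here.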