بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Variable Selection Incorporating Prior Constraint Information into Lasso

295 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shurong Zheng

تاريخ النشر 2007

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Shurong Zheng - Guodong Song - Ning-Zhong Shi

المنهجية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose the variable selection procedure incorporating prior constraint information into lasso. The proposed procedure combines the sample and prior information, and selects significant variables for responses in a narrower region where the true parameters lie. It increases the efficiency to choose the true model correctly. The proposed procedure can be executed by many constrained quadratic programming methods and the initial estimator can be found by least square or Monte Carlo method. The proposed procedure also enjoys good theoretical properties. Moreover, the proposed procedure is not only used for linear models but also can be used for generalized linear models({sl GLM}), Cox models, quantile regression models and many others with the help of Wang and Leng (2007)s LSA, which changes these models as the approximation of linear models. The idea of combining sample and prior constraint information can be also used for other modified lasso procedures. Some examples are used for illustration of the idea of incorporating prior constraint information in variable selection procedures.

قيم البحث

72 - Sameer K. Deshpande , Veronika Rockova , Edward I. George 2017

We propose a Bayesian procedure for simultaneous variable and covariance selection using continuous spike-and-slab priors in multivariate linear regression models where q possibly correlated responses are regressed onto p predictors. Rather than rely ing on a stochastic search through the high-dimensional model space, we develop an ECM algorithm similar to the EMVS procedure of Rockova & George (2014) targeting modal estimates of the matrix of regression coefficients and residual precision matrix. Varying the scale of the continuous spike densities facilitates dynamic posterior exploration and allows us to filter out negligible regression coefficients and partial covariances gradually. Our method is seen to substantially outperform regularization competitors on simulated data. We demonstrate our method with a re-examination of data from a recent observational study of the effect of playing high school football on several later-life cognition, psychological, and socio-economic outcomes.

المنهجية

Spike-and-Slab Group Lasso for Consistent Estimation and Variable Selection in Non-Gaussian Generalized Additive Models

88 - Ray Bai 2020

We study estimation and variable selection in non-Gaussian Bayesian generalized additive models (GAMs) under a spike-and-slab prior for grouped variables. Our framework subsumes GAMs for logistic regression, Poisson regression, negative binomial regr ession, and gamma regression, and encompasses both canonical and non-canonical link functions. Under mild conditions, we establish posterior contraction rates and model selection consistency when $p gg n$. For computation, we propose an EM algorithm for obtaining MAP estimates in our model, which is available in the R package sparseGAM. We illustrate our method on both synthetic and real data sets.

المنهجية

Tractable Post-Selection Maximum Likelihood Inference for the Lasso

132 - Amit Meir , Mathias Drton 2017

Applying standard statistical methods after model selection may yield inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. The main issue is the fact that the post-selection distribution of the data differs fro m the original distribution. In particular, the observed data is constrained to lie in a subset of the original sample space that is determined by the selected model. This often makes the post-selection likelihood of the observed data intractable and maximum likelihood inference difficult. In this work, we get around the intractable likelihood by generating noisy unbiased estimates of the post-selection score function and using them in a stochastic ascent algorithm that yields correct post-selection maximum likelihood estimates. We apply the proposed technique to the problem of estimating linear models selected by the lasso. In an asymptotic analysis the resulting estimates are shown to be consistent for the selected parameters and to have a limiting truncated normal distribution. Confidence intervals constructed based on the asymptotic distribution obtain close to nominal coverage rates in all simulation settings considered, and the point estimates are shown to be superior to the lasso estimates when the true model is sparse.

المنهجية

Incorporating Knowledge into Structural Equation Models using Auxiliary Variables

104 - Bryant Chen , Judea Pearl , Elias Bareinboim 2015

In this paper, we extend graph-based identification methods by allowing background knowledge in the form of non-zero parameter values. Such information could be obtained, for example, from a previously conducted randomized experiment, from substantiv e understanding of the domain, or even an identification technique. To incorporate such information systematically, we propose the addition of auxiliary variables to the model, which are constructed so that certain paths will be conveniently cancelled. This cancellation allows the auxiliary variables to help conventional methods of identification (e.g., single-door criterion, instrumental variables, half-trek criterion), as well as model testing (e.g., d-separation, over-identification). Moreover, by iteratively alternating steps of identification and adding auxiliary variables, we can improve the power of existing identification methods via a bootstrapping approach that does not require external knowledge. We operationalize this method for simple instrumental sets (a generalization of instrumental variables) and show that the resulting method is able to identify at least as many models as the most general identification method for linear systems known to date. We further discuss the application of auxiliary variables to the tasks of model testing and z-identification.

المنهجية الذكاء الاصطناعي

Stabilizing Variable Selection and Regression

102 - Niklas Pfister , Evan G. Williams , Jonas Peters 2019

We consider regression in which one predicts a response $Y$ with a set of predictors $X$ across different experiments or environments. This is a common setup in many data-driven scientific fields and we argue that statistical inference can benefit fr om an analysis that takes into account the distributional changes across environments. In particular, it is useful to distinguish between stable and unstable predictors, i.e., predictors which have a fixed or a changing functional dependence on the response, respectively. We introduce stabilized regression which explicitly enforces stability and thus improves generalization performance to previously unseen environments. Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of arguments for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows to graphically characterize stable versus unstable functional dependence on the response. Formally, we introduce the notion of a stable blanket which is a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on these predictors minimizes the mean squared prediction error given that the resulting regression generalizes to unseen new environments.

المنهجية تطبيقات الإحصاء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة وهران احمد بن بله

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Variable Selection Incorporating Prior Constraint Information into Lasso

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً