Assessing variable activity for Bayesian regression trees


Published by Akira Horiguchi

Publication date: 2020

Research field: Mathematical statistics

Paper language: English

Authored by Akira Horiguchi, Department of Statistics

Methodology


Bayesian Additive Regression Trees (BART) are non-parametric models that can capture complex exogenous variable effects. In any regression problem, it is often of interest to learn which variables are most active. Variable activity in BART is usually measured by counting the number of times a tree splits for each variable. Such one-way counts have the advantage of fast computations. Despite their convenience, one-way counts have several issues. They are statistically unjustified, cannot distinguish between main effects and interaction effects, and become inflated when measuring interaction effects. An alternative method well-established in the literature is Sobol indices, a variance-based global sensitivity analysis technique. However, these indices often require Monte Carlo integration, which can be computationally expensive. This paper provides analytic expressions for Sobol indices for BART posterior samples. These expressions are easy to interpret and are computationally feasible. Furthermore, we will show a fascinating connection between first-order (main-effects) Sobol indices and one-way counts. We also introduce a novel ranking method, and use this to demonstrate that the proposed indices preserve the Sobol-based rank order of variable importance. Finally, we compare these methods using analytic test functions and the En-ROADS climate impacts simulator.
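To make the Monte Carlo cost mentioned in the abstract concrete, here is a minimal Python sketch of a standard pick-freeze estimator for first-order Sobol indices. It is not the paper's analytic method. The function name `first_order_sobol`, the toy function `toy`, the sample size, and the assumption of independent uniform inputs on [0, 1] are all illustrative choices, not taken from the paper.

```python
import numpy as np


def first_order_sobol(f, d, n=100_000, seed=0):
    """Monte Carlo (pick-freeze) estimate of first-order Sobol indices.

    Assumes the d inputs are independent and uniform on [0, 1].
    f must map an (n, d) array to an array of n outputs.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))  # first independent sample
    B = rng.random((n, d))  # second independent sample
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))  # estimate of Var(Y)

    S = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # AB shares only column i with B
        # Estimates Var(E[Y | X_i]) / Var(Y)
        S[i] = np.mean(fB * (f(AB) - fA)) / var_y
    return S


def toy(X):
    # Hypothetical test function: the third input is inactive.
    # Analytic first-order indices are 0.2, 0.8, and 0.
    return X[:, 0] + 2.0 * X[:, 1]


indices = first_order_sobol(toy, d=3)
print(indices)
```

Each index costs an extra `n` function evaluations, so the full estimate needs `n * (d + 2)` evaluations. This is the kind of expense that closed-form expressions avoid.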


Hugh A. Chipman, Edward I. George, Robert E. McCulloch (2010)

We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference, including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment, and on a drug discovery classification problem.

Methodology, Applications of Statistics, Machine Learning

On Soft Bayesian Additive Regression Trees and asynchronous longitudinal regression analysis

Hao Ran, Yang Bai (2021)

In many longitudinal studies, the covariate and response are intermittently observed at irregular, mismatched, and subject-specific times. How to deal with such data when the covariate and response are observed asynchronously is an often-raised problem. Bayesian Additive Regression Trees (BART) is a Bayesian nonparametric approach that has been shown to be competitive with the best modern predictive methods, such as random forests and boosted decision trees. The sum-of-trees structure combined with a Bayesian inferential framework provides an accurate and robust statistical method. A BART variant, soft Bayesian Additive Regression Trees (SBART), constructed using randomized decision trees, was developed, and substantial theoretical and practical benefits were shown. In this paper, we propose a weighted SBART model solution for asynchronous longitudinal data. In comparison to other methods, the current methods are valid with few assumptions on the covariate process. Extensive simulation studies provide numerical support for this solution, and data from an HIV study are used to illustrate our methodology.

Methodology, Applications of Statistics

Nonparametric competing risks analysis using Bayesian Additive Regression Trees (BART)

Rodney Sparapani, Brent R. Logan, Robert E. McCulloch (2018)

Many time-to-event studies are complicated by the presence of competing risks. Such data are often analyzed using Cox models for the cause-specific hazard function or Fine-Gray models for the subdistribution hazard. In practice, regression relationships in competing risks data with either strategy are often complex and may include nonlinear functions of covariates, interactions, high-dimensional parameter spaces, and nonproportional cause-specific or subdistribution hazards. Model misspecification can lead to poor predictive performance. To address these issues, we propose a novel approach to flexible prediction modeling of competing risks data using Bayesian Additive Regression Trees (BART). We study the simulation performance in two-sample scenarios as well as a complex regression setting, and benchmark its performance against standard regression techniques as well as random survival forests. We illustrate the use of the proposed method on a recently published study of patients undergoing hematopoietic stem cell transplantation.

Methodology, Applications of Statistics

Bayesian Variable Selection for Linear Regression with the $\kappa$-$G$ Priors

Zichen Ma, Ernest Fokoue (2015)

In this paper, we introduce a new methodology for Bayesian variable selection in linear regression that is independent of the traditional indicator method. A diagonal matrix $\mathbf{G}$ is introduced to the prior of the coefficient vector $\boldsymbol{\beta}$, where each diagonal entry $g_j$, bounded between $0$ and $1$, serves as a stabilizer of the corresponding $\beta_j$. Mathematically, a promising variable has a $g_j$ value that is close to $0$, whereas the value of $g_j$ corresponding to an unpromising variable is close to $1$. This property is proven in this paper under orthogonality, together with other asymptotic properties. Computationally, the sample path of each $g_j$ is obtained through a Metropolis-within-Gibbs sampling method. We also give two simulations to verify the capability of this methodology in variable selection.

Methodology

Bayesian sparse multiple regression for simultaneous rank reduction and variable selection

Antik Chakraborty, Anirban Bhattacharya, Bani K. Mallick (2016)

We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support to the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function achieving dimension reduction in the covariate space. We exhibit the performance of the proposed methodology in an extensive simulation study and a real data example.

Methodology, Statistics Theory


Read also