
Parallel Bayesian Additive Regression Trees

Published by Matthew Pratola
Publication date: 2013
Research field: Mathematical statistics
Language: English





Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this paper we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
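The SPMD idea rests on the fact that a tree-proposal step only needs a few summary numbers per leaf, and these are sums over observations, so each process can compute them on its own data shard and the partial sums can be added across processes. The sketch below assumes a normal leaf model with residuals r_i ~ N(mu, sigma2) and prior mu ~ N(0, tau2), under which the per-leaf statistics are the count and the sum of residuals. The function names and the single-process `allreduce_sum` stand-in are hypothetical; this is not the paper's MPI code.

```python
import math
from typing import List, Sequence, Tuple

# Per-leaf sufficient statistics for a normal leaf model: (count, sum of residuals).
Stats = Tuple[int, float]


def local_stats(residuals: Sequence[float], in_leaf: Sequence[bool]) -> Stats:
    """Statistics one process computes from its own shard of the data only."""
    n, s = 0, 0.0
    for r, flag in zip(residuals, in_leaf):
        if flag:
            n += 1
            s += r
    return n, s


def allreduce_sum(per_rank: List[Stats]) -> Stats:
    """Single-process stand-in for an MPI_Allreduce with MPI_SUM across ranks."""
    return sum(n for n, _ in per_rank), sum(s for _, s in per_rank)


def leaf_log_ml(stats: Stats, sigma2: float, tau2: float) -> float:
    """Log marginal likelihood of one leaf with r_i ~ N(mu, sigma2), mu ~ N(0, tau2),
    dropping -n/2*log(2*pi*sigma2) - sum(r_i^2)/(2*sigma2): their total over a parent
    node equals their total over its two children, so they cancel in birth/death ratios."""
    n, s = stats
    return (0.5 * math.log(sigma2 / (sigma2 + n * tau2))
            + tau2 * s * s / (2.0 * sigma2 * (sigma2 + n * tau2)))


# Three "ranks", each holding a shard of (x, residual) pairs; the leaf is {x < 0.5}.
shards = [
    [(0.1, 1.0), (0.7, -2.0)],
    [(0.3, 0.5), (0.9, 1.5)],
    [(0.2, -0.25)],
]
per_rank = [local_stats([r for _, r in sh], [x < 0.5 for x, _ in sh]) for sh in shards]
global_stats = allreduce_sum(per_rank)
print(global_stats, leaf_log_ml(global_stats, sigma2=1.0, tau2=0.5))
```

In an actual MPI program each rank would run `local_stats` on its shard and MPI would perform the reduction, for example via mpi4py's `comm.Allreduce` on a NumPy buffer holding `[n, s]` with `op=MPI.SUM`. Only two numbers per leaf cross the network, which is presumably why communication does not grow with the number of observations.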




Read also

We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference, including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment, and on a drug discovery classification problem.
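The sum-of-trees fit and the backfitting step described above can be sketched as follows. This is a minimal illustration with a hypothetical tree encoding: it shows how the fit f(x) is evaluated as a sum of per-tree leaf values, how the partial residual that tree j is refitted to is formed, and how split-variable usage is counted for a single posterior draw (in practice such proportions are tracked across MCMC draws). It does not implement the MCMC moves or the regularization prior.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union

# Hypothetical tree encoding: a leaf is a float (its mu value); an internal node
# is a tuple (var_index, cutpoint, left_subtree, right_subtree).
Tree = Union[float, tuple]


def predict_tree(tree: Tree, x: Sequence[float]) -> float:
    """Drop x down one tree and return its leaf value g(x; T, M)."""
    while isinstance(tree, tuple):
        var, cut, left, right = tree
        tree = left if x[var] < cut else right
    return tree


def predict_sum(trees: List[Tree], x: Sequence[float]) -> float:
    """Sum-of-trees fit f(x) = sum_j g(x; T_j, M_j)."""
    return sum(predict_tree(t, x) for t in trees)


def partial_residuals(trees: List[Tree], j: int, X, y) -> List[float]:
    """Backfitting target for tree j: R_j = y - sum_{k != j} g(x; T_k, M_k)."""
    others = trees[:j] + trees[j + 1:]
    return [yi - predict_sum(others, xi) for xi, yi in zip(X, y)]


def count_splits(tree: Tree, counts: Counter) -> None:
    """Add one count per splitting rule, keyed by the variable it splits on."""
    if isinstance(tree, tuple):
        var, _, left, right = tree
        counts[var] += 1
        count_splits(left, counts)
        count_splits(right, counts)


def inclusion_proportions(trees: List[Tree]) -> Dict[int, float]:
    """Share of all splitting rules in the ensemble that use each predictor."""
    counts: Counter = Counter()
    for t in trees:
        count_splits(t, counts)
    total = sum(counts.values())
    return {v: c / total for v, c in sorted(counts.items())}


trees = [(0, 0.5, -1.0, 1.0), (1, 0.0, (0, 0.2, 0.5, 0.25), -0.5), 0.125]
X = [[0.1, -1.0], [0.9, 2.0]]
y = [1.0, 0.0]
print(predict_sum(trees, X[0]), partial_residuals(trees, 0, X, y), inclusion_proportions(trees))
```

Each tree contributes only a small correction, which is the "weak learner" role the prior enforces; the backfitting sampler updates one tree at a time against its partial residual.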
Hao Ran, Yang Bai (2021)
Bayesian Additive Regression Trees (BART) is a Bayesian nonparametric approach that has been shown to be competitive with the best modern predictive methods, such as random forests and gradient boosting decision trees. Its sum-of-trees structure, combined with a Bayesian inferential framework, provides an accurate and robust statistical method. A BART variant named SBART, which uses randomized decision trees, has been developed and shows practical benefits compared with BART. The primary bottleneck of SBART is the cost of computing the sufficient statistics, and the publicly available implementation of the SBART algorithm in the R package is very slow. In this paper we show how the SBART algorithm can be modified and computed using single program, multiple data (SPMD) distributed computation with the Message Passing Interface (MPI) library. This approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository. We have also modified the algorithm so that it can handle classification problems, which the original R package cannot do. Through data experiments we show the advantage of distributed SBART over BART for classification problems.
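Why sufficient statistics are the SBART bottleneck can be sketched under one assumption: that SBART's "randomized" trees are soft trees in which each split sends an observation left with a logistic probability, so every observation reaches every leaf with some weight phi_l(x). With a Gaussian leaf prior, the leaf update then depends on the data through Phi^T Phi and Phi^T r, which are dense sums over all observations. Like the hard-tree counts, they split cleanly across data shards. The encoding, the logistic gate, and all function names below are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple, Union

# Hypothetical soft-tree encoding: a leaf is an int (its column in Phi); an internal
# node is (var_index, cutpoint, bandwidth, left_subtree, right_subtree).
SoftTree = Union[int, tuple]


def gate(x: float, cut: float, bandwidth: float) -> float:
    """Probability of going LEFT at a soft split (logistic in (cut - x) / bandwidth).
    Computed in a numerically stable way so tiny bandwidths do not overflow."""
    z = (cut - x) / bandwidth
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def leaf_weights(tree: SoftTree, x: Sequence[float], n_leaves: int) -> List[float]:
    """phi_l(x): product of gate probabilities along the path to leaf l (sums to 1)."""
    weights = [0.0] * n_leaves
    stack = [(tree, 1.0)]
    while stack:
        node, p = stack.pop()
        if isinstance(node, int):
            weights[node] += p
        else:
            var, cut, bw, left, right = node
            g = gate(x[var], cut, bw)
            stack.append((left, p * g))
            stack.append((right, p * (1.0 - g)))
    return weights


def shard_suff_stats(tree, X, r, n_leaves) -> Tuple[List[List[float]], List[float]]:
    """This shard's contribution to Phi^T Phi (n_leaves x n_leaves) and Phi^T r."""
    A = [[0.0] * n_leaves for _ in range(n_leaves)]
    b = [0.0] * n_leaves
    for x, ri in zip(X, r):
        phi = leaf_weights(tree, x, n_leaves)
        for i in range(n_leaves):
            b[i] += phi[i] * ri
            for k in range(n_leaves):
                A[i][k] += phi[i] * phi[k]
    return A, b


def add_stats(s1, s2):
    """Elementwise sum of two shards' statistics (what an MPI_SUM reduction does)."""
    (A1, b1), (A2, b2) = s1, s2
    return ([[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(A1, A2)],
            [u + v for u, v in zip(b1, b2)])


tree = (0, 0.5, 0.1, 0, (1, 0.0, 0.2, 1, 2))  # 3 leaves, indexed 0..2
X = [[0.2, -0.3], [0.6, 0.4], [0.55, -0.1], [0.9, 0.0]]
r = [1.0, -0.5, 0.25, 2.0]
full = shard_suff_stats(tree, X, r, 3)
split = add_stats(shard_suff_stats(tree, X[:2], r[:2], 3),
                  shard_suff_stats(tree, X[2:], r[2:], 3))
print(full[1], split[1])
```

As the bandwidths shrink, each gate approaches a hard split and the weights collapse onto a single leaf, recovering an ordinary BART tree. The cost per observation grows with the square of the number of leaves, which is one plausible reason why distributing these sums pays off.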
Hao Ran, Yang Bai (2021)
Bayesian additive regression trees (BART) is a nonparametric regression model that has gained widespread popularity in recent years due to its flexibility and high estimation accuracy. In spatio-temporal models, the spatial or temporal variables play an important role. BART selects splitting variables under a uniform prior distribution, which treats every variable equally, so applying BART directly without properly using this prior information is not appropriate. This paper proposes a modification of BART that fixes part of the tree structure; we call this model partially fixed BART. The new model improves the efficiency of estimation. Even when the prior information is unknown, the new model can still give more accurate estimates and more structural information for future use. Data experiments and real data examples show the improvement compared with the original BART model.
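The uniform split-variable prior mentioned above, and one hypothetical way of fixing part of the tree structure, can be sketched as follows. Under the default, each of the p predictors is chosen with probability 1/p whenever a new split is proposed. The `fixed` argument, which pins the split variable at chosen depths (for example, always splitting first on a temporal variable), is purely illustrative: the abstract does not say which part of the structure the authors fix or how.

```python
import random
from collections import Counter
from typing import Dict, Optional


def draw_split_variable(p: int, depth: int, rng: random.Random,
                        fixed: Optional[Dict[int, int]] = None) -> int:
    """Choose the splitting variable for a new internal node at the given depth.

    Default BART: uniform over the p predictors, so each has prior probability 1/p.
    `fixed` is a HYPOTHETICAL map {depth: variable index} that pins the split variable
    near the root, e.g. {0: time_index} to always split first on a temporal variable.
    """
    if fixed is not None and depth in fixed:
        return fixed[depth]
    return rng.randrange(p)


rng = random.Random(2021)
uniform = Counter(draw_split_variable(3, depth=1, rng=rng) for _ in range(6000))
pinned = Counter(draw_split_variable(3, depth=0, rng=rng, fixed={0: 2}) for _ in range(100))
print(uniform, pinned)
```

Below the pinned depths the draw falls back to the uniform prior, so the rest of each tree is still learned from the data.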
Hao Ran, Yang Bai (2021)
In many longitudinal studies, the covariate and response are intermittently observed at irregular, mismatched and subject-specific times. How to handle such data, when covariate and response are observed asynchronously, is a frequently raised problem. Bayesian Additive Regression Trees (BART) is a Bayesian nonparametric approach that has been shown to be competitive with the best modern predictive methods, such as random forests and boosted decision trees. Its sum-of-trees structure, combined with a Bayesian inferential framework, provides an accurate and robust statistical method. A BART variant, soft Bayesian Additive Regression Trees (SBART), constructed using randomized decision trees, was developed and shown to have substantial theoretical and practical benefits. In this paper, we propose a weighted SBART model for asynchronous longitudinal data. In comparison with other methods, the proposed method is valid under few assumptions on the covariate process. Extensive simulation studies provide numerical support for this solution, and data from an HIV study are used to illustrate our methodology.
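The abstract does not say how the weights in weighted SBART are constructed. A common device for asynchronous longitudinal data, assumed here for illustration only, is to pair every response time with every covariate time within a subject and down-weight pairs whose times are far apart, using a kernel of the time gap with bandwidth h. The Epanechnikov kernel, the bandwidth, and the function names below are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def weighted_pairs(resp: Sequence[Tuple[float, float]],
                   cov: Sequence[Tuple[float, float]],
                   h: float) -> List[Tuple[float, float, float]]:
    """For one subject, pair every response (t, y) with every covariate (s, x)
    and weight the pair by K((t - s) / h) / h; zero-weight pairs are dropped."""
    out = []
    for t, y in resp:
        for s, x in cov:
            w = epanechnikov((t - s) / h) / h
            if w > 0.0:
                out.append((x, y, w))
    return out


# One subject: responses at t = 1.0, 2.5 and covariates at s = 0.8, 2.0, 4.0.
pairs = weighted_pairs(resp=[(1.0, 3.2), (2.5, 4.1)],
                       cov=[(0.8, 1.0), (2.0, 1.5), (4.0, 2.0)],
                       h=1.0)
print(pairs)
```

A weighted fit would then let each pair's weight scale its contribution to the likelihood. The bandwidth trades bias (pairing distant times) against variance (discarding most pairs).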
Many time-to-event studies are complicated by the presence of competing risks. Such data are often analyzed using Cox models for the cause-specific hazard function or Fine-Gray models for the subdistribution hazard. In practice, regression relationships in competing risks data under either strategy are often complex and may include nonlinear functions of covariates, interactions, high-dimensional parameter spaces and nonproportional cause-specific or subdistribution hazards. Model misspecification can lead to poor predictive performance. To address these issues, we propose a novel approach to flexible prediction modeling of competing risks data using Bayesian Additive Regression Trees (BART). We study the simulation performance in two-sample scenarios as well as a complex regression setting, and benchmark its performance against standard regression techniques as well as random survival forests. We illustrate the use of the proposed method on a recently published study of patients undergoing hematopoietic stem cell transplantation.
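The abstract does not spell out the formulation, but one common way to use BART for competing risks is in discrete time: BART models, given covariates, the probability of failing from each cause at each distinct event time among subjects still at risk. Cumulative incidence functions then follow from these discrete cause-specific hazards. The sketch below assumes the two hazard sequences are already available (for example, one posterior draw) and shows only that final step; the function name and recursion layout are ours, not the paper's.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(h1: Sequence[float],
                         h2: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Given discrete cause-specific hazards at ordered times t_1 < ... < t_J, return
    (S, CIF1, CIF2) with
        S(t_j)     = prod_{l<=j} (1 - h1_l - h2_l)
        CIF_k(t_j) = sum_{l<=j} S(t_{l-1}) * hk_l,  where S(t_0) = 1."""
    S, F1, F2 = [], [], []
    surv, c1, c2 = 1.0, 0.0, 0.0
    for a, b in zip(h1, h2):
        if a < 0 or b < 0 or a + b > 1:
            raise ValueError("hazards must be non-negative and sum to at most 1")
        c1 += surv * a          # fail from cause 1 at this time, having survived so far
        c2 += surv * b          # fail from cause 2 at this time, having survived so far
        surv *= 1.0 - a - b     # survive this time from both causes
        S.append(surv)
        F1.append(c1)
        F2.append(c2)
    return S, F1, F2


S, F1, F2 = cumulative_incidence([0.1, 0.2, 0.25], [0.05, 0.1, 0.25])
print(S, F1, F2)
```

At every time point S + CIF1 + CIF2 = 1, since each subject is either still event-free or has failed from exactly one cause. Repeating the calculation over posterior draws of the hazards would give credible intervals for each cumulative incidence curve.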