ترغب بنشر مسار تعليمي؟ اضغط هنا

Distributed Soft Bayesian Additive Regression Trees

114   0   0.0 ( 0 )
 نشر من قبل Hao Ran
 تاريخ النشر 2021
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Bayesian Additive Regression Trees(BART) is a Bayesian nonparametric approach which has been shown to be competitive with the best modern predictive methods such as random forest and Gradient Boosting Decision Tree.The sum of trees structure combined with a Bayesian inferential framework provide a accurate and robust statistic method.BART variant named SBART using randomized decision trees has been developed and show practical benefits compared to BART. The primary bottleneck of SBART is the speed to compute the sufficient statistics and the publicly avaiable implementation of the SBART algorithm in the R package is very slow.In this paper we show how the SBART algorithm can be modified and computed using single program,multiple data(SPMD) distributed computation with the Message Passing Interface(MPI) library.This approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.We have made modification to this algorithm to make it capable to handle classfication problem which can not be done with the original R package.With data experiments we show the advantage of distributed SBART for classfication problem compared to BART.

قيم البحث

اقرأ أيضاً

Popular parametric and semiparametric hazards regression models for clustered survival data are inappropriate and inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework to yield efficient survival prediction. Moreover, for some survival studies involving time to occurrence of some asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval-censored survival data under a paradigm of Bayesian ensemble learning, called Soft Bayesian Additive Regression Trees or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable for left-censored, right-censored, and interval-censored survival data, our methodology is implemented using a data augmentation scheme which allows for existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study where dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our methods applicability in studies involving high dimensional data with complex underlying associations.
93 - Hao Ran , Yang Bai 2021
In many longitudinal studies, the covariate and response are often intermittently observed at irregular, mismatched and subject-specific times. How to deal with such data when covariate and response are observed asynchronously is an often raised prob lem. Bayesian Additive Regression Trees(BART) is a Bayesian non-Parametric approach which has been shown to be competitive with the best modern predictive methods such as random forest and boosted decision trees. The sum of trees structure combined with a Bayesian inferential framework provide a accurate and robust statistic method. BART variant soft Bayesian Additive Regression Trees(SBART) constructed using randomized decision trees was developed and substantial theoretical and practical benefits were shown. In this paper, we propose a weighted SBART model solution for asynchronous longitudinal data. In comparison to other methods, the current methods are valid under with little assumptions on the covariate process. Extensive simulation studies provide numerical support for this solution. And data from an HIV study is used to illustrate our methodology
We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a post erior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BARTs many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages . For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this paper we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
Many time-to-event studies are complicated by the presence of competing risks. Such data are often analyzed using Cox models for the cause specific hazard function or Fine-Gray models for the subdistribution hazard. In practice regression relationshi ps in competing risks data with either strategy are often complex and may include nonlinear functions of covariates, interactions, high-dimensional parameter spaces and nonproportional cause specific or subdistribution hazards. Model misspecification can lead to poor predictive performance. To address these issues, we propose a novel approach to flexible prediction modeling of competing risks data using Bayesian Additive Regression Trees (BART). We study the simulation performance in two-sample scenarios as well as a complex regression setting, and benchmark its performance against standard regression techniques as well as random survival forests. We illustrate the use of the proposed method on a recently published study of patients undergoing hematopoietic stem cell transplantation.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا