
ROC-Guided Survival Trees and Ensembles

Published by Yifei Sun
Publication date: 2018
Research field: Mathematical Statistics
Research language: English





Tree-based methods are popular nonparametric tools for studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and ensembles, in which the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent Receiver Operating Characteristic (ROC) curves for evaluating the performance of survival trees. The tree-building algorithm is guided by decision-theoretic criteria based on ROC and specifically targets prediction accuracy. To address the instability of a single tree, we propose a novel ensemble procedure based on averaging martingale estimating equations, which differs from existing methods that average the predicted survival or cumulative hazard functions of individual trees. Extensive simulation studies examine the performance of the proposed methods, and for illustration we apply the methods to an AIDS study.
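The difference between averaging estimating equations and averaging predicted cumulative hazards can be illustrated with a simplified sketch. It assumes time-invariant covariates and a hypothetical input `weights`, where `weights[b, i]` is 1 when training subject i shares a terminal node with the query point in tree b; the authors' procedure works with dynamic partitions and time-dependent covariates, so this is not their implementation. Summing the per-tree equations $\sum_b \sum_i w_{bi}\{dN_i(t) - Y_i(t)\,d\Lambda(t)\} = 0$ and solving once gives the pooled increment $d\hat\Lambda(t) = \sum_b\sum_i w_{bi}\,dN_i(t) / \sum_b\sum_i w_{bi}\,Y_i(t)$, whereas the conventional approach solves each tree's equation separately and then averages the resulting curves.

```python
import numpy as np


def _nelson_aalen(w, time, status, grid):
    """Weighted Nelson-Aalen cumulative hazard evaluated at each point of `grid`."""
    event_times = np.unique(time[status == 1])
    increments = np.zeros(len(event_times))
    for k, s in enumerate(event_times):
        at_risk = np.sum(w * (time >= s))                    # sum_i w_i Y_i(s)
        deaths = np.sum(w * ((time == s) & (status == 1)))   # sum_i w_i dN_i(s)
        increments[k] = deaths / at_risk if at_risk > 0 else 0.0
    return np.array([increments[event_times <= t].sum() for t in grid])


def cumhaz_pooled(weights, time, status, grid):
    """Solve the summed estimating equations once by pooling weights over trees."""
    return _nelson_aalen(weights.sum(axis=0), time, status, grid)


def cumhaz_averaged(weights, time, status, grid):
    """Conventional alternative: one estimate per tree, then average the curves."""
    return np.mean([_nelson_aalen(w, time, status, grid) for w in weights], axis=0)


# Toy example: 2 trees, 4 training subjects (hypothetical node memberships).
weights = np.array([[1, 1, 0, 0], [0, 1, 1, 1]], dtype=float)
time = np.array([1.0, 2.0, 3.0, 4.0])
status = np.array([1, 1, 0, 1])          # 1 = event, 0 = censored
grid = np.array([4.0])
print(cumhaz_pooled(weights, time, status, grid))    # pooled ratio
print(cumhaz_averaged(weights, time, status, grid))  # average of per-tree curves
```

On these toy inputs the two estimates at t = 4 differ: the pooled ratio gives 1.7, while averaging the per-tree curves gives 17/12 (about 1.417), because pooling lets trees with larger at-risk sets contribute more to each increment.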




Read also

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and lead to methods that are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) is used to illustrate these new methods.
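For orientation, here is a sketch of the simpler inverse-probability-of-censoring-weighted (IPCW) Brier score for a predicted cumulative incidence at a fixed time t. This is a swapped-in, non-augmented estimator, not the augmented estimator the paper uses for building and pruning trees. The event coding (0 = censored, 1, 2, ... = observed cause) and all function names are assumptions for illustration. Subjects with an observed event by t are weighted by $1/\hat G(T_i-)$, subjects still event-free at t by $1/\hat G(t)$, and subjects censored before t get weight 0, where $\hat G$ is the Kaplan-Meier estimate of the censoring survival function.

```python
import numpy as np


def censoring_survival(time, event, s, left_limit=False):
    """Kaplan-Meier estimate of G(s) = P(C > s), treating censoring (event == 0)
    as the 'event'. With left_limit=True, returns G(s-) instead."""
    g = 1.0
    for u in np.unique(time[event == 0]):      # sorted censoring times
        if u > s or (left_limit and u >= s):
            break
        n_risk = np.sum(time >= u)
        n_cens = np.sum((time == u) & (event == 0))
        g *= 1.0 - n_cens / n_risk
    return g


def ipcw_brier_cif(time, event, pred_cif, t, cause=1):
    """IPCW Brier score at time t for the predicted cumulative incidence of `cause`.

    event: 0 = censored, 1, 2, ... = observed cause (assumed coding).
    pred_cif[i]: predicted P(T <= t, cause) for subject i.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    pred = np.asarray(pred_cif, dtype=float)
    g_t = censoring_survival(time, event, t)
    total = 0.0
    for ti, ei, fi in zip(time, event, pred):
        if ti <= t and ei != 0:       # event of any cause observed by t
            w = 1.0 / censoring_survival(time, event, ti, left_limit=True)
        elif ti > t:                  # still event-free at t
            w = 1.0 / g_t
        else:                         # censored before t: status unknown
            w = 0.0
        y = float(ti <= t and ei == cause)
        total += w * (y - fi) ** 2
    return total / len(time)


print(ipcw_brier_cif([1, 2, 3], [1, 0, 2], [0.8, 0.5, 0.4], t=2.5))
```

In this toy call, the subject censored at time 2 gets weight 0 and the subject still event-free at t = 2.5 gets weight 2, so the score is (0.04 + 2 × 0.16) / 3, about 0.12. The sketch assumes $\hat G$ stays positive; an augmented estimator would also recover information from the censored subject.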
Due to their accuracies, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and are proved to have better statistical properties. The main contribution is a new method that quantifies the uncertainties of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology, and provides a fiducial density function that measures how likely each single honest tree is the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods, and by applications to a few real data sets. Lastly, the proposed method is theoretically backed up by a strong asymptotic guarantee.
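"Honesty" here refers to sample splitting inside each tree: one part of the data chooses the splits and a disjoint part estimates the leaf values, so the same responses are never used for both. The sketch below shows the idea on a single depth-1 regression tree (a stump) with one feature; the function name and the even 50/50 split are assumptions, and it illustrates honesty only, not the fiducial uncertainty method the abstract proposes.

```python
import numpy as np


def honest_stump(x, y, rng):
    """Depth-1 regression tree on a 1-D feature, fitted honestly:
    half of the sample chooses the split, the other half estimates the leaves."""
    idx = rng.permutation(len(x))
    split_half, est_half = idx[: len(x) // 2], idx[len(x) // 2:]

    # 1) Choose the threshold using only the split half.
    xa, ya = x[split_half], y[split_half]
    best_sse, best_c = np.inf, None
    for c in np.unique(xa)[:-1]:                  # candidate thresholds
        left, right = ya[xa <= c], ya[xa > c]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_c = sse, c
    if best_c is None:
        raise ValueError("split half has fewer than two distinct x values")

    # 2) Estimate leaf means using only the held-out estimation half.
    xb, yb = x[est_half], y[est_half]
    left_b, right_b = yb[xb <= best_c], yb[xb > best_c]
    left_mean = left_b.mean() if left_b.size else float("nan")
    right_mean = right_b.mean() if right_b.size else float("nan")
    return best_c, left_mean, right_mean


rng = np.random.default_rng(0)
x = np.repeat([0.0, 1.0, 2.0], 40)
y = np.where(x >= 1, 3.0, 0.0)
print(honest_stump(x, y, rng))
```

A forest built this way would grow many such trees on subsamples with random feature choices; keeping split selection and leaf estimation on disjoint data is what makes each tree "honest".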
In many binary classification applications such as disease diagnosis and spam detection, practitioners often need to control the type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes the type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, $\alpha$, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than $\alpha$ do not satisfy the type I error control objective, because the resulting classifiers are still likely to have type I errors much larger than $\alpha$. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help choose $\alpha$ in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data case studies.
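The threshold-selection step at the heart of such an umbrella algorithm can be sketched as follows, assuming continuous scores and the formulation common in the NP classification literature. The score threshold is the k-th smallest of n held-out class-0 scores, and the tolerance δ on P(type I error > α) is an added parameter not named in the abstract. For continuous scores, that probability equals $P(\mathrm{Binomial}(n, 1-\alpha) \ge k)$, so the sketch picks the smallest k whose bound is at most δ. This is an illustrative Python sketch with hypothetical function names, not the nproc package's code.

```python
import math
from typing import Optional, Sequence


def violation_bound(n: int, k: int, alpha: float) -> float:
    """P(Binomial(n, 1 - alpha) >= k): probability that the type I error exceeds
    alpha when the threshold is the k-th smallest of n held-out class-0 scores."""
    log_a, log_1ma = math.log(alpha), math.log1p(-alpha)
    total = 0.0
    for j in range(k, n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        total += math.exp(log_comb + j * log_1ma + (n - j) * log_a)
    return total


def np_rank(n: int, alpha: float, delta: float) -> Optional[int]:
    """Smallest rank k whose violation bound is at most delta (None if none)."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


def np_threshold(class0_scores: Sequence[float], alpha: float, delta: float) -> float:
    """Threshold for the rule 'predict class 1 if score > threshold'."""
    scores = sorted(class0_scores)
    k = np_rank(len(scores), alpha, delta)
    if k is None:
        raise ValueError("too few class-0 scores for this (alpha, delta)")
    return scores[k - 1]
```

With α = δ = 0.05, at least 59 held-out class-0 scores are needed, since 0.95^58 ≈ 0.051 > 0.05 ≥ 0.95^59 ≈ 0.048. With exactly 59 scores, the rule uses the largest one as the threshold. Picking the smallest admissible k keeps the threshold as low as possible, which minimizes the type II error among thresholds that meet the high-probability type I error constraint.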
In this paper, we consider a novel positive-unlabeled data framework for survival analysis. As positive data, survival times are observed for subjects who experience the event during the observation period. As unlabeled data, only censoring times are observed for some subjects, and whether the event has occurred for them is unknown. We consider two cases: (1) censoring times are observed in the positive data, and (2) they are not. For both cases, we develop parametric, nonparametric, and machine learning models, together with estimation strategies for these models. Simulation studies show that, under this data setup, traditional survival analysis may yield severely biased results, whereas the proposed estimation methods provide valid results.
In this paper, we develop a family of bivariate beta distributions that encapsulate both positive and negative correlations, and which can be of general interest for Bayesian inference. We then invoke a use of these bivariate distributions in two contexts. The first is diagnostic testing in medicine, threat detection, and signal processing. The second is system survivability assessment, relevant to engineering reliability and to survival analysis in biomedicine. In diagnostic testing one encounters two parameters that characterize the efficacy of the testing mechanism: test sensitivity and test specificity. These tend to be adversarial when their values are interpreted as utilities. In system survivability, the parameters of interest are the component reliabilities, whose values, when interpreted as utilities, tend to exhibit co-operative (amiable) behavior. Besides probability modeling and Bayesian inference, this paper has a foundational import. Specifically, it advocates a conceptual change in how one may think about reliability and survival analysis. The philosophical writings of de Finetti, Kolmogorov, Popper, and Savage, when brought to bear on these topics, constitute the essence of this change. Its consequence is that we have at hand a defensible framework for invoking Bayesian inferential methods in diagnostics, reliability, and survival analysis. Another consequence is a deeper appreciation of the judgment of independent lifetimes. Specifically, we make the important point that independent lifetimes entail, at a minimum, a two-stage hierarchical construction.