
ROC-Guided Survival Trees and Ensembles

Added by Yifei Sun
Publication date: 2018
Language: English





Tree-based methods are popular nonparametric tools in studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and ensembles, where the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent Receiver Operating Characteristic (ROC) curves for evaluating the performance of survival trees. The tree-building algorithm is guided by decision-theoretic criteria based on ROC, specifically targeting prediction accuracy. To address the instability issue of a single tree, we propose a novel ensemble procedure based on averaging martingale estimating equations, which is different from existing methods that average the predicted survival or cumulative hazard functions from individual trees. Extensive simulation studies are conducted to examine the performance of the proposed methods. We apply the methods to a study on AIDS for illustration.
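The abstract does not spell out the ROC construction, but the cumulative/dynamic flavor of time-dependent ROC analysis it builds on is easy to sketch. The snippet below is a simplified illustration only, not the authors' randomized-test version: at a horizon t it treats subjects with an observed event by t as cases, subjects still under follow-up after t as controls, drops subjects censored before t, and ignores inverse-probability-of-censoring weights.

```python
import numpy as np

def cumulative_dynamic_auc(risk_score, time, event, t):
    """Naive time-dependent AUC at horizon t (no censoring correction).

    Cases:    subjects with an observed event by time t.
    Controls: subjects whose follow-up time exceeds t.
    Subjects censored before t are simply dropped.
    """
    cases = (time <= t) & (event == 1)
    controls = time > t
    if cases.sum() == 0 or controls.sum() == 0:
        return np.nan
    # AUC = P(case score > control score), ties counted as 1/2
    diff = risk_score[cases][:, None] - risk_score[controls][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Toy data: higher risk scores give earlier (exponential) event times.
rng = np.random.default_rng(0)
n = 500
risk = rng.normal(size=n)
event_time = rng.exponential(scale=np.exp(-risk))
censor_time = rng.exponential(scale=2.0, size=n)
obs_time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)
print(cumulative_dynamic_auc(risk, obs_time, event, t=1.0))
```

A tree or ensemble can be summarized by such AUCs over a grid of horizons; the splitting criteria in the paper are decision-theoretic refinements built on generalized versions of these curves.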

Related research

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and lead to methods that are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) is used to illustrate these new methods.
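As a rough illustration of the kind of loss involved (and only that; the paper uses augmented, censoring-adjusted estimators of the Brier score risk), a plain Brier score for a predicted cumulative incidence at a fixed horizon can be written as:

```python
import numpy as np

def brier_score_cif(pred_cif_at_t, time, cause, t, cause_of_interest=1):
    """Unadjusted Brier score for a predicted cumulative incidence at horizon t.

    pred_cif_at_t : predicted P(event of cause_of_interest by time t) per subject
    time, cause   : observed time and event cause (0 = censored)
    Assumes complete follow-up through t; the tree-building method in the
    paper replaces this with an augmented, censoring-robust estimator.
    """
    observed = ((time <= t) & (cause == cause_of_interest)).astype(float)
    return float(np.mean((observed - pred_cif_at_t) ** 2))
```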
Due to their accuracy, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and are proved to have better statistical properties. The main contribution is a new method that quantifies the uncertainties of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology, and provides a fiducial density function that measures how likely it is that each individual honest tree is the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods, and by applications to a few real data sets. Lastly, the proposed method is theoretically backed up by a strong asymptotic guarantee.
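Honesty here means that the observations used to choose the splits are disjoint from those used to estimate the leaf values. A minimal sketch of one honest regression tree, using scikit-learn's DecisionTreeRegressor as the base learner (my tooling choice, not anything prescribed by the paper, and without the fiducial uncertainty quantification the paper actually contributes), could look like this:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_honest_tree(X, y, rng, max_depth=4):
    """Fit one 'honest' regression tree: one half of the data determines the
    splits, the other half determines the leaf predictions."""
    n = len(y)
    idx = rng.permutation(n)
    split_idx, est_idx = idx[: n // 2], idx[n // 2 :]

    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X[split_idx], y[split_idx])        # structure from the first half

    # Re-estimate each leaf mean from the held-out half.
    leaf_of = tree.apply(X[est_idx])
    leaf_means = {leaf: y[est_idx][leaf_of == leaf].mean()
                  for leaf in np.unique(leaf_of)}

    def predict(X_new):
        leaves = tree.apply(X_new)
        # Fall back to the structure-sample prediction if a leaf received
        # no estimation-sample observations.
        fallback = tree.predict(X_new)
        return np.array([leaf_means.get(l, f) for l, f in zip(leaves, fallback)])

    return predict

# Usage: average many honest trees fitted on independent subsample splits.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=1000)
forest = [fit_honest_tree(X, y, rng) for _ in range(50)]
pred = np.mean([f(X[:5]) for f in forest], axis=0)
print(pred)
```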
In many binary classification applications such as disease diagnosis and spam detection, practitioners often need to control the type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes the type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, $\alpha$, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than $\alpha$ do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than $\alpha$. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help choose $\alpha$ in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data case studies.
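For intuition, the type I error control comes from choosing the classification threshold as a high order statistic of held-out class-0 scores, with the order picked through a binomial tail bound rather than a plain empirical quantile. The sketch below is a rough reconstruction of that idea under my own simplifications; the exact rule and its guarantees are in the paper and the nproc package.

```python
import numpy as np
from scipy.stats import binom
from sklearn.linear_model import LogisticRegression

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Pick a threshold from held-out class-0 scores so that the type I error
    exceeds alpha with probability at most (roughly) delta.

    Uses the order-statistic / binomial-tail idea behind the NP umbrella
    algorithm; this is a simplified reconstruction, not the packaged rule.
    """
    s = np.sort(class0_scores)
    n = len(s)
    for k in range(1, n + 1):
        # P(Binomial(n, 1 - alpha) >= k): chance that the k-th order statistic
        # falls below the population (1 - alpha) quantile of class-0 scores.
        violation = binom.sf(k - 1, n, 1 - alpha)
        if violation <= delta:
            return s[k - 1]
    raise ValueError("not enough class-0 observations for this alpha/delta")

# Toy usage with a hypothetical scoring classifier.
rng = np.random.default_rng(2)
X0 = rng.normal(0, 1, size=(2000, 3))
X1 = rng.normal(1, 1, size=(2000, 3))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(2000), np.ones(2000)]
clf = LogisticRegression().fit(X[::2], y[::2])              # train on half
scores0 = clf.predict_proba(X[1::2][y[1::2] == 0])[:, 1]    # held-out class-0 scores
print("threshold:", np_threshold(scores0, alpha=0.05, delta=0.05))
```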
In this paper, we consider a novel positive-unlabeled data framework for survival analysis: for the positive data, survival times are observed for subjects who experience the event during the observation period; for the unlabeled data, censoring times are observed but it is unknown whether the event occurs. We consider two cases: (1) the censoring time is also observed for the positive data, and (2) it is not. For both cases, we develop parametric models, nonparametric models, and machine learning models, together with estimation strategies for these models. Simulation studies show that under this data setup traditional survival analysis may yield severely biased results, while the proposed estimation methods provide valid results.
In this paper, we develop a family of bivariate beta distributions that encapsulate both positive and negative correlations, and which can be of general interest for Bayesian inference. We then invoke a use of these bivariate distributions in two contexts. The first is diagnostic testing in medicine, threat detection, and signal processing. The second is system survivability assessment, relevant to engineering reliability and to survival analysis in biomedicine. In diagnostic testing one encounters two parameters that characterize the efficacy of the testing mechanism, test sensitivity and test specificity. These tend to be adversarial when their values are interpreted as utilities. In system survivability, the parameters of interest are the component reliabilities, whose values, when interpreted as utilities, tend to exhibit co-operative (amiable) behavior. Besides probability modeling and Bayesian inference, this paper has a foundational import. Specifically, it advocates a conceptual change in how one may think about reliability and survival analysis. The philosophical writings of de Finetti, Kolmogorov, Popper, and Savage, when brought to bear on these topics, constitute the essence of this change. Its consequence is that we have at hand a defensible framework for invoking Bayesian inferential methods in diagnostics, reliability, and survival analysis. Another consequence is a deeper appreciation of the judgment of independent lifetimes. Specifically, we make the important point that independent lifetimes entail, at a minimum, a two-stage hierarchical construction.