Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies

Added by Brent A. Johnson

Publication date: 2011

Field: Mathematical Statistics

Language: English

Authors: Brent A. Johnson, Qi Long

Applications

Lung cancer is among the most common cancers in the United States in terms of both incidence and mortality. It was estimated that more than 150,000 deaths in 2009 would result from lung cancer alone. Genetic information is an extremely valuable data source for characterizing the personal nature of cancer. Over the past several years, investigators have conducted numerous association studies in which intensive genetic data are collected on relatively few patients compared with the number of gene predictors, with one scientific goal being to identify genetic features associated with cancer recurrence or survival. In this note, we propose high-dimensional survival analysis through a new application of boosting, a powerful tool in machine learning. Our approach is based on an accelerated lifetime model and on minimizing the sum of pairwise differences in residuals. We apply our method to a recent microarray study of lung adenocarcinoma and find that our ensemble is composed of 19 genes, while a proportional hazards (PH) ensemble is composed of nine genes, a proper subset of the 19-gene panel. In one of our simulation scenarios, we demonstrate that PH boosting in a misspecified model tends, on average, to underfit and to ignore moderately sized covariate effects. Diagnostic analyses suggest that the PH assumption is not satisfied in the microarray data, which may explain, in part, the discrepancy in the sets of active coefficients. Our simulation studies and comparative data analyses demonstrate that statistical learning by PH models alone is insufficient.
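To make the loss concrete, here is a minimal sketch. It assumes the Gehan-type form of the pairwise-difference loss for accelerated failure time residuals e_i = log T_i - x_i'β, in which each uncensored subject i contributes max(e_j - e_i, 0) against every subject j. It also assumes generic componentwise gradient boosting with single-covariate least-squares base learners. The function names, the step size `nu`, the number of steps, and the simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gehan_loss(beta, X, log_t, delta):
    """Gehan-type sum of pairwise differences in AFT residuals (assumed form).

    Residuals are e_i = log T_i - x_i' beta. Each uncensored subject i
    (delta_i = 1) contributes max(e_j - e_i, 0) for every subject j.
    """
    e = log_t - X @ beta
    diff = e[None, :] - e[:, None]  # diff[i, j] = e_j - e_i
    return float(np.sum(delta[:, None] * np.maximum(diff, 0.0))) / len(e) ** 2


def boost_gehan(X, log_t, delta, n_steps=300, nu=0.1):
    """Componentwise gradient boosting of the Gehan-type loss (sketch).

    Each step computes the negative gradient of the loss with respect to the
    linear predictor eta_i = x_i' beta, fits it by least squares on each single
    covariate, and updates only the best-fitting coefficient by a shrunken step.
    Assumes no column of X is constant.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # the loss ignores any intercept, so center X
    col_ss = np.sum(Xc ** 2, axis=0)
    beta = np.zeros(p)
    for _ in range(n_steps):
        e = log_t - Xc @ beta
        # active[i, j] = delta_i * 1{e_j > e_i}: pairs currently adding to the loss
        active = delta[:, None] * (e[None, :] > e[:, None])
        # Negative gradient w.r.t. eta_k: column sum minus row sum of `active`,
        # rescaled by 1/n so every entry lies in [-1, 1].
        u = (active.sum(axis=0) - active.sum(axis=1)) / n
        coef = Xc.T @ u / col_ss                 # single-covariate LS slopes
        k = int(np.argmax(coef ** 2 * col_ss))   # largest drop in residual SS
        beta[k] += nu * coef[k]
    return beta


if __name__ == "__main__":
    # Simulated right-censored data with two active covariates out of ten
    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = [1.0, -0.5]
    log_event = X @ beta_true + rng.normal(scale=0.5, size=n)
    log_cens = rng.normal(loc=1.0, scale=1.0, size=n)
    log_t = np.minimum(log_event, log_cens)
    delta = (log_event <= log_cens).astype(float)
    print(np.round(boost_gehan(X, log_t, delta), 2))
```

Because the loss depends only on differences of residuals, it cannot identify an intercept, which is why the sketch centers the columns of X. Stopping after a fixed number of small steps acts as regularization, and this is what allows boosting to be used when gene predictors outnumber patients.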

Dynamic Risk Prediction Using Survival Tree Ensembles with Application to Cystic Fibrosis

Yifei Sun, Sy Han Chiou, Colin O. Wu (2020)

With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, in which an updated prediction can be made when new information becomes available. Compared with conventional landmark prediction, our framework enjoys great flexibility in that the landmark times can be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. When both the longitudinal predictors and the outcome event time are subject to right censoring, existing tree-based approaches cannot be applied directly. To tackle these analytical challenges, we consider a risk-set-based ensemble procedure that averages martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to data from the Cystic Fibrosis Foundation Patient Registry (CFFPR) to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.

Applications

Pairwise comparison of treatment levels in functional analysis of variance with application to erythrocyte hemolysis

Olga Vsevolozhskaya, Mark Greenwood, Dmitri Holodov (2014)

Motivated by a practical need to compare hemolysis curves at various treatment levels, we propose a novel method for pairwise comparison of mean functional responses. The hemolysis curves, which give the percent hemolysis of mouse erythrocytes (red blood cells) by hydrochloric acid as a function of time, were measured at different treatment levels. This data set fits well within the functional data analysis paradigm, in which a time series is considered a realization of an underlying stochastic process or a smooth curve. Previous research has provided methods only for identifying that some differences exist among mean curves at different times. We propose a two-level follow-up testing framework that allows comparisons of pairs of treatments within regions of time where some difference among curves has been identified. The closure multiplicity adjustment method is used to control the family-wise error rate of the proposed procedure.
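The closure principle mentioned above can be illustrated generically. The sketch below assumes Bonferroni local tests for every intersection hypothesis. The procedure described in the abstract uses functional tests instead; Bonferroni is a stand-in chosen for brevity, and with it the closed procedure reduces to Holm's step-down method. An elementary hypothesis is rejected at level alpha only if every intersection hypothesis containing it is rejected, so its adjusted p-value is the largest local p-value over those intersections. The function name and the example p-values are hypothetical.

```python
from itertools import combinations


def closure_adjusted_pvalues(pvalues):
    """Closed-testing adjusted p-values with Bonferroni local tests.

    Enumerates all 2^m - 1 intersection hypotheses, so it is only practical
    for a handful of pairwise comparisons.
    """
    m = len(pvalues)
    adjusted = [0.0] * m
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            # Bonferroni test of the intersection of the hypotheses in `subset`
            local_p = min(1.0, size * min(pvalues[i] for i in subset))
            # Each member's adjusted p-value is the max over intersections containing it
            for i in subset:
                adjusted[i] = max(adjusted[i], local_p)
    return adjusted


# Hypothetical p-values for three pairwise comparisons (A-B, A-C, B-C)
print(closure_adjusted_pvalues([0.01, 0.04, 0.03]))
```

For these hypothetical p-values the adjusted values are about 0.03, 0.06, and 0.06, so at alpha = 0.05 only the A-B comparison would be rejected. With pairwise hypotheses, some intersections are logically redundant; enumerating all of them, as here, still controls the family-wise error rate but can be conservative.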

Applications

Sensitivity Analysis of Treatment Effect to Unmeasured Confounding in Observational Studies with Survival and Competing Risks Outcomes

Rong Huang, Ronghui Xu, Parambir S. Dulai (2019)

The absence of unmeasured confounding is often assumed when estimating treatment effects from observational data with approaches such as propensity scores and inverse probability weighting. However, in many such studies the collected confounders are not exhaustive because of database limitations, and it is crucial to examine the extent to which the resulting estimate is sensitive to unmeasured confounders. We consider this problem for survival and competing risks data. Because of the complexity of models for such data, we adapt the simulated potential confounders approach of Carnegie et al. (2016), which provides a general tool for sensitivity analysis with respect to unmeasured confounding. More specifically, we specify one sensitivity parameter to quantify the association between an unmeasured confounder and the treatment assignment, and another set of parameters to quantify the association between the confounder and the time-to-event outcomes. By varying the magnitudes of the sensitivity parameters, we estimate the treatment effect of interest using the stochastic EM and EM algorithms. We demonstrate the performance of our methods on simulated data and apply them to a comparative effectiveness study in inflammatory bowel disease (IBD).

Applications

Bayesian Semiparametric Estimation of Cancer-specific Age-at-onset Penetrance with Application to Li-Fraumeni Syndrome

Seung Jun Shin, Ying Yuan, Louise C. Strong (2017)

Penetrance, which plays a key role in genetic research, is defined as the proportion of individuals with the genetic variants (i.e., genotype) that cause a particular trait and who have clinical symptoms of the trait (i.e., phenotype). We propose a Bayesian semiparametric approach to estimate the cancer-specific age-at-onset penetrance in the presence of the competing risk of multiple cancers. We employ a Bayesian semiparametric competing risk model to model the duration until individuals in a high-risk group develop different cancers, and accommodate family data using family-wise likelihoods. We tackle the ascertainment bias arising when family data are collected through probands in a high-risk population in which disease cases are more likely to be observed. We apply the proposed method to a cohort of 186 families with Li-Fraumeni syndrome identified through probands with sarcoma treated at MD Anderson Cancer Center from 1944 to 1982.

Applications

A statistical analysis of memory CD8 T cell differentiation: An application of a hierarchical state space model to a short time course microarray experiment

Haiyan Wu, Ming Yuan, Susan M. Kaech (2007)

CD8 T cells are specialized immune cells that play an important role in the regulation of antiviral immune response and the generation of protective immunity. In this paper we investigate the differentiation of memory CD8 T cells in the immune response using a short time course microarray experiment. Structurally, this experiment is similar to many others in that it involves measurements taken on independent samples in one biological group, at a small number of irregularly spaced time points, and exhibits patterns of temporal nonstationarity. To analyze this CD8 T-cell experiment, we develop a hierarchical state space model so that we can: (1) detect temporally differentially expressed genes, (2) identify the direction of successive changes over time, and (3) assess the magnitude of successive changes over time. We incorporate hidden Markov models into our model to utilize the information embedded in the time series, and we set up the proposed hierarchical state space model in an empirical Bayes framework to utilize the population information from the large-scale data. Analysis of the CD8 T-cell experiment using the proposed model yields biologically meaningful findings. Temporal patterns involved in the differentiation of memory CD8 T cells are summarized separately, and the performance of the proposed model is illustrated in a simulation study.

Applications