High-Dimensional Variable Selection and Prediction under Competing Risks with Application to SEER-Medicare Linked Data


Posted by Jiayi Hou

Publication date: 2017

Research field: Mathematical Statistics

Paper language: English

Authors: Jiayi Hou, Anthony Paravati, Ronghui Xu

Statistics Applications


Abstract

Competing risk analysis considers event times due to multiple causes, or of more than one event type. Commonly used regression models for such data include 1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging the other event types simultaneously; and 2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. Motivated by an analysis using the linked SEER-Medicare database for the purpose of predicting cancer versus non-cancer mortality among patients with prostate cancer, we use extensive simulation experiments to study the prediction accuracy and variable selection performance of existing statistical learning methods under both models, including different approaches to choosing the penalty parameters in each method. We then apply the optimal approaches to the analysis of the SEER-Medicare data.
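The subdistribution hazards model targets the cumulative incidence function (CIF). The CIF can also be estimated nonparametrically, which gives a useful reference point when checking model-based predictions. What follows is a minimal pure-Python sketch of the Aalen-Johansen estimator, not code from the paper. The function name `aalen_johansen_cif` and the coding of `causes` (0 for censored, 1, 2, ... for event types) are assumptions made for illustration.

```python
def aalen_johansen_cif(times, causes, cause):
    """Aalen-Johansen estimate of the CIF for one event type.

    Hypothetical helper for illustration. `causes[i]` is 0 if subject i
    was censored at times[i], otherwise the (positive) event type.
    Returns a list of (t, CIF(t)) pairs at each distinct event time.
    """
    if len(times) != len(causes):
        raise ValueError("times and causes must have the same length")
    # Distinct times at which an event of any type was observed.
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0  # all-cause Kaplan-Meier survival just before t
    cif = 0.0
    out = []
    for t in event_times:
        # Subjects still at risk: event or censoring time at or after t.
        n_at_risk = sum(1 for s in times if s >= t)
        d_all = sum(1 for s, c in zip(times, causes) if s == t and c != 0)
        d_k = sum(1 for s, c in zip(times, causes) if s == t and c == cause)
        # Survive all causes to just before t, then fail from `cause` at t.
        cif += surv * d_k / n_at_risk
        # Update all-cause survival past t.
        surv *= 1.0 - d_all / n_at_risk
        out.append((t, cif))
    return out


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5]
    causes = [1, 2, 0, 1, 2]  # e.g. 1 = cancer death, 2 = other death, 0 = censored
    print(aalen_johansen_cif(times, causes, cause=1))
```

On this toy data, the CIFs for the two causes at the last event time add up to one. In general, the CIFs summed over causes equal one minus the all-cause Kaplan-Meier estimate, which is what distinguishes cumulative incidence from a naive one-minus-Kaplan-Meier for a single cause. The quadratic loops keep the logic visible; a real analysis would use an established survival package.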


Andrew Ying, Ronghui Xu, James Murphy (2018)

Instrumental variables are an essential tool for addressing unmeasured confounding in observational studies. The two-stage predictor substitution (2SPS) estimator and the two-stage residual inclusion (2SRI) estimator are two commonly used approaches to applying instrumental variables. Recently, 2SPS was studied under the additive hazards model for competing risks time-to-event data, assuming a linear relationship between the treatment and the instrumental variable. This assumption may not be the most appropriate when the treatment is binary. In this paper, we consider the 2SRI estimator under the additive hazards model for general survival data and in the presence of competing risks, which allows generalized linear models for the relationship between the treatment and the instrumental variable. We derive its asymptotic properties, including a closed-form asymptotic variance estimate. We carry out numerical studies in finite samples and apply our methodology to the linked Surveillance, Epidemiology and End Results (SEER)-Medicare database, comparing radical prostatectomy versus conservative treatment in early-stage prostate cancer patients.
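The block below is a generic illustration of two-stage residual inclusion, and only schematic. The paper's exact specification, including measured covariates and how competing risks enter, is not given in this abstract. Treatment $D$, instrument $W$, link function $g$, and coefficients $\alpha_0, \alpha_1, \beta_D, \beta_R$ are notation introduced here.

```latex
\begin{aligned}
&\text{Stage 1:}\quad g\{E(D \mid W)\} = \alpha_0 + \alpha_1 W,
  \qquad \hat R = D - g^{-1}(\hat\alpha_0 + \hat\alpha_1 W), \\
&\text{Stage 2:}\quad \lambda(t \mid D, \hat R) = \lambda_0(t) + \beta_D D + \beta_R \hat R .
\end{aligned}
```

With a logit link for a binary treatment, $\hat R$ is the raw residual from a logistic regression. By contrast, 2SPS would replace $D$ by its fitted value in the second stage.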

Statistics Applications; Statistics Theory

Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data

Kyu Ha Lee, Brent A. Coull, Anna-Barbara Moscicki (2017)

Microorganisms play critical roles in human health and disease. It is well known that microbes live in diverse communities in which they interact synergistically or antagonistically. Thus, for estimating microbial associations with clinical covariates, multivariate statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than taxon-specific univariate analyses. In addition, microbial count data require special attention because they commonly exhibit zero inflation. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa) while estimating associations with the mean levels of these outcomes. Although there has been a great deal of work on zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared the performance of the proposed method with that of existing univariate approaches, for both the binary and count parts of the model. When outcomes were correlated, the proposed variable selection method maintained the type I error rate while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, in some scenarios the univariate method had higher power than the multivariate approach. However, this higher power came at the cost of a highly inflated false discovery rate, which was not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five species (of 44) associated with HIV infection.
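The simplest univariate case of zero inflation is the zero-inflated Poisson distribution. Here a structural zero occurs with probability $\pi$; otherwise the count follows a Poisson distribution with mean $\mu$. This is a generic illustration only. The paper's multivariate model, with its binary and count parts and its correlation structure, is richer and may use a different count distribution.

```latex
P(Y = 0) = \pi + (1 - \pi)\, e^{-\mu}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\mu} \mu^{y}}{y!}, \quad y = 1, 2, \dots
```

In models of this type, the binary part governs $\pi$ (whether a taxon is structurally absent) and the count part governs $\mu$. This is why covariate associations can be assessed separately for each part.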

Statistics Applications

Spatial Variable Selection and An Application to Virginia Lyme Disease Emergence

Yimeng Xie, Li Xu, Jie Li (2018)

Lyme disease is an infectious disease caused by the bacterium Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East North Central, South Atlantic, and West North Central. Virginia is on the front line of the disease's spread from the northeast to the south. One research objective for the infectious disease community is to identify environmental and economic variables that are associated with the emergence of Lyme disease. In this paper, we use a spatial Poisson regression model to link spatial disease counts to environmental and economic variables. We also develop a spatial variable selection procedure that identifies important factors using an adaptive elastic net penalty. The proposed method automatically selects important covariates while adjusting for possible spatial correlation among disease counts. Its performance is studied and compared with existing methods in a comprehensive simulation study. We apply the developed variable selection methods to the Virginia Lyme disease data and identify important variables that are new to the literature. Supplementary materials for this paper are available online.
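A generic adaptive elastic net criterion, written for a log-likelihood, takes the form shown below. This is a sketch under stated assumptions. $\ell(\boldsymbol\beta)$ stands for the log-likelihood of the spatial Poisson model, whose exact form (including how spatial correlation enters) is not given in this abstract. The initial estimate $\tilde\beta_j$ and the exponent $\gamma > 0$ are the usual adaptive-weight choices, not necessarily the authors'.

```latex
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta}
  \Big\{ -\ell(\boldsymbol\beta)
  + \lambda_1 \sum_{j=1}^{p} \hat w_j \, |\beta_j|
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \Big\},
\qquad
\hat w_j = |\tilde\beta_j|^{-\gamma}, \quad \lambda_1, \lambda_2 \ge 0 .
```

The weighted $\ell_1$ term does the selection: a coefficient with a small initial estimate receives a large weight and is pushed toward zero. The ridge term stabilizes estimation when covariates are correlated.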

Statistics Applications

Combined tests based on restricted mean time lost for competing risks data

Jingjing Lyu, Yawen Hou, Zheng Chen (2021)

Competing risks data are common in medical studies, and the subdistribution hazard (SDH) ratio is considered an appropriate measure. However, the hazard itself is not easy to interpret clinically, and the SDH ratio is valid only under the proportional SDH assumption. This article therefore introduces an alternative index for competing risks, the restricted mean time lost (RMTL), and constructs several test procedures based on it. First, we introduce the definition and estimation of RMTL based on Aalen-Johansen cumulative incidence functions. Then, we consider several combined tests based on the SDH and the RMTL difference (RMTLd). The statistical properties of the methods are evaluated using simulations, and the methods are applied to two examples. The type I errors of the combined tests are close to the nominal level, and all combined tests show acceptable power in all situations. In conclusion, RMTL can meaningfully summarize treatment effects for clinical decision making. Three combined tests have robust power under various conditions and can be considered for statistical inference in real data analysis.
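The RMTL for cause $k$ up to a horizon $\tau$ is the area under that cause's cumulative incidence function, $\mathrm{RMTL}_k(\tau) = \int_0^\tau F_k(t)\,dt$. It can be read as the expected time lost to cause $k$ before $\tau$. Because an Aalen-Johansen CIF is a right-continuous step function, the integral reduces to a sum of rectangles. The sketch below assumes the CIF is supplied as sorted `(time, value)` pairs and is zero before the first pair. The name `rmtl` is hypothetical, and this is not the authors' implementation.

```python
def rmtl(cif_steps, tau):
    """Restricted mean time lost up to `tau` for one cause.

    Hypothetical helper. `cif_steps` is a list of (time, cif_value) pairs
    sorted by time, describing a right-continuous step function that is 0
    before the first time (e.g. an Aalen-Johansen estimate).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    area = 0.0
    prev_t, prev_f = 0.0, 0.0
    for t, f in cif_steps:
        if t >= tau:
            break
        # The CIF equals prev_f on the interval [prev_t, t).
        area += prev_f * (t - prev_t)
        prev_t, prev_f = t, f
    # Last piece: from the final jump before tau up to tau.
    area += prev_f * (tau - prev_t)
    return area


if __name__ == "__main__":
    steps = [(1, 0.2), (2, 0.2), (4, 0.5), (5, 0.5)]
    print(rmtl(steps, tau=5))  # area under the step CIF on [0, 5]
```

The RMTL difference between two groups is then `rmtl(steps_a, tau) - rmtl(steps_b, tau)`. Its variance and the combined tests require additional machinery that is not sketched here.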

Statistics Applications; Methodology

Fine-Gray competing risks model with high-dimensional covariates: estimation and Inference

Jue Hou, Jelena Bradic, Ronghui Xu (2017)

The purpose of this paper is to construct confidence intervals for the regression coefficients in the Fine-Gray model for competing risks data with random censoring, where the number of covariates can be larger than the sample size. Despite strong motivation from biomedical applications, the high-dimensional Fine-Gray model has attracted relatively little attention in the methodological and theoretical literature. We fill this gap by developing confidence intervals based on a one-step bias correction of a regularized estimator. We develop a theoretical framework for the partial likelihood, which does not have independent and identically distributed entries and therefore presents many technical challenges. We also study the approximation error from the weighting scheme under random censoring for competing risks and establish new concentration results for time-dependent processes. In addition to the theoretical results and algorithms, we present extensive numerical experiments and an application to a study of non-cancer mortality among prostate cancer patients using the linked Medicare-SEER data.
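For reference, the Fine-Gray model specifies proportional effects on the subdistribution hazard of the event of interest (cause 1 below). The notation is standard rather than taken from the paper:

- $T$: event time
- $\epsilon$: cause indicator
- $Z$: covariate vector, here possibly of dimension larger than the sample size
- $\beta$: regression coefficients
- $\lambda_{10}$: baseline subdistribution hazard

The model and its implied cumulative incidence function are:

```latex
\begin{aligned}
\lambda_1(t \mid Z) &= \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,
  P\{t \le T < t + \Delta t,\ \epsilon = 1 \mid T \ge t \ \text{or}\ (T < t,\ \epsilon \ne 1),\ Z\} \\
  &= \lambda_{10}(t) \exp(\beta^{\top} Z), \\
F_1(t \mid Z) &= P(T \le t,\ \epsilon = 1 \mid Z)
  = 1 - \exp\Big\{-\exp(\beta^{\top} Z) \int_0^t \lambda_{10}(s)\, ds\Big\}.
\end{aligned}
```

The last line shows why $\beta$ acts directly on the cumulative incidence function, through a complementary log-log transformation. Subjects who fail from a competing cause remain in the risk set. Under random censoring, their contributions are commonly reweighted by inverse probability of censoring weights, which is one source of the weighting scheme mentioned in the abstract.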

Methodology; Statistics Theory; Statistics Applications


Read also