ترغب بنشر مسار تعليمي؟ اضغط هنا

Fine-Gray competing risks model with high-dimensional covariates: estimation and Inference

111   0   0.0 ( 0 )
 نشر من قبل Jelena Bradic
 تاريخ النشر 2017
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

The purpose of this paper is to construct confidence intervals for the regression coefficients in the Fine-Gray model for competing risks data with random censoring, where the number of covariates can be larger than the sample size. Despite strong motivation from biomedical applications, a high-dimensional Fine-Gray model has attracted relatively little attention among the methodological or theoretical literature. We fill in this gap by developing confidence intervals based on a one-step bias-correction for a regularized estimation. We develop a theoretical framework for the partial likelihood, which does not have independent and identically distributed entries and therefore presents many technical challenges. We also study the approximation error from the weighting scheme under random censoring for competing risks and establish new concentration results for time-dependent processes. In addition to the theoretical results and algorithms, we present extensive numerical experiments and an application to a study of non-cancer mortality among prostate cancer patients using the linked Medicare-SEER data.



قيم البحث

اقرأ أيضاً

Causal inference has been increasingly reliant on observational studies with rich covariate information. To build tractable causal models, including the propensity score models, it is imperative to first extract important features from high dimension al data. Unlike the familiar task of variable selection for prediction modeling, our feature selection procedure aims to control for confounding while maintaining efficiency in the resulting causal effect estimate. Previous empirical studies imply that one should aim to include all predictors of the outcome, rather than the treatment, in the propensity score model. In this paper, we formalize this intuition through rigorous proofs, and propose the causal ball screening for selecting these variables from modern ultra-high dimensional data sets. A distinctive feature of our proposal is that we do not require any modeling on the outcome regression, thus providing robustness against misspecification of the functional form or violation of smoothness conditions. Our theoretical analyses show that the proposed procedure enjoys a number of oracle properties including model selection consistency, normality and efficiency. Synthetic and real data analyses show that our proposal performs favorably with existing methods in a range of realistic settings.
406 - Jingfei Zhang , Yi Li 2020
Though Gaussian graphical models have been widely used in many scientific fields, limited progress has been made to link graph structures to external covariates because of substantial challenges in theory and computation. We propose a Gaussian graphi cal regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our framework facilitates estimation of both population- and subject-level gene regulatory networks, and detection of how subject-level networks vary with genetic variants and clinical conditions. Our framework accommodates high dimensional responses and covariates, and encourages covariate effects on both the mean and the precision matrix to be sparse. In particular for the precision matrix, we stipulate simultaneous sparsity, i.e., group sparsity and element-wise sparsity, on effective covariates and their effects on network edges, respectively. We establish variable selection consistency first under the case with known mean parameters and then a more challenging case with unknown means depending on external covariates, and show in both cases that the convergence rate of the estimated precision parameters is faster than that obtained by lasso or group lasso, a desirable property for the sparse group lasso estimation. The utility and efficacy of our proposed method is demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.
There are many scenarios such as the electronic health records where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure under the high dimensionality. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation and inference of the regression parameters in linear models, especially in light of the fact that such linear models may be misspecified in data analysis. In particular, we address the following two important questions. (1) Can we use the labeled data as well as the unlabeled data to construct a semi-supervised estimator such that its convergence rate is faster than the supervised estimators? (2) Can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or powerful than the supervised estimators? To address the first question, we establish the minimax lower bound for parameter estimation in the semi-supervised setting. We show that the upper bound from the supervised estimators that only use the labeled data cannot attain this lower bound. We close this gap by proposing a new semi-supervised estimator which attains the lower bound. To address the second question, based on our proposed semi-supervised estimator, we propose two additional estimators for semi-supervised inference, the efficient estimator and the safe estimator. The former is fully efficient if the unknown conditional mean function is estimated consistently, but may not be more efficient than the supervised approach otherwise. The latter usually does not aim to provide fully efficient inference, but is guaranteed to be no worse than the supervised approach, no matter whether the linear model is correctly specified or the conditional mean function is consistently estimated.
Estimating causal effects for survival outcomes in the high-dimensional setting is an extremely important topic for many biomedical applications as well as areas of social sciences. We propose a new orthogonal score method for treatment effect estima tion and inference that results in asymptotically valid confidence intervals assuming only good estimation properties of the hazard outcome model and the conditional probability of treatment. This guarantee allows us to provide valid inference for the conditional treatment effect under the high-dimensional additive hazards model under considerably more generality than existing approaches. In addition, we develop a new Hazards Difference (HDi), estimator. We showcase that our approach has double-robustness properties in high dimensions: with cross-fitting, the HDi estimate is consistent under a wide variety of treatment assignment models; the HDi estimate is also consistent when the hazards model is misspecified and instead the true data generating mechanism follows a partially linear additive hazards model. We further develop a novel sparsity doubly robust result, where either the outcome or the treatment model can be a fully dense high-dimensional model. We apply our methods to study the treatment effect of radical prostatectomy versus conservative management for prostate cancer patients using the SEER-Medicare Linked Data.
81 - Baoluo Sun , Zhiqiang Tan 2020
Consider the problem of estimating the local average treatment effect with an instrument variable, where the instrument unconfoundedness holds after adjusting for a set of measured covariates. Several unknown functions of the covariates need to be es timated through regression models, such as instrument propensity score and treatment and outcome regression models. We develop a computationally tractable method in high-dimensional settings where the numbers of regression terms are close to or larger than the sample size. Our method exploits regularized calibrated estimation, which involves Lasso penalties but carefully chosen loss functions for estimating coefficient vectors in these regression models, and then employs a doubly robust estimator for the treatment parameter through augmented inverse probability weighting. We provide rigorous theoretical analysis to show that the resulting Wald confidence intervals are valid for the treatment parameter under suitable sparsity conditions if the instrument propensity score model is correctly specified, but the treatment and outcome regression models may be misspecified. For existing high-dimensional methods, valid confidence intervals are obtained for the treatment parameter if all three models are correctly specified. We evaluate the proposed methods via extensive simulation studies and an empirical application to estimate the returns to education.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا