No Arabic abstract
Instrumental variable is an essential tool for addressing unmeasured confounding in observational studies. Two stage predictor substitution (2SPS) estimator and two stage residual inclusion(2SRI) are two commonly used approaches in applying instrumental variables. Recently 2SPS was studied under the additive hazards model in the presence of competing risks of time-to-events data, where linearity was assumed for the relationship between the treatment and the instrument variable. This assumption may not be the most appropriate when we have binary treatments. In this paper, we consider the 2SRI estimator under the additive hazards model for general survival data and in the presence of competing risks, which allows generalized linear models for the relation between the treatment and the instrumental variable. We derive the asymptotic properties including a closed-form asymptotic variance estimate for the 2SRI estimator. We carry out numerical studies in finite samples, and apply our methodology to the linked Surveillance, Epidemiology and End Results (SEER) - Medicare database comparing radical prostatectomy versus conservative treatment in early-stage prostate cancer patients.
Competing risk analysis considers event times due to multiple causes, or of more than one event types. Commonly used regression models for such data include 1) cause-specific hazards model, which focuses on modeling one type of event while acknowledging other event types simultaneously; and 2) subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use and in particular statistical properties in the presence of high-dimensional predictors are largely unexplored. Motivated by an analysis using the linked SEER-Medicare database for the purposes of predicting cancer versus non-cancer mortality for patients with prostate cancer, we study the accuracy of prediction and variable selection of existing statistical learning methods under both models using extensive simulation experiments, including different approaches to choosing penalty parameters in each method. We then apply the optimal approaches to the analysis of the SEER-Medicare data.
We study explained variation under the additive hazards regression model for right-censored data. We consider different approaches for developing such a measure, and focus on one that estimates the proportion of variation in the failure time explained by the covariates. We study the properties of the measure both analytically, and through extensive simulations. We apply the measure to a well-known survival data set as well as the linked Surveillance, Epidemiology and End Results (SEER)-Medicare database for prediction of mortality in early-stage prostate cancer patients using high dimensional claims codes.
There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish identification of both value function $E[Y_{mathcal{D}(L)}]$ for a given regime $mathcal{D}$ and optimal regimes $text{argmax}_{mathcal{D}} E[Y_{mathcal{D}(L)}]$ with the aid of a binary instrumental variable, when no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. Furthermore, we propose to identify and estimate optimal treatment regimes among those who would comply to the assigned treatment under a standard monotonicity assumption. In this latter case, we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation.
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large, but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data.
Instrumental variables (IV) are a useful tool for estimating causal effects in the presence of unmeasured confounding. IV methods are well developed for uncensored outcomes, particularly for structural linear equation models, where simple two-stage estimation schemes are available. The extension of these methods to survival settings is challenging, partly because of the nonlinearity of the popular survival regression models and partly because of the complications associated with right censoring or other survival features. We develop a simple causal hazard ratio estimator in a proportional hazards model with right censored data. The method exploits a special characterization of IV which enables the use of an intuitive inverse weighting scheme that is generally applicable to more complex survival settings with left truncation, competing risks, or recurrent events. We rigorously establish the asymptotic properties of the estimators, and provide plug-in variance estimators. The proposed method can be implemented in standard software, and is evaluated through extensive simulation studies. We apply the proposed IV method to a data set from the Prostate, Lung, Colorectal and Ovarian cancer screening trial to delineate the causal effect of flexible sigmoidoscopy screening on colorectal cancer survival which may be confounded by informative noncompliance with the assigned screening regimen.