The primary analysis of randomized cancer screening trials typically adheres to the intention-to-screen principle, measuring cancer-specific mortality reductions between the screening and control arms. These mortality reductions result from a combination of the screening regimen, the screening technology, and the effect of early, screening-induced treatment, which motivates addressing these different aspects separately. Here we are interested in the causal effect of early versus delayed treatment on cancer mortality among the screening-detectable subgroup, which under certain assumptions is estimable from a conventional randomized screening trial using instrumental-variable-type methods. To define the causal effect of interest, we formulate a simplified structural multi-state model for screening trials, based on a hypothetical intervention trial in which screen-detected individuals would be randomized to early versus delayed treatment. The cancer-specific mortality reduction after screening detection is quantified by a cause-specific hazard ratio, for which we propose two estimators, based on an estimating equation and a likelihood expression. The methods extend existing instrumental variable methods for time-to-event and competing-risks outcomes to time-dependent intermediate variables. Using the multi-state model as the basis of a data-generating mechanism, we investigate the performance of the new estimators through simulation studies. In addition, we illustrate the proposed method in the context of CT screening for lung cancer using data from the US National Lung Screening Trial (NLST).
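To make the multi-state setup concrete, the following is a minimal Python sketch of a data-generating mechanism in the spirit of the one described above. All rates, the single screen at year 1, and the hazard ratio `hr` are illustrative assumptions, not NLST estimates or the paper's actual simulation design; the point is only to show how an intention-to-screen mortality contrast mixes the detection process with the effect of early treatment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy multi-state mechanism (all rates are made-up assumptions):
# healthy -> preclinical detectable cancer -> clinical cancer -> cancer death.
t_onset = rng.exponential(20.0, n)            # time to preclinical onset
sojourn = rng.exponential(4.0, n)             # preclinical sojourn time
arm = rng.binomial(1, 0.5, n)                 # trial arm (1 = screening)
screen_time = 1.0                             # a single screen at year 1

# Screen-detected subgroup: in the preclinical state at the screen, screening arm only
detected = (arm == 1) & (t_onset < screen_time) & (t_onset + sojourn > screen_time)

# Early (screen-induced) treatment multiplies the post-diagnosis
# cause-specific hazard of cancer death by hr -- the estimand in the abstract.
hr, base = 0.6, 0.15
diag_time = np.where(detected, screen_time, t_onset + sojourn)
death_time = diag_time + rng.exponential(1.0 / np.where(detected, base * hr, base))

for a in (0, 1):                              # 10-year cancer mortality by arm
    print(a, np.mean(death_time[arm == a] < 10.0))
```

In this toy mechanism only the screen-detected subgroup experiences the early-treatment hazard ratio, which is why the arm-level mortality contrast dilutes the treatment effect that the IV-type estimators target.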
Instrumental variables are widely used to deal with unmeasured confounding in observational studies and imperfect randomized controlled trials. In these studies, researchers often target the so-called local average treatment effect, as it is identifiable under mild conditions. In this paper, we consider estimation of the local average treatment effect under the binary instrumental variable model. We discuss the challenges of causal estimation with a binary outcome and show that, surprisingly, it can be more difficult than the case with a continuous outcome. We propose novel modeling and estimating procedures that improve upon existing proposals in terms of model congeniality, interpretability, robustness, or efficiency. Our approach is illustrated via simulation studies and a real data analysis.
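For context, here is a minimal sketch of the classical Wald estimator of the LATE with a binary instrument. This is the standard benchmark, not the improved procedures the abstract proposes; the simulated confounded data and all coefficients below are made up for illustration.

```python
import numpy as np

def wald_late(y, d, z):
    """Classical Wald estimator of the local average treatment effect:
    ratio of the intention-to-treat effect on y to the effect of z on d."""
    y, d, z = map(np.asarray, (y, d, z))
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # ITT effect on the outcome
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage: effect of z on d
    return itt_y / itt_d

# Simulated example: hidden confounder u affects both treatment and outcome
rng = np.random.default_rng(0)
n = 100_000
u = rng.binomial(1, 0.5, n)                       # unmeasured confounder
z = rng.binomial(1, 0.5, n)                       # randomized binary instrument
d = rng.binomial(1, 0.2 + 0.5 * z + 0.2 * u)      # confounded treatment uptake
y = rng.binomial(1, 0.1 + 0.2 * d + 0.3 * u)      # binary outcome
print(wald_late(y, d, z))                         # close to 0.2, the compliers' effect
```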
Robins (1997) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. In his work, identification of MSM parameters is established under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time. We consider sufficient conditions for identification of the parameters of a subclass, Marginal Structural Mean Models (MSMMs), when sequential randomization fails to hold due to unmeasured confounding, using instead a time-varying instrumental variable. Our identification conditions require that no unobserved confounder predicts compliance type for the time-varying treatment. We describe a simple weighted estimator and examine its finite-sample properties in a simulation study. We apply the proposed estimator to examine the effect of delivery hospital on neonatal survival probability.
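As a reference point, the sketch below shows the classical stabilized-IPW estimator for a marginal structural mean model under the SRA, i.e. the estimator that breaks down under unmeasured confounding and that the time-varying-IV approach above is designed to replace. The function names and the simple cumulative-exposure MSM are illustrative choices, not the paper's weighted estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stabilized_weights(A, L):
    """Stabilized inverse-probability-of-treatment weights for T periods:
    prod_t P(A_t | past A) / P(A_t | past A, L_t). A is (n, T) binary;
    L is a list of T covariate matrices (history just before treatment t)."""
    n, T = A.shape
    w, past = np.ones(n), np.zeros((n, 1))     # constant column before t = 0
    for t in range(T):
        X_den = np.hstack([past, L[t]])
        p_den = LogisticRegression().fit(X_den, A[:, t]).predict_proba(X_den)[:, 1]
        p_num = LogisticRegression().fit(past, A[:, t]).predict_proba(past)[:, 1]
        w *= np.where(A[:, t] == 1, p_num / p_den, (1 - p_num) / (1 - p_den))
        past = np.hstack([past, A[:, [t]]])    # append observed treatment to history
    return w

def msm_fit(A, L, y):
    """Weighted least squares for the MSMM E[Y(a_bar)] = b0 + b1 * sum_t a_t."""
    w = stabilized_weights(A, L)
    cum_a = A.sum(axis=1).astype(float)
    X = np.column_stack([np.ones_like(cum_a), cum_a])
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
```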
We consider the estimation of the average treatment effect in the treated as a function of baseline covariates, where there is a valid (conditional) instrument. We describe two doubly robust (DR) estimators: a locally efficient g-estimator, and a targeted minimum loss-based estimator (TMLE). These two DR estimators can be viewed as generalisations of the two-stage least squares (TSLS) method to semi-parametric models that make weaker assumptions. We exploit recent theoretical results that extend to the g-estimator the use of data-adaptive fits for the nuisance parameters. A simulation study is used to compare the finite-sample performance of standard TSLS with that of the two DR estimators, (1) when fitted using parametric nuisance models, and (2) when using data-adaptive nuisance fits obtained from the Super Learner, an ensemble machine learning method. Data-adaptive DR estimators have lower bias and improved coverage when compared to incorrectly specified parametric DR estimators and TSLS. When the parametric model for the treatment effect curve is correctly specified, the g-estimator outperforms all others, but when this model is misspecified, TMLE performs best, while TSLS can result in large biases and zero coverage. Finally, we illustrate the methods by reanalysing the COPERS (COping with persistent Pain, Effectiveness Research in Self-management) trial to make inference about the causal effect of treatment actually received, and the extent to which this is modified by depression at baseline.
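For readers unfamiliar with the TSLS benchmark that the two DR estimators generalise, here is a minimal numpy sketch with a simulated confounded example. The coefficients are arbitrary, and this is the standard method, not the g-estimator or TMLE discussed above.

```python
import numpy as np

def tsls(y, d, z):
    """Standard two-stage least squares with a single instrument z:
    stage 1 projects the exposure d onto (1, z); stage 2 regresses y on
    (1, d_hat) and returns the slope, the TSLS effect estimate."""
    Z = np.column_stack([np.ones_like(z), z])
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]    # first-stage fitted values
    X = np.column_stack([np.ones_like(d_hat), d_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]      # second-stage slope

# Simulated check: u confounds d and y; z is a clean, randomized instrument
rng = np.random.default_rng(3)
n = 50_000
u = rng.standard_normal(n)
z = rng.binomial(1, 0.5, n).astype(float)
d = 0.8 * z + u + rng.standard_normal(n)
y = 1.5 * d - 2.0 * u + rng.standard_normal(n)
print(tsls(y, d, z))    # close to 1.5; naive OLS of y on d would be badly biased
```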
Instrumental variable methods have been widely used to identify causal effects in the presence of unmeasured confounding. A key identification condition known as the exclusion restriction states that the instrument cannot have a direct effect on the outcome which is not mediated by the exposure in view. In the health and social sciences, such an assumption is often not credible. To address this concern, we consider identification conditions of the population average treatment effect with an invalid instrumental variable which does not satisfy the exclusion restriction, and derive the efficient influence function targeting the identifying functional under a nonparametric observed data model. We propose a novel multiply robust locally efficient estimator of the average treatment effect that is consistent in the union of multiple parametric nuisance models, as well as a multiply debiased machine learning estimator, for which the nuisance parameters are estimated using generic machine learning methods that effectively exploit various forms of linear or nonlinear structured sparsity in the nuisance parameter space. When one cannot be confident that any of these machine learners is consistent at sufficiently fast rates to ensure $\sqrt{n}$-consistency for the average treatment effect, we introduce a new criterion for selective machine learning which leverages the multiple robustness property in order to ensure small bias. The proposed methods are illustrated through extensive simulations and a data analysis evaluating the causal effect of 401(k) participation on savings.
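The debiased machine learning estimator mentioned above follows the usual cross-fitting recipe: fit the nuisances on training folds, average an influence-function-based score over held-out folds. The skeleton below sketches that recipe generically; `fit_nuisance` and `phi` are hypothetical user-supplied callables standing in for the paper's nuisance fits and efficient influence function, which are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_fit(data, fit_nuisance, phi, n_splits=5, seed=0):
    """Generic DML-style cross-fitting. `data` maps names to length-n arrays;
    `fit_nuisance(train)` returns fitted nuisance objects; `phi(test, nuis)`
    returns one influence-function value per held-out unit."""
    n = len(next(iter(data.values())))
    scores = np.zeros(n)
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(np.arange(n)):
        train = {k: v[tr] for k, v in data.items()}
        test = {k: v[te] for k, v in data.items()}
        scores[te] = phi(test, fit_nuisance(train))    # evaluated out of fold
    theta = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(n)               # plug-in standard error
    return theta, se
```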
We develop new semiparametric methods for estimating treatment effects. We focus on a setting where the outcome distributions may be thick-tailed, where treatment effects are small, where sample sizes are large, and where assignment is completely random. This setting is of particular interest in recent experimentation in tech companies. We propose using parametric models for the treatment effects, as opposed to parametric models for the full outcome distributions. This leads to semiparametric models for the outcome distributions. We derive the semiparametric efficiency bound for this setting, and propose efficient estimators. In the case with a constant treatment effect, one of the proposed estimators has an interesting interpretation as a weighted average of quantile treatment effects, with the weights proportional to (minus) the second derivative of the log of the density of the potential outcomes. Our analysis also results in an extension of Huber's model and trimmed mean to include asymmetry, and a simplified condition on linear combinations of order statistics, which may be of independent interest.
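To illustrate the weighted-average-of-QTEs interpretation in the constant-effect case: for Gaussian potential outcomes, minus the second derivative of the log density equals the constant $1/\sigma^2$, so the weights are flat and the estimator collapses to a plain average of quantile treatment effects. The sketch below checks this special case numerically; it is a toy under an assumed Gaussian design, not the proposed efficient estimator for thick-tailed distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
tau = 0.3                                     # constant additive treatment effect
y0 = rng.standard_normal(n)                   # control potential outcomes
y1 = rng.standard_normal(n) + tau             # treated potential outcomes
w = rng.binomial(1, 0.5, n)                   # completely random assignment
y_obs = np.where(w == 1, y1, y0)

# Quantile treatment effects on a grid of quantile levels
q = np.linspace(0.02, 0.98, 49)
qte = np.quantile(y_obs[w == 1], q) - np.quantile(y_obs[w == 0], q)

# Gaussian case: -(log f)'' = 1/sigma^2 is constant, so the efficiency-weighted
# average of QTEs reduces to a plain average; a thick-tailed f would instead
# down-weight the extreme quantiles.
print(qte.mean())                             # close to tau = 0.3
```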