In a comprehensive cohort study of two competing treatments (say, A and B), clinically eligible individuals are first asked to enroll in a randomized trial and, if they refuse, are then asked to enroll in a parallel observational study in which they can choose treatment according to their own preference. We consider estimation of two estimands: (1) comprehensive cohort causal effect -- the difference in mean potential outcomes had all patients in the comprehensive cohort received treatment A vs. treatment B and (2) randomized trial causal effect -- the difference in mean potential outcomes had all patients enrolled in the randomized trial received treatment A vs. treatment B. For each estimand, we consider inference under various sets of unconfoundedness assumptions and construct semiparametric efficient and robust estimators. These estimators depend on nuisance functions, which we estimate, for illustrative purposes, using generalized additive models. Using the theory of sample splitting, we establish the asymptotic properties of our proposed estimators. We also illustrate our methodology using data from the Bypass Angioplasty Revascularization Investigation (BARI) randomized trial and observational registry to evaluate the effect of percutaneous transluminal coronary balloon angioplasty versus coronary artery bypass grafting on 5-year mortality. To evaluate the finite sample performance of our estimators, we use the BARI dataset as the basis of a realistic simulation study.
A standard assumption for causal inference about the joint effects of time-varying treatments is that one has measured sufficient covariates to ensure that, within covariate strata, subjects are exchangeable across observed treatment values; this is known as the sequential randomization assumption (SRA). SRA is often criticized because it requires one to accurately measure all confounders. Realistically, measured covariates can rarely capture all confounders with certainty. Often covariate measurements are at best proxies of confounders, thus invalidating inferences under SRA. In this paper, we extend the proximal causal inference (PCI) framework of Miao et al. (2018) to the longitudinal setting under a semiparametric marginal structural mean model (MSMM). PCI offers an opportunity to learn about joint causal effects in settings where SRA based on measured time-varying covariates fails, by formally accounting for the covariate measurements as imperfect proxies of underlying confounding mechanisms. We establish nonparametric identification with a pair of time-varying proxies, characterize the regular and asymptotically linear estimators of the parameter indexing the MSMM, including a rich class of doubly robust estimators, and establish the corresponding semiparametric efficiency bound for the MSMM. Extensive simulation studies and a data application illustrate the finite sample behavior of the proposed methods.
With increasing data availability, causal treatment effects can be evaluated across different datasets, both randomized controlled trials (RCTs) and observational studies. RCTs isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects. But they may struggle with inclusion biases, and thus lack external validity. On the other hand, large observational samples are often more representative of the target population but can conflate confounding effects with the treatment of interest. In this paper, we review the growing literature on methods for causal inference on combined RCTs and observational studies, striving for the best of both worlds. We first discuss identification and estimation methods that improve generalizability of RCTs using the representativeness of observational data. Classical estimators include weighting, difference between conditional outcome models, and doubly robust estimators. We then discuss methods that combine RCTs and observational data to improve (conditional) average treatment effect estimation, handling possible unmeasured confounding in the observational data. We also connect and contrast works developed in both the potential outcomes framework and the structural causal model framework. Finally, we compare the main methods using a simulation study and real world data to analyze the effect of tranexamic acid on the mortality rate in major trauma patients. Code to implement many of the methods is provided.
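As an illustration of the "difference between conditional outcome models" idea mentioned above, the following is a minimal sketch, not the reviewed authors' implementation. It assumes linear outcome models fitted separately in each trial arm and a simulated data set; the function names (`fit_ols`, `gformula_generalized_ate`, `simulate`) and the data-generating process are hypothetical choices made for this example only.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on [1, X]; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from a coefficient vector produced by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def gformula_generalized_ate(X_rct, a_rct, y_rct, X_obs):
    """Fit one outcome model per trial arm, then average the predicted
    treated-minus-control contrast over the observational covariates."""
    coef1 = fit_ols(X_rct[a_rct == 1], y_rct[a_rct == 1])
    coef0 = fit_ols(X_rct[a_rct == 0], y_rct[a_rct == 0])
    contrast = predict_ols(coef1, X_obs) - predict_ols(coef0, X_obs)
    return float(np.mean(contrast))


def simulate(n_rct=5000, n_obs=5000, seed=0):
    """Hypothetical data: the trial covariate is centered at 0, the target
    population at 1, and the effect 1 + 2x depends on the covariate."""
    rng = np.random.default_rng(seed)
    X_rct = rng.normal(0.0, 1.0, size=(n_rct, 1))
    X_obs = rng.normal(1.0, 1.0, size=(n_obs, 1))
    a = rng.binomial(1, 0.5, size=n_rct)
    x = X_rct[:, 0]
    y = x + a * (1.0 + 2.0 * x) + rng.normal(size=n_rct)
    return X_rct, a, y, X_obs


if __name__ == "__main__":
    X_rct, a, y, X_obs = simulate()
    trial_only = y[a == 1].mean() - y[a == 0].mean()
    generalized = gformula_generalized_ate(X_rct, a, y, X_obs)
    print("trial-only difference in means:", trial_only)
    print("generalized to target covariates:", generalized)
```

In this simulated setting the trial-only contrast targets the effect in the trial population (about 1), while the generalized estimate targets the effect in the target population (about 3), because the effect modifier is distributed differently in the two samples. The sketch relies on the outcome models being correctly specified and on the trial covariate range covering the target population.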
Propensity score methods have been shown to be powerful for obtaining efficient estimators of the average treatment effect (ATE) from observational data, especially in the presence of confounding factors. When estimating the ATE, deciding which covariates to include in the propensity score function is important, since incorporating unnecessary covariates may amplify both the bias and the variance of ATE estimators. In this paper, we show that including additional instrumental variables that satisfy the exclusion restriction for the outcome harms statistical efficiency. We also prove that controlling for covariates that act as outcome predictors, i.e., that predict the outcome but are unrelated to the exposure, can help reduce the asymptotic variance of ATE estimation. We also note that efficiently estimating the ATE by nonparametric or semiparametric methods requires the estimated propensity score function, as described in Hirano et al. (2003). Such estimation procedures usually require many regularity conditions. Rothe (2016) also illustrated this point and proposed a known propensity score (KPS) estimator that requires only mild regularity conditions and is still fully efficient. In addition, we introduce a linearly modified (LM) estimator that is nearly efficient in most general settings and does not require estimation of the propensity score function, and is hence convenient to calculate. The construction of this estimator borrows ideas from the interaction estimator of Lin (2013), in which regression adjustment with interaction terms is applied to data arising from a completely randomized experiment. As its name suggests, the LM estimator can be viewed as a linear modification of the IPW estimator using known propensity scores. We also investigate its statistical properties both analytically and numerically.
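To make the two ingredients named above concrete, here is a minimal sketch of (a) an IPW estimator that uses known propensity scores and (b) Lin's regression adjustment with treatment-by-covariate interactions for a completely randomized experiment. This is not the LM estimator itself, whose construction is not given in the abstract. The function names and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def ipw_known_ps(y, a, e):
    """Horvitz-Thompson style IPW estimate of the ATE with known
    propensity scores e (a scalar or an array aligned with y)."""
    return float(np.mean(a * y / e - (1 - a) * y / (1 - e)))


def lin_interaction_estimator(y, a, X):
    """Regress y on treatment, centered covariates, and their interactions;
    the coefficient on treatment is the adjusted ATE estimate."""
    Xc = X - X.mean(axis=0)  # centering makes the treatment coefficient the ATE
    design = np.column_stack([np.ones(len(y)), a, Xc, a[:, None] * Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


def simulate_experiment(n=10000, seed=0):
    """Hypothetical completely randomized experiment: exactly half of the
    units are treated and the true ATE is 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    a = rng.permutation(np.repeat([0, 1], n // 2))
    y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + 2.0 * a + rng.normal(size=n)
    return y, a, X


if __name__ == "__main__":
    y, a, X = simulate_experiment()
    print("IPW with known e = 0.5:", ipw_known_ps(y, a, 0.5))
    print("Lin interaction estimator:", lin_interaction_estimator(y, a, X))
```

Both estimators target the same ATE in this design. The regression-adjusted estimate is typically less variable because it uses the covariates that predict the outcome, which is consistent with the abstract's point about outcome predictors.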
The goal of causal inference is to understand the outcome of alternative courses of action. However, all causal inference requires assumptions. Such assumptions can be more influential than in typical tasks for probabilistic modeling, and testing those assumptions is important to assess the validity of causal inference. We develop model criticism for Bayesian causal inference, building on the idea of posterior predictive checks to assess model fit. Our approach involves decomposing the problem, separately criticizing the model of treatment assignments and the model of outcomes. Conditioned on the assumption of unconfoundedness---that the treatments are assigned independently of the potential outcomes---we show how to check any additional modeling assumption. Our approach provides a foundation for diagnosing model-based causal inferences.
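The idea of criticizing the treatment-assignment model with a posterior predictive check can be sketched as follows. This is a minimal illustration under strong simplifying assumptions, not the procedure developed in the paper: the assignment model is a single Bernoulli probability with a Beta(1, 1) prior (so it ignores covariates), the covariate is binary, and the discrepancy is the absolute difference in treated proportions between the two covariate groups. All names are hypothetical.

```python
import numpy as np


def assignment_ppc_pvalue(a, x, n_rep=2000, seed=0):
    """Posterior predictive p-value for a covariate-free Bernoulli model of
    treatment assignment a, using a binary covariate x in the discrepancy."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    x = np.asarray(x)
    n = a.size

    def discrepancy(assign):
        # Gap in treated proportion between the two covariate groups.
        return abs(assign[x == 1].mean() - assign[x == 0].mean())

    observed = discrepancy(a)
    # Conjugate posterior for the assignment probability: Beta(1 + k, 1 + n - k).
    k = a.sum()
    p_draws = rng.beta(1 + k, 1 + n - k, size=n_rep)
    exceed = 0
    for p in p_draws:
        a_rep = rng.binomial(1, p, size=n)  # replicated assignments
        exceed += discrepancy(a_rep) >= observed
    return exceed / n_rep


def simulate_assignments(n=500, seed=1):
    """Hypothetical data in which assignment depends strongly on x."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
    return a, x


if __name__ == "__main__":
    a, x = simulate_assignments()
    print("posterior predictive p-value:", assignment_ppc_pvalue(a, x))
```

A p-value near zero indicates that replicated assignments from the fitted model almost never show a covariate gap as large as the observed one, which flags the covariate-free assignment model as a poor fit for these simulated data.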
Weighting methods are a common tool to de-bias estimates of causal effects. Although the number of seemingly disparate methods is growing, many of them can be folded into one unifying regime: causal optimal transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between a source and target population. Our approach is model-free but can also incorporate moments or any other important functions of covariates that the researcher desires to balance. We find that causal optimal transport outperforms competitor methods when both the propensity score and outcome models are misspecified, indicating that it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control study examining the effect of misoprostol versus oxytocin for treatment of post-partum hemorrhage.