The notion of exchangeability has been recognized in the causal inference literature in various guises, but only rarely in its original Bayesian meaning as a symmetry property between individual units in statistical inference. Since the latter is a standard ingredient in Bayesian inference, we argue that in Bayesian causal inference it is natural to link the causal model, including the notion of confounding and the definition of causal contrasts of interest, to the concept of exchangeability. Here we relate the Bayesian notion of exchangeability to alternative conditions for unconfounded inferences, commonly stated using potential outcomes, and define causal contrasts in the presence of exchangeability in terms of limits of posterior predictive expectations for further exchangeable units. While our main focus is on a point treatment setting, we also investigate how this reasoning carries over to longitudinal settings.
Skepticism about the assumption of no unmeasured confounding, also known as exchangeability, is often warranted in making causal inferences from observational data, because exchangeability hinges on an investigator's ability to accurately measure covariates that capture all potential sources of confounding. In practice, the most one can hope for is that covariate measurements are at best proxies of the true underlying confounding mechanism operating in a given observational study. In this paper, we consider the framework of proximal causal inference introduced by Tchetgen Tchetgen et al. (2020), which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. We make a number of contributions to proximal inference, including (i) an alternative set of conditions for nonparametric proximal identification of the average treatment effect; (ii) general semiparametric theory for proximal estimation of the average treatment effect, including efficiency bounds for key semiparametric models of interest; and (iii) a characterization of proximal doubly robust and locally efficient estimators of the average treatment effect. Moreover, we provide analogous identification and efficiency results for the average treatment effect on the treated. Our approach is illustrated via simulation studies and a data application evaluating the effectiveness of right heart catheterization in the intensive care unit for critically ill patients.
Propensity score methods have been shown to be powerful in obtaining efficient estimators of the average treatment effect (ATE) from observational data, especially in the presence of confounding factors. When estimating the ATE, deciding which covariates to include in the propensity score function is important, since incorporating unnecessary covariates may amplify both the bias and the variance of ATE estimators. In this paper, we show that including additional instrumental variables that satisfy the exclusion restriction for the outcome harms statistical efficiency. We also prove that controlling for outcome predictors, i.e., covariates that predict the outcomes but are irrelevant to the exposures, can help reduce the asymptotic variance of ATE estimation. We also note that efficient estimation of the ATE by non-parametric or semi-parametric methods requires an estimated propensity score function, as described in Hirano et al. (2003). Such estimation procedures usually require many regularity conditions; Rothe (2016) also illustrated this point and proposed a known propensity score (KPS) estimator that requires only mild regularity conditions and is still fully efficient. In addition, we introduce a linearly modified (LM) estimator that is nearly efficient in most general settings and does not require estimation of the propensity score function, and is hence convenient to calculate. The construction of this estimator borrows an idea from the interaction estimator of Lin (2013), in which regression adjustment with interaction terms is applied to data arising from a completely randomized experiment. As its name suggests, the LM estimator can be viewed as a linear modification of the inverse probability weighting (IPW) estimator using known propensity scores. We also investigate its statistical properties both analytically and numerically.
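As a point of reference for the estimators discussed in this abstract, the following is a minimal sketch of the basic IPW estimator of the ATE when the propensity scores are known. It is not the LM or KPS estimator, whose exact forms are not given here. The function names `ipw_ate`, `hajek_ate`, and `simulate`, and the data-generating process, are hypothetical and chosen only for illustration.

```python
import random


def ipw_ate(y, t, e):
    """Horvitz-Thompson IPW estimate of the ATE with known propensity scores e."""
    n = len(y)
    treated = sum(ti * yi / ei for yi, ti, ei in zip(y, t, e))
    control = sum((1 - ti) * yi / (1 - ei) for yi, ti, ei in zip(y, t, e))
    return (treated - control) / n


def hajek_ate(y, t, e):
    """Normalized (Hajek) IPW estimate: weights are rescaled to sum to one per arm."""
    w1 = [ti / ei for ti, ei in zip(t, e)]
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


def simulate(n, seed=0):
    """Hypothetical data: one confounder x, known propensity e(x), true ATE = 2."""
    rng = random.Random(seed)
    y, t, e = [], [], []
    for _ in range(n):
        x = rng.random()                    # confounder on [0, 1]
        ei = 0.2 + 0.6 * x                  # known propensity score (assumed form)
        ti = 1 if rng.random() < ei else 0  # treatment assignment
        yi = 2.0 * ti + x + rng.gauss(0.0, 0.5)
        y.append(yi)
        t.append(ti)
        e.append(ei)
    return y, t, e


if __name__ == "__main__":
    y, t, e = simulate(20000, seed=0)
    print("Horvitz-Thompson IPW:", ipw_ate(y, t, e))
    print("Hajek IPW:", hajek_ate(y, t, e))
```

Both versions use only the known scores, so no propensity model is fitted. The normalized version is often preferred in practice because it is invariant to adding a constant to the outcomes.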
A standard assumption for causal inference about the joint effects of time-varying treatment is that one has measured sufficient covariates to ensure that, within covariate strata, subjects are exchangeable across observed treatment values; this is also known as the sequential randomization assumption (SRA). SRA is often criticized because it requires one to accurately measure all confounders. Realistically, measured covariates can rarely capture all confounders with certainty. Often covariate measurements are at best proxies of confounders, thus invalidating inferences under SRA. In this paper, we extend the proximal causal inference (PCI) framework of Miao et al. (2018) to the longitudinal setting under a semiparametric marginal structural mean model (MSMM). PCI offers an opportunity to learn about joint causal effects in settings where SRA based on measured time-varying covariates fails, by formally accounting for the covariate measurements as imperfect proxies of underlying confounding mechanisms. We establish nonparametric identification with a pair of time-varying proxies and provide a corresponding characterization of regular and asymptotically linear estimators of the parameter indexing the MSMM, including a rich class of doubly robust estimators, and establish the corresponding semiparametric efficiency bound for the MSMM. Extensive simulation studies and a data application illustrate the finite sample behavior of the proposed methods.
We propose novel estimators for categorical and continuous treatments by using an optimal covariate balancing strategy for inverse probability weighting. The resulting estimators are shown to be consistent and asymptotically normal for causal contrasts of interest, either when the model explaining treatment assignment is correctly specified, or when the correct set of bases for the outcome models has been chosen and the assignment model is sufficiently rich. For the categorical treatment case, we show that the estimator attains the semiparametric efficiency bound when all models are correctly specified. For the continuous case, the causal parameter of interest is a function of the treatment dose. The latter is not parametrized, and the proposed estimators are shown to have bias and variance of the classical nonparametric rate. Asymptotic results are complemented with simulations illustrating the finite sample properties. Our analysis of a data set suggests a nonlinear effect of BMI on the decline in self-reported health.
Instrumental variable methods are among the most commonly used causal inference approaches to account for unmeasured confounders in observational studies. The presence of invalid instruments is a major concern for practical applications, and a fast-growing area of research is inference for the causal effect with possibly invalid instruments. The existing inference methods rely on correctly separating valid and invalid instruments in a data-dependent way. In this paper, we illustrate post-selection problems of these existing methods. We construct uniformly valid confidence intervals for the causal effect, which are robust to mistakes in separating valid and invalid instruments. Our proposal is to search for values of the causal effect such that a sufficient number of candidate instruments can be taken as valid. We further devise a novel sampling method, which, together with searching, leads to a more precise confidence interval. Our proposed searching and sampling confidence intervals are shown to be uniformly valid under the finite-sample majority and plurality rules. We compare our proposed methods with existing inference methods in a large set of simulation studies and apply them to study the effect of the triglyceride level on the glucose level using a mouse data set.
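The searching idea described above can be illustrated with a rough sketch under the majority rule. This is not the paper's procedure: the function name `searching_interval`, the use of summary statistics (reduced-form estimates `Gamma`, first-stage estimates `gamma`, and their standard errors), the assumed independence of the two sets of estimates, and the fixed critical value `z = 1.96` are all assumptions made for illustration, and the sampling step is omitted. A candidate effect `beta` is kept when more than half of the instruments have a residual `Gamma_j - beta * gamma_j` that is statistically indistinguishable from zero.

```python
import math


def searching_interval(Gamma, gamma, se_Gamma, se_gamma, grid, z=1.96):
    """Return (min, max) of grid values of beta supported by a majority of instruments.

    An instrument j "supports" beta when |Gamma_j - beta * gamma_j| is within
    z standard errors of zero. Returns None if no grid value is supported.
    """
    n_iv = len(Gamma)
    accepted = []
    for beta in grid:
        n_valid = 0
        for G, g, sG, sg in zip(Gamma, gamma, se_Gamma, se_gamma):
            # Standard error of G - beta * g, assuming G and g are independent.
            se = math.sqrt(sG ** 2 + (beta * sg) ** 2)
            if abs(G - beta * g) <= z * se:
                n_valid += 1
        if n_valid > n_iv / 2:  # majority rule
            accepted.append(beta)
    if not accepted:
        return None
    return min(accepted), max(accepted)


if __name__ == "__main__":
    # Hypothetical summary statistics: three valid instruments (effect 0.5)
    # and two invalid ones with biased reduced-form estimates.
    Gamma = [0.5, 0.5, 0.5, 1.5, 2.0]
    gamma = [1.0] * 5
    se = [0.05] * 5
    grid = [i / 100 for i in range(301)]  # candidate effects from 0.00 to 3.00
    print(searching_interval(Gamma, gamma, se, se, grid))
```

Because the interval is built from every candidate value that a majority of instruments supports, it does not depend on first selecting a single set of valid instruments, which is the intuition behind robustness to selection mistakes.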