No Arabic abstract
Causal effect estimation from observational data is an important but challenging problem. Causal effect estimation with unobserved variables in data is even more difficult. The challenges lie in (1) whether the causal effect can be estimated from observational data (identifiability); (2) accuracy of estimation (unbiasedness), and (3) fast data-driven algorithm for the estimation (efficiency). Each of the above problems by its own, is challenging. There does not exist many data-driven methods for causal effect estimation so far, and they solve one or two of the above problems, but not all. In this paper, we present an algorithm that is fast, unbiased and is able to confirm if a causal effect is identifiable or not under a very practical and commonly seen problem setting. To achieve high efficiency, we approach the causal effect estimation problem as a local search for the minimal adjustment variable sets in data. We have shown that identifiability and unbiased estimation can be both resolved using data in our problem setting, and we have developed theorems to support the local search for searching for adjustment variable sets to achieve unbiased causal effect estimation. We make use of frequent pattern mining strategy to further speed up the search process. Experiments performed on an extensive collection of synthetic and real-world datasets demonstrate that the proposed algorithm outperforms the state-of-the-art causal effect estimation methods in both accuracy and time-efficiency.
When estimating the treatment effect in an observational study, we use a semiparametric locally efficient dimension reduction approach to assess both the treatment assignment mechanism and the average responses in both treated and nontreated groups. We then integrate all results through imputation, inverse probability weighting and doubly robust augmentation estimators. Doubly robust estimators are locally efficient while imputation estimators are super-efficient when the response models are correct. To take advantage of both procedures, we introduce a shrinkage estimator to automatically combine the two, which retains the double robustness property while improving on the variance when the response model is correct. We demonstrate the performance of these estimators through simulated experiments and a real dataset concerning the effect of maternal smoking on baby birth weight. Key words and phrases: Average Treatment Effect, Doubly Robust Estimator, Efficiency, Inverse Probability Weighting, Shrinkage Estimator.
Causal effect estimation from observational data is a crucial but challenging task. Currently, only a limited number of data-driven causal effect estimation methods are available. These methods either provide only a bound estimation of the causal effect of a treatment on the outcome, or generate a unique estimation of the causal effect, but making strong assumptions on data and having low efficiency. In this paper, we identify a practical problem setting and propose an approach to achieving unique and unbiased estimation of causal effects from data with hidden variables. For the approach, we have developed the theorems to support the discovery of the proper covariate sets for confounding adjustment (adjustment sets). Based on the theorems, two algorithms are proposed for finding the proper adjustment sets from data with hidden variables to obtain unbiased and unique causal effect estimation. Experiments with synthetic datasets generated using five benchmark Bayesian networks and four real-world datasets have demonstrated the efficiency and effectiveness of the proposed algorithms, indicating the practicability of the identified problem setting and the potential of the proposed approach in real-world applications.
Missing data and confounding are two problems researchers face in observational studies for comparative effectiveness. Williamson et al. (2012) recently proposed a unified approach to handle both issues concurrently using a multiply-robust (MR) methodology under the assumption that confounders are missing at random. Their approach considers a union of models in which any submodel has a parametric component while the remaining models are unrestricted. We show that while their estimating function is MR in theory, the possibility for multiply robust inference is complicated by the fact that parametric models for different components of the union model are not variation independent and therefore the MR property is unlikely to hold in practice. To address this, we propose an alternative transparent parametrization of the likelihood function, which makes explicit the model dependencies between various nuisance functions needed to evaluate the MR efficient score. The proposed method is genuinely doubly-robust (DR) in that it is consistent and asymptotic normal if one of two sets of modeling assumptions holds. We evaluate the performance and doubly robust property of the DR method via a simulation study.
Standard Mendelian randomization analysis can produce biased results if the genetic variant defining the instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment. We provide novel identification conditions for the causal effect of a treatment in presence of unmeasured confounding, by leveraging an invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; and (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. We formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MiSTERI is shown to be particularly advantageous in presence of pervasive heterogeneity of pleiotropic effects on additive scale, a setting in which two recently proposed robust estimation methods MR GxE and MR GENIUS can be severely biased. In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved accuracy MaWII MiSTERI is shown to be robust to horizontal pleiotropy, violation of IV independence assumption and weak IV bias. Both simulation studies and real data analysis results demonstrate the robustness of the proposed MR MiSTERI methods.
Scientists frequently generalize population level causal quantities such as average treatment effect from a source population to a target population. When the causal effects are heterogeneous, differences in subject characteristics between the source and target populations may make such a generalization difficult and unreliable. Reweighting or regression can be used to adjust for such differences when generalizing. However, these methods typically suffer from large variance if there is limited covariate distribution overlap between the two populations. We propose a generalizability score to address this issue. The score can be used as a yardstick to select target subpopulations for generalization. A simplified version of the score avoids using any outcome information and thus can prevent deliberate biases associated with inadvertent access to such information. Both simulation studies and real data analysis demonstrate convincing results for such selection.