Causal effect estimation from observational data is a crucial but challenging task. Currently, only a limited number of data-driven causal effect estimation methods are available. These methods either provide only a bound estimate of the causal effect of a treatment on the outcome, or produce a unique estimate but rely on strong assumptions about the data and suffer from low efficiency. In this paper, we identify a practical problem setting and propose an approach to achieving unique and unbiased estimation of causal effects from data with hidden variables. For this approach, we develop theorems that support the discovery of proper covariate sets for confounding adjustment (adjustment sets). Based on the theorems, two algorithms are proposed for finding proper adjustment sets from data with hidden variables to obtain unbiased and unique causal effect estimation. Experiments with synthetic datasets generated from five benchmark Bayesian networks and with four real-world datasets demonstrate the efficiency and effectiveness of the proposed algorithms, indicating the practicability of the identified problem setting and the potential of the proposed approach in real-world applications.
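For context, the role of an adjustment set in this line of work is the textbook back-door adjustment identity (standard background, not a result specific to this paper): for a treatment X, an outcome Y, and a valid adjustment set Z,

```latex
P\big(Y \mid do(X=x)\big) \;=\; \sum_{\mathbf{z}} P\big(Y \mid X=x, \mathbf{Z}=\mathbf{z}\big)\, P\big(\mathbf{Z}=\mathbf{z}\big)
```

so finding a proper adjustment set from data is what makes the causal effect estimable by ordinary conditional-probability (or regression) estimates.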
This paper discusses the problem of causal query in observational data with hidden variables, where the aim is to estimate the change in an outcome when a variable is manipulated, given a set of plausible confounding variables that affect both the manipulated variable and the outcome. Such a query on data, estimating the causal effect of the manipulated variable, is useful for validating an experiment design with historical data or for exploring confounders when studying a new relationship. However, existing data-driven methods for causal effect estimation face major challenges, including poor scalability with high-dimensional data, low estimation accuracy due to the heuristics used by global causal structure learning algorithms, and the assumption of causal sufficiency when hidden variables are inevitable in data. In this paper, we develop a theorem for using local search to find a superset of the adjustment (or confounding) variables for causal effect estimation from observational data under a realistic pretreatment assumption. The theorem ensures that the unbiased estimate of the causal effect is included in the set of causal effects estimated from the superset of adjustment variables. Based on the developed theorem, we propose a data-driven algorithm for causal query. Experiments show that the proposed algorithm is faster and produces better causal effect estimates than an existing data-driven causal effect estimation method that allows for hidden variables, and that its estimates are as accurate as those of state-of-the-art methods using domain knowledge.
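The following is a minimal sketch of the idea that a superset of adjustment variables yields a set of effect estimates containing the unbiased one. It is a generic illustration, not the paper's algorithm: the superset is assumed given, the subset enumeration is brute force, and a linear regression adjustment stands in for any concrete estimator.

```python
# Sketch: estimate the effect of X on Y with every subset of a candidate
# adjustment superset; the resulting collection of estimates is the "set of
# causal effects" that, per the abstract's theorem, contains the unbiased one.
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def adjusted_effect(df, treatment, outcome, adj):
    """Treatment coefficient in a linear regression of Y on X and the adjustment set."""
    X = df[[treatment, *adj]].to_numpy()
    return float(LinearRegression().fit(X, df[outcome]).coef_[0])


def effects_over_subsets(df, treatment, outcome, superset):
    """Estimate the effect with every subset of the candidate adjustment superset."""
    return {
        adj: adjusted_effect(df, treatment, outcome, adj)
        for k in range(len(superset) + 1)
        for adj in combinations(superset, k)
    }


# Toy usage: Z confounds X and Y, true effect of X on Y is 2.0.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = (z + rng.normal(size=5000) > 0).astype(float)
y = 2.0 * x + 1.5 * z + rng.normal(size=5000)
data = pd.DataFrame({"X": x, "Y": y, "Z": z})
print(effects_over_subsets(data, "X", "Y", superset=("Z",)))
```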
Many real-world decision-making tasks require learning causal relationships between a set of variables. Typical causal discovery methods, however, require that all variables are observed, which might not be realistic in practice. Unfortunately, in the presence of latent confounding, recovering causal relationships from observational data without making additional assumptions is an ill-posed problem. Fortunately, additional structure among the confounders can often be expected in practice; one such example is pervasive confounding, which has been exploited for consistent causal estimation in the special case of linear causal models. In this paper, we provide a proof and a method to estimate causal relationships in the non-linear, pervasive confounding setting. The heart of our procedure is the ability to estimate the pervasive confounding variation through a simple spectral decomposition of the observed data matrix. We derive a DAG score function based on this insight and empirically compare our method to existing procedures. We show improved performance on both simulated and real datasets by explicitly accounting for both confounders and non-linear effects.
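A minimal sketch of the spectral idea follows, under the assumption that pervasive confounders load on the leading singular directions of the data matrix. It illustrates only the deconfounding step (subtracting a low-rank reconstruction), not the paper's DAG score function.

```python
# Sketch: approximate the pervasive confounding variation by the rank-k
# truncated SVD of the centered data matrix and remove it; candidate DAG
# structures would then be fitted/scored on the residualized data.
import numpy as np


def spectral_deconfound(X, k):
    """Remove the top-k singular components of the centered n-by-p data matrix X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    low_rank = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # estimated pervasive confounding variation
    return Xc - low_rank                              # residual variation for DAG scoring


# Toy usage: one pervasive confounder affecting all 10 observed variables.
rng = np.random.default_rng(1)
h = rng.normal(size=(200, 1))                         # hidden pervasive confounder
X = h @ rng.normal(size=(1, 10)) + 0.3 * rng.normal(size=(200, 10))
X_resid = spectral_deconfound(X, k=1)
print(X_resid.shape)
```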
Causal effect estimation from observational data is an important but challenging problem, and it becomes even more difficult when the data contain unobserved variables. The challenges lie in (1) whether the causal effect can be estimated from observational data at all (identifiability), (2) the accuracy of the estimate (unbiasedness), and (3) a fast data-driven algorithm for the estimation (efficiency). Each of these problems is challenging on its own. Few data-driven methods for causal effect estimation exist so far, and they solve one or two of the above problems, but not all three. In this paper, we present an algorithm that is fast and unbiased and that can confirm whether a causal effect is identifiable under a very practical and commonly seen problem setting. To achieve high efficiency, we approach causal effect estimation as a local search for the minimal adjustment variable sets in data. We show that identifiability and unbiased estimation can both be resolved from data in our problem setting, and we develop theorems that support the local search for adjustment variable sets to achieve unbiased causal effect estimation. We also make use of a frequent pattern mining strategy to further speed up the search. Experiments on an extensive collection of synthetic and real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art causal effect estimation methods in both accuracy and time efficiency.
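Below is a simplified stand-in (not the paper's algorithm) for the "local search for adjustment variables" step under a pretreatment assumption: candidate adjustment variables are pretreatment covariates associated with both the treatment and the outcome. A marginal correlation screen replaces proper conditional independence tests, and a linear adjustment replaces the paper's estimator.

```python
# Sketch: screen pretreatment covariates associated with both X and Y, then
# estimate the effect of X on Y by regression adjustment on the screened set.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression


def screen_candidate_confounders(df, treatment, outcome, covariates, alpha=0.05):
    """Keep covariates marginally associated with both the treatment and the outcome."""
    keep = []
    for c in covariates:
        _, p_x = stats.pearsonr(df[c], df[treatment])
        _, p_y = stats.pearsonr(df[c], df[outcome])
        if p_x < alpha and p_y < alpha:
            keep.append(c)
    return keep


def adjusted_effect(df, treatment, outcome, adj):
    X = df[[treatment, *adj]].to_numpy()
    return float(LinearRegression().fit(X, df[outcome]).coef_[0])


# Toy usage: Z1 is a confounder, Z2 is irrelevant noise; true effect of X on Y is 1.0.
rng = np.random.default_rng(2)
z1, z2 = rng.normal(size=4000), rng.normal(size=4000)
x = z1 + rng.normal(size=4000)
y = 1.0 * x + 2.0 * z1 + rng.normal(size=4000)
df = pd.DataFrame({"X": x, "Y": y, "Z1": z1, "Z2": z2})
adj = screen_candidate_confounders(df, "X", "Y", ["Z1", "Z2"])
print(adj, adjusted_effect(df, "X", "Y", adj))
```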
When estimating the treatment effect in an observational study, we use a semiparametric locally efficient dimension reduction approach to model both the treatment assignment mechanism and the average responses in the treated and nontreated groups. We then integrate all results through imputation, inverse probability weighting and doubly robust augmentation estimators. Doubly robust estimators are locally efficient, while imputation estimators are super-efficient when the response models are correct. To take advantage of both procedures, we introduce a shrinkage estimator that automatically combines the two, retaining the double robustness property while improving on the variance when the response model is correct. We demonstrate the performance of these estimators through simulated experiments and a real dataset concerning the effect of maternal smoking on baby birth weight. Key words and phrases: Average Treatment Effect, Doubly Robust Estimator, Efficiency, Inverse Probability Weighting, Shrinkage Estimator.
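For readers unfamiliar with the building blocks being combined here, the following is a minimal sketch of a standard augmented IPW (doubly robust) estimator of the average treatment effect. It is a generic illustration with plain logistic/linear working models, not the paper's semiparametric dimension reduction or shrinkage procedure.

```python
# Sketch: AIPW = outcome regression augmented with inverse probability weighting;
# it is consistent if either the propensity model or the outcome models are correct.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def aipw_ate(X, t, y):
    """Doubly robust estimate of the average treatment effect."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                                   # avoid extreme weights
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)  # treated response model
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)  # nontreated response model
    aug1 = mu1 + t * (y - mu1) / ps
    aug0 = mu0 + (1 - t) * (y - mu0) / (1 - ps)
    return float(np.mean(aug1 - aug0))


# Toy usage: confounded treatment assignment, true ATE = 3.0.
rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 2))
t = (X[:, 0] + rng.normal(size=5000) > 0).astype(int)
y = 3.0 * t + X @ np.array([1.0, -0.5]) + rng.normal(size=5000)
print(aipw_ate(X, t, y))
```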
An important problem in causal inference is to break down the total effect of a treatment into different causal pathways and to quantify the causal effect along each pathway. Causal mediation analysis (CMA) is a formal statistical approach for identifying and estimating these causal effects. Central to CMA is the sequential ignorability assumption, which implies that all pre-treatment confounders are measured and that they capture different types of confounding, e.g., post-treatment confounders and hidden confounders. Typically unverifiable in observational studies, this assumption restricts both the coverage and the practicality of conventional methods. This work therefore aims to circumvent the stringent assumption by following a causal graph with a unified confounder and its proxy variables. Our core contribution is an algorithm that combines deep latent-variable models and a proxy strategy to jointly infer a unified surrogate confounder and estimate the different causal effects in CMA from observed variables. Empirical evaluations using both synthetic and semi-synthetic datasets validate the effectiveness of the proposed method.
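To make the proxy idea concrete, here is a shallow stand-in (not the paper's deep latent-variable model): a surrogate confounder is inferred from its proxies with factor analysis and then used in a simple regression-based mediation analysis. With linear working models, the natural direct effect is the treatment coefficient in the outcome model and the natural indirect effect is the product of the treatment-to-mediator and mediator-to-outcome coefficients.

```python
# Sketch: proxies -> surrogate confounder U_hat (factor analysis), then
# mediator and outcome regressions adjusted for U_hat give NDE and NIE.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 5000
u = rng.normal(size=n)                                        # hidden confounder
proxies = np.column_stack([u + 0.5 * rng.normal(size=n) for _ in range(3)])
t = (u + rng.normal(size=n) > 0).astype(float)                # treatment
m = 1.0 * t + 0.8 * u + rng.normal(size=n)                    # mediator (true t->m effect 1.0)
y = 0.5 * t + 1.5 * m + 1.2 * u + rng.normal(size=n)          # outcome (true NDE 0.5, NIE 1.5)

u_hat = FactorAnalysis(n_components=1, random_state=0).fit_transform(proxies)  # surrogate confounder

mediator_model = LinearRegression().fit(np.column_stack([t, u_hat]), m)
outcome_model = LinearRegression().fit(np.column_stack([t, m, u_hat]), y)
a = mediator_model.coef_[0]        # treatment -> mediator
nde = outcome_model.coef_[0]       # natural direct effect
nie = a * outcome_model.coef_[1]   # natural indirect effect through the mediator
print(nde, nie)
```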