No Arabic abstract
One of the most common mistakes made when performing data analysis is attributing causal meaning to regression coefficients. Formally, a causal effect can only be computed if it is identifiable from a combination of observational data and structural knowledge about the domain under investigation (Pearl, 2000, Ch. 5). Building on the literature of instrumental variables (IVs), a plethora of methods has been developed to identify causal effects in linear systems. Almost invariably, however, the most powerful such methods rely on exponential-time procedures. In this paper, we investigate graphical conditions to allow efficient identification in arbitrary linear structural causal models (SCMs). In particular, we develop a method to efficiently find unconditioned instrumental subsets, which are generalizations of IVs that can be used to tame the complexity of many canonical algorithms found in the literature. Further, we prove that determining whether an effect can be identified with TSID (Weihs et al., 2017), a method more powerful than unconditioned instrumental sets and other efficient identification algorithms, is NP-Complete. Finally, building on the idea of flow constraints, we introduce a new and efficient criterion called Instrumental Cutsets (IC), which is able to solve for parameters missed by all other existing polynomial-time algorithms.
In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called ``bow-free paths. In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely ``bow free models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.
Assessing the magnitude of cause-and-effect relations is one of the central challenges found throughout the empirical sciences. The problem of identification of causal effects is concerned with determining whether a causal effect can be computed from a combination of observational data and substantive knowledge about the domain under investigation, which is formally expressed in the form of a causal graph. In many practical settings, however, the knowledge available for the researcher is not strong enough so as to specify a unique causal graph. Another line of investigation attempts to use observational data to learn a qualitative description of the domain called a Markov equivalence class, which is the collection of causal graphs that share the same set of observed features. In this paper, we marry both approaches and study the problem of causal identification from an equivalence class, represented by a partial ancestral graph (PAG). We start by deriving a set of graphical properties of PAGs that are carried over to its induced subgraphs. We then develop an algorithm to compute the effect of an arbitrary set of variables on an arbitrary outcome set. We show that the algorithm is strictly more powerful than the current state of the art found in the literature.
We address the problem of estimating the effect of intervening on a set of variables X from experiments on a different set, Z, that is more accessible to manipulation. This problem, which we call z-identifiability, reduces to ordinary identifiability when Z = empty and, like the latter, can be given syntactic characterization using the do-calculus [Pearl, 1995; 2000]. We provide a graphical necessary and sufficient condition for z-identifiability for arbitrary sets X,Z, and Y (the outcomes). We further develop a complete algorithm for computing the causal effect of X on Y using information provided by experiments on Z. Finally, we use our results to prove completeness of do-calculus relative to z-identifiability, a result that does not follow from completeness relative to ordinary identifiability.
Unmeasured confounding is a threat to causal inference and individualized decision making. Similar to Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a), we consider the problem of identification of optimal individualized treatment regimes with a valid instrumental variable. Han (2020a) provided an alternative identifying condition of optimal treatment regimes using the conditional Wald estimand of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020) when treatment assignment is subject to endogeneity and a valid binary instrumental variable is available. In this note, we provide a necessary and sufficient condition for identification of optimal treatment regimes using the conditional Wald estimand. Our novel condition is necessarily implied by those of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a) and may continue to hold in a variety of potential settings not covered by prior results.
Instrumental variable methods are widely used for inferring the causal effect of an exposure on an outcome when the observed relationship is potentially affected by unmeasured confounders. Existing instrumental variable methods for nonlinear outcome models require stringent identifiability conditions. We develop a robust causal inference framework for nonlinear outcome models, which relaxes the conventional identifiability conditions. We adopt a flexible semi-parametric potential outcome model and propose new identifiability conditions for identifying the model parameters and causal effects. We devise a novel three-step inference procedure for the conditional average treatment effect and establish the asymptotic normality of the proposed point estimator. We construct confidence intervals for the causal effect by the bootstrap method. The proposed method is demonstrated in a large set of simulation studies and is applied to study the causal effects of lipid levels on whether the glucose level is normal or high over a mice dataset.