ترغب بنشر مسار تعليمي؟ اضغط هنا

Causal query in observational data with hidden variables

82   0   0.0 ( 0 )
 نشر من قبل Debo Cheng
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Debo Cheng




اسأل ChatGPT حول البحث

This paper discusses the problem of causal query in observational data with hidden variables, with the aim of seeking the change of an outcome when manipulating a variable while given a set of plausible confounding variables which affect the manipulated variable and the outcome. Such an experiment on data to estimate the causal effect of the manipulated variable is useful for validating an experiment design using historical data or for exploring confounders when studying a new relationship. However, existing data-driven methods for causal effect estimation face some major challenges, including poor scalability with high dimensional data, low estimation accuracy due to heuristics used by the global causal structure learning algorithms, and the assumption of causal sufficiency when hidden variables are inevitable in data. In this paper, we develop a theorem for using local search to find a superset of the adjustment (or confounding) variables for causal effect estimation from observational data under a realistic pretreatment assumption. The theorem ensures that the unbiased estimate of causal effect is included in the set of causal effects estimated by the superset of adjustment variables. Based on the developed theorem, we propose a data-driven algorithm for causal query. Experiments show that the proposed algorithm is faster and produces better causal effect estimation than an existing data-driven causal effect estimation method with hidden variables. The causal effects estimated by the proposed algorithm are as accurate as those by the state-of-the-art methods using domain knowledge.



قيم البحث

اقرأ أيضاً

The last decade witnessed the development of algorithms that completely solve the identifiability problem for causal effects in hidden variable causal models associated with directed acyclic graphs. However, much of this machinery remains underutiliz ed in practice owing to the complexity of estimating identifying functionals yielded by these algorithms. In this paper, we provide simple graphical criteria and semiparametric estimators that bridge the gap between identification and estimation for causal effects involving a single treatment and a single outcome. First, we provide influence function based doubly robust estimators that cover a significant subset of hidden variable causal models where the effect is identifiable. We further characterize an important subset of this class for which we demonstrate how to derive the estimator with the lowest asymptotic variance, i.e., one that achieves the semiparametric efficiency bound. Finally, we provide semiparametric estimators for any single treatment causal effect parameter identified via the aforementioned algorithms. The resulting estimators resemble influence function based estimators that are sequentially reweighted, and exhibit a partial double robustness property, provided the parts of the likelihood corresponding to a set of weight models are correctly specified. Our methods are easy to implement and we demonstrate their utility through simulations.
95 - Debo Cheng 2020
Causal effect estimation from observational data is a crucial but challenging task. Currently, only a limited number of data-driven causal effect estimation methods are available. These methods either provide only a bound estimation of the causal eff ect of a treatment on the outcome, or generate a unique estimation of the causal effect, but making strong assumptions on data and having low efficiency. In this paper, we identify a practical problem setting and propose an approach to achieving unique and unbiased estimation of causal effects from data with hidden variables. For the approach, we have developed the theorems to support the discovery of the proper covariate sets for confounding adjustment (adjustment sets). Based on the theorems, two algorithms are proposed for finding the proper adjustment sets from data with hidden variables to obtain unbiased and unique causal effect estimation. Experiments with synthetic datasets generated using five benchmark Bayesian networks and four real-world datasets have demonstrated the efficiency and effectiveness of the proposed algorithms, indicating the practicability of the identified problem setting and the potential of the proposed approach in real-world applications.
Convenient access to observational data enables us to learn causal effects without randomized experiments. This research direction draws increasing attention in research areas such as economics, healthcare, and education. For example, we can study ho w a medicine (the treatment) causally affects the health condition (the outcome) of a patient using existing electronic health records. To validate causal effects learned from observational data, we have to control confounding bias -- the influence of variables which causally influence both the treatment and the outcome. Existing work along this line overwhelmingly relies on the unconfoundedness assumption that there do not exist unobserved confounders. However, this assumption is untestable and can even be untenable. In fact, an important fact ignored by the majority of previous work is that observational data can come with network information that can be utilized to infer hidden confounders. For example, in an observational study of the individual-level treatment effect of a medicine, instead of randomized experiments, the medicine is often assigned to each individual based on a series of factors. Some of the factors (e.g., socioeconomic status) can be challenging to measure and therefore become hidden confounders. Fortunately, the socioeconomic status of an individual can be reflected by whom she is connected in social networks. With this fact in mind, we aim to exploit the network information to recognize patterns of hidden confounders which would further allow us to learn valid individual causal effects from observational data. In this work, we propose a novel causal inference framework, the network deconfounder, which learns representations to unravel patterns of hidden confounders from the network information. Empirically, we perform extensive experiments to validate the effectiveness of the network deconfounder on various datasets.
The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work exploits stability of regression coefficients or invariance properti es of models across different experimental conditions for reconstructing the full causal graph. These approaches generally do not scale well with the number of the explanatory variables and are difficult to extend to nonlinear relationships. Contrary to existing work, we propose an approach which even works for observational data alone, while still offering theoretical guarantees including the case of partially nonlinear relationships. Our algorithm requires only one estimation for each variable and in our experiments we apply our causal discovery algorithm even to large graphs, demonstrating significant improvements compared to well established approaches.
Graphical causal models are an important tool for knowledge discovery because they can represent both the causal relations between variables and the multivariate probability distributions over the data. Once learned, causal graphs can be used for cla ssification, feature selection and hypothesis generation, while revealing the underlying causal network structure and thus allowing for arbitrary likelihood queries over the data. However, current algorithms for learning sparse directed graphs are generally designed to handle only one type of data (continuous-only or discrete-only), which limits their applicability to a large class of multi-modal biological datasets that include mixed type variables. To address this issue, we developed new methods that modify and combine existing methods for finding undirected graphs with methods for finding directed graphs. These hybrid methods are not only faster, but also perform better than the directed graph estimation methods alone for a variety of parameter settings and data set sizes. Here, we describe a new conditional independence test for learning directed graphs over mixed data types and we compare performances of different graph learning strategies on synthetic data.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا