Estimating the Individual Treatment Effect from observational data, defined as the difference between outcomes with and without treatment or intervention while only one of the two is ever observed, is a challenging problem in causal learning. In this paper, we formulate this problem as inference over hidden variables and enforce causal constraints based on a model of four exclusive causal populations. We propose a new version of the EM algorithm, coined the Expected-Causality-Maximization (ECM) algorithm, and provide hints on its convergence under mild conditions. We compare our algorithm to baseline methods on synthetic and real-world data and discuss its performance.
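The abstract leaves the algorithmic details open. As a rough, hedged illustration of the idea of inferring hidden causal populations with EM, the sketch below uses one common four-population decomposition (responders, anti-responders, sure things, lost causes) and fits only the population proportions, ignoring covariates; it is not the authors' ECM algorithm, and every name in it is illustrative.

```python
import numpy as np

# Hedged illustration of EM over hidden causal populations, ignoring
# covariates for brevity; this is NOT the authors' ECM algorithm.
# Assumed four exclusive types: 0 = responder (y = t), 1 = anti-responder
# (y = 1 - t), 2 = sure thing (y = 1), 3 = lost cause (y = 0).

def p_y1_given_t(t):
    """P(y = 1 | treatment t, type k) for the four deterministic types."""
    return np.array([t, 1.0 - t, 1.0, 0.0])

def em_four_populations(t, y, n_iter=200, tol=1e-8):
    """EM for the mixing proportions pi over the four hidden types."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    pi = np.full(4, 0.25)
    for _ in range(n_iter):
        # E-step: posterior over types for each unit, given (t_i, y_i).
        lik = np.stack([p_y1_given_t(ti) if yi == 1 else 1.0 - p_y1_given_t(ti)
                        for ti, yi in zip(t, y)])            # shape (n, 4)
        post = lik * pi
        post /= np.clip(post.sum(axis=1, keepdims=True), 1e-12, None)
        # M-step: update the population proportions.
        new_pi = post.mean(axis=0)
        if np.max(np.abs(new_pi - pi)) < tol:
            pi = new_pi
            break
        pi = new_pi
    # Under this decomposition, the average effect is the share of responders
    # minus the share of anti-responders.
    return pi, pi[0] - pi[1]
```

A covariate-aware variant would replace the fixed proportions with per-unit membership models, which is where the paper's causal constraints and the ECM-specific updates would enter.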
Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. Often, this requirement is satisfied by simply collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM, we introduce a preprocessing step that alleviates the curse of dimensionality for any existing model and learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability to avoid excessive information loss from model misspecification; these properties, combined with our loss function, enable the representations to converge and keep the CATE estimation consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained via various benchmark dimensionality reduction methods.
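As a hedged sketch of what such a preprocessing step could look like in practice, the snippet below trains an energy-based encoder with a standard noise-contrastive (NCE) objective against a Gaussian noise distribution and keeps only the low-dimensional representation phi(x) for downstream CATE learners. The architecture, representation size, and noise choice are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal

class EnergyEncoder(nn.Module):
    """Encoder phi(x) plus a scalar energy head; only phi(x) is kept
    afterwards as the low-dimensional representation for CATE learners."""
    def __init__(self, d_in, d_rep, d_hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_rep))
        self.energy = nn.Linear(d_rep, 1)
        self.log_c = nn.Parameter(torch.zeros(1))  # learned log-normalizer (NCE trick)

    def log_p(self, x):
        # Unnormalized log-density of the EBM: -E(x) + log c
        return -self.energy(self.phi(x)).squeeze(-1) + self.log_c

def nce_loss(model, x_data, noise):
    """Classify data vs. noise samples; logits are log p_theta(x) - log q(x)."""
    x_noise = noise.sample((x_data.shape[0],))
    bce = nn.functional.binary_cross_entropy_with_logits
    logit_data = model.log_p(x_data) - noise.log_prob(x_data)
    logit_noise = model.log_p(x_noise) - noise.log_prob(x_noise)
    return (bce(logit_data, torch.ones_like(logit_data)) +
            bce(logit_noise, torch.zeros_like(logit_noise)))

def fit_encoder(X, d_rep=8, epochs=200, lr=1e-3):
    """Fit the encoder on covariates X (numpy array, shape n x d)."""
    X = torch.as_tensor(X, dtype=torch.float32)
    noise = MultivariateNormal(X.mean(0),
                               covariance_matrix=torch.diag(X.var(0) + 1e-3))
    model = EnergyEncoder(X.shape[1], d_rep)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nce_loss(model, X, noise)
        loss.backward()
        opt.step()
    return model
```

After fit_encoder(X), the array model.phi(torch.as_tensor(X, dtype=torch.float32)).detach().numpy() would replace the raw covariates as input to any existing CATE learner.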
Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects. In our experiments, we find our approach to perform well relative to a number of baselines.
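The orthogonal estimating equations are not reproduced here (a reference implementation of causal survival forests is available in the grf R package). As a much simpler, hedged stand-in for intuition only, one can combine inverse-probability-of-censoring weights with a plain two-arm regression learner on the restricted survival time; the horizon, weighting scheme, and forest settings below are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from sklearn.ensemble import RandomForestRegressor

def ipcw_t_learner(X, W, time, event, horizon):
    """Simplified censoring-aware effect sketch: estimate E[min(T, horizon) | X, W]
    with inverse-probability-of-censoring weights, then take the difference of the
    two arms' predictions. X, W, time, event are numpy arrays."""
    # Censoring survival curve G(t) = P(C > t), fitted by flipping the event flag.
    km_c = KaplanMeierFitter().fit(time, event_observed=1 - event)
    y = np.minimum(time, horizon)
    # A unit is "complete" at the horizon if its event happened before the
    # horizon or it was still under observation at the horizon.
    complete = ((event == 1) & (time <= horizon)) | (time >= horizon)
    g = km_c.survival_function_at_times(np.minimum(time, horizon)).to_numpy()
    weights = complete / np.clip(g, 1e-3, None)

    models = {}
    for w in (0, 1):
        idx = (W == w) & complete
        models[w] = RandomForestRegressor(n_estimators=500).fit(
            X[idx], y[idx], sample_weight=weights[idx])
    # Per-unit effect on restricted mean survival time up to the horizon.
    return models[1].predict(X) - models[0].predict(X)
```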
We study the problem of estimating the continuous response over time to interventions using observational time series---a retrospective dataset where the policy by which the data are generated is unknown to the learner. We are motivated by applications where the response varies across individuals and, therefore, estimating responses at the individual level is valuable for personalizing decision-making. We refer to this as the problem of estimating individualized treatment response (ITR) curves. In statistics, the G-computation formula (Robins, 1986) has been commonly used for estimating treatment responses from observational data containing sequential treatment assignments. However, past studies have focused predominantly on obtaining point-in-time estimates at the population level. We leverage the G-computation formula and develop a novel Bayesian nonparametric (BNP) method that can flexibly model functional data and provide posterior inference over the treatment response curves at both the individual and population level. On a challenging dataset containing time series from patients admitted to a hospital, we estimate responses to treatments used in managing kidney function and show that the resulting fits are more accurate than alternative approaches. Accurate methods for obtaining ITRs from observational data can dramatically accelerate the pace at which personalized treatment plans become possible.
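For intuition, the G-computation idea behind the method can be sketched with a deliberately simplified, non-Bayesian stand-in: fit a model of the next outcome given the current outcome and treatment, then roll it forward under the treatment sequence of interest and read off the resulting response curve. The one-step state summary and the regressor below are illustrative assumptions; the paper's BNP model handles full functional data and posterior uncertainty.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_transition_model(trajectories):
    """trajectories: list of (y, a) arrays per patient, where y[k] is the outcome
    and a[k] the treatment at step k. Fits a model of E[y_{k+1} | y_k, a_k]."""
    X, target = [], []
    for y, a in trajectories:
        for k in range(len(y) - 1):
            X.append([y[k], a[k]])
            target.append(y[k + 1])
    return GradientBoostingRegressor().fit(np.array(X), np.array(target))

def g_computation_curve(model, y0, policy, horizon, n_mc=1, noise_std=0.0):
    """Roll the fitted model forward under a fixed treatment sequence
    (policy[0], ..., policy[horizon-1]) starting from outcome y0."""
    curves = np.zeros((n_mc, horizon + 1))
    for m in range(n_mc):
        y = y0
        curves[m, 0] = y
        for k in range(horizon):
            y = model.predict([[y, policy[k]]])[0] + noise_std * np.random.randn()
            curves[m, k + 1] = y
    return curves.mean(axis=0)   # estimated response curve under the policy

# Contrast, e.g., "always treat" vs. "never treat":
# curve_1 = g_computation_curve(model, y0=1.2, policy=[1] * 12, horizon=12)
# curve_0 = g_computation_curve(model, y0=1.2, policy=[0] * 12, horizon=12)
# itr_curve = curve_1 - curve_0
```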
Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., that in the absence of confounders the causes are independent of the residual from regressing the effect on the causes, is a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.
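A rough numerical rendering of the GIN check, following the stated definition, is to take $\omega$ from the (approximate) null space of the cross-covariance $\mathrm{Cov}(\mathbf{Z}, \mathbf{Y})$ and then test whether $\omega^{\intercal}\mathbf{Y}$ is independent of $\mathbf{Z}$. The HSIC statistic and kernel bandwidth used below are assumptions for illustration, not the paper's test.

```python
import numpy as np

def gin_statistic(Y, Z, sigma=1.0):
    """Rough numerical check of the GIN condition for observed matrices
    Y (n x p) and Z (n x q): pick omega in the null space of the
    cross-covariance Cov(Z, Y), then measure dependence between
    omega^T Y and Z with a (biased) HSIC statistic. Small values
    suggest GIN holds; the decision threshold is left to the user."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov_zy = Zc.T @ Yc / len(Y)                   # shape (q, p)
    # omega: right singular vector for the smallest singular value,
    # i.e., the (approximate) null direction of Cov(Z, Y).
    _, _, vt = np.linalg.svd(cov_zy)
    omega = vt[-1]
    e = Yc @ omega                                # the "generalized noise" omega^T Y

    def rbf_gram(a):
        d2 = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    n = len(Y)
    H = np.eye(n) - np.ones((n, n)) / n
    K = rbf_gram(e[:, None])
    L = rbf_gram(Zc)
    return np.trace(K @ H @ L @ H) / n ** 2       # HSIC(omega^T Y, Z)
```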
Selecting causal inference models for estimating individualized treatment effects (ITE) from observational data presents a unique challenge since the counterfactual outcomes are never observed. The problem is challenged further in the unsupervised domain adaptation (UDA) setting, where we only have access to labeled samples in the source domain but wish to select a model that achieves good performance on a target domain for which only unlabeled samples are available. Existing techniques for UDA model selection are designed for the predictive setting. These methods examine discriminative density ratios between the input covariates in the source and target domain and do not factor in the models' predictions in the target domain. Because of this, two models with identical performance on the source domain would receive the same risk score from existing methods, even though they may have significantly different performance in the target domain. We leverage the invariance of causal structures across domains to propose a novel model selection metric specifically designed for ITE methods under the UDA setting. In particular, we propose selecting models whose predictions of intervention effects satisfy known causal structures in the target domain. Experimentally, our method selects ITE models that are more robust to covariate shifts on several healthcare datasets, including estimating the effect of ventilation in COVID-19 patients from different geographic locations.
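The abstract describes the selection rule only at a high level. One hedged way to picture it is as a loop that scores each candidate ITE model by how consistent its predicted effects on unlabeled target-domain covariates are with a causal constraint known to hold there; the model interface (an effect(X) method) and the example non-negativity constraint below are invented placeholders, not the paper's metric.

```python
import numpy as np

def select_ite_model(models, X_target, consistency_score):
    """Hypothetical selection loop: rank candidate ITE models by how well their
    predicted effects on unlabeled target covariates satisfy a causal constraint
    known to hold in the target domain. `consistency_score` is a user-supplied
    placeholder (higher = more consistent), e.g. agreement of the predicted
    effects' sign with established clinical knowledge."""
    scores = {}
    for name, model in models.items():
        tau_hat = model.effect(X_target)      # predicted ITEs on target data (assumed API)
        scores[name] = consistency_score(tau_hat, X_target)
    best = max(scores, key=scores.get)
    return best, scores

# Example placeholder constraint: the effect is known to be non-negative in the
# target population, so penalize models that predict negative effects.
def nonnegativity_score(tau_hat, X_target):
    return -np.mean(np.clip(-tau_hat, 0, None))
```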