No Arabic abstract
Our goal is to estimate causal interactions in multivariate time series. Using vector autoregressive (VAR) models, these can be defined based on non-vanishing coefficients belonging to respective time-lagged instances. As in most cases a parsimonious causality structure is assumed, a promising approach to causal discovery consists in fitting VAR models with an additional sparsity-promoting regularization. Along this line we here propose that sparsity should be enforced for the subgroups of coefficients that belong to each pair of time series, as the absence of a causal relation requires the coefficients for all time-lags to become jointly zero. Such behavior can be achieved by means of l1-l2-norm regularized regression, for which an efficient active set solver has been proposed recently. Our method is shown to outperform standard methods in recovering simulated causality graphs. The results are on par with a second novel approach which uses multiple statistical testing.
Identifying causal relationships is a challenging yet crucial problem in many fields of science like epidemiology, climatology, ecology, genomics, economics and neuroscience, to mention only a few. Recent studies have demonstrated that ordinal partition transition networks (OPTNs) allow inferring the coupling direction between two dynamical systems. In this work, we generalize this concept to the study of the interactions among multiple dynamical systems and we propose a new method to detect causality in multivariate observational data. By applying this method to numerical simulations of coupled linear stochastic processes as well as two examples of interacting nonlinear dynamical systems (coupled Lorenz systems and a network of neural mass models), we demonstrate that our approach can reliably identify the direction of interactions and the associated coupling delays. Finally, we study real-world observational microelectrode array electrophysiology data from rodent brain slices to identify the causal coupling structures underlying epileptiform activity. Our results, both from simulations and real-world data, suggest that OPTNs can provide a complementary and robust approach to infer causal effect networks from multivariate observational data.
This paper introduces a unified framework of counterfactual estimation for time-series cross-sectional data, which estimates the average treatment effect on the treated by directly imputing treated counterfactuals. Examples include the fixed effects counterfactual estimator, interactive fixed effects counterfactual estimator, and matrix completion estimator. These estimators provide more reliable causal estimates than conventional twoway fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Under this framework, we propose a new dynamic treatment effects plot, as well as several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.
Platelet products are both expensive and have very short shelf lives. As usage rates for platelets are highly variable, the effective management of platelet demand and supply is very important yet challenging. The primary goal of this paper is to present an efficient forecasting model for platelet demand at Canadian Blood Services (CBS). To accomplish this goal, four different demand forecasting methods, ARIMA (Auto Regressive Moving Average), Prophet, lasso regression (least absolute shrinkage and selection operator) and LSTM (Long Short-Term Memory) networks are utilized and evaluated. We use a large clinical dataset for a centralized blood distribution centre for four hospitals in Hamilton, Ontario, spanning from 2010 to 2018 and consisting of daily platelet transfusions along with information such as the product specifications, the recipients characteristics, and the recipients laboratory test results. This study is the first to utilize different methods from statistical time series models to data-driven regression and a machine learning technique for platelet transfusion using clinical predictors and with different amounts of data. We find that the multivariate approaches have the highest accuracy in general, however, if sufficient data are available, a simpler time series approach such as ARIMA appears to be sufficient. We also comment on the approach to choose clinical indicators (inputs) for the multivariate models.
Modern RNA sequencing technologies provide gene expression measurements from single cells that promise refined insights on regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their 0/1 indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell RNA-seq data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
A general Bayesian framework is introduced for mixture modelling and inference with real-valued time series. At the top level, the state space is partitioned via the choice of a discrete context tree, so that the resulting partition depends on the values of some of the most recent samples. At the bottom level, a different model is associated with each region of the partition. This defines a very rich and flexible class of mixture models, for which we provide algorithms that allow for efficient, exact Bayesian inference. In particular, we show that the maximum a posteriori probability (MAP) model (including the relevant MAP context tree partition) can be precisely identified, along with its exact posterior probability. The utility of this general framework is illustrated in detail when a different autoregressive (AR) model is used in each state-space region, resulting in a mixture-of-AR model class. The performance of the associated algorithmic tools is demonstrated in the problems of model selection and forecasting on both simulated and real-world data, where they are found to provide results as good or better than state-of-the-art methods.