Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data

Added by Andrew Lawrence

Publication date: 2021

Fields: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Andrew R. Lawrence, Marcus Kaiser, Rui Sampaio

Machine Learning

Going beyond correlations, the understanding and identification of causal relationships in observational time series, an important subfield of Causal Discovery, poses a major challenge. The lack of access to a well-defined ground truth for real-world data creates the need to rely on synthetic data for the evaluation of these methods. Existing benchmarks are limited in scope: they are either restricted to a static selection of data sets or do not allow for a granular assessment of the methods' performance when commonly made assumptions are violated. We propose a flexible and simple-to-use framework for generating time series data, aimed at developing, evaluating, and benchmarking time series causal discovery methods. In particular, the framework can be used to fine-tune novel methods on vast amounts of data without overfitting them to a benchmark, so that they perform well in real-world use cases. Using our framework, we evaluate prominent time series causal discovery methods and demonstrate both a notable degradation in performance when their assumptions are invalidated and their sensitivity to the choice of hyperparameters. Finally, we propose future research directions and describe how our framework can support both researchers and practitioners.
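To make the idea of synthetic data with a known ground truth concrete, the following is a minimal sketch of a data generating process for time series. It is not the authors' framework: it assumes a linear vector autoregressive model with Gaussian noise, and the function name `generate_linear_var` and all of its parameters are hypothetical choices made for illustration. The function samples a random lagged causal graph, simulates a series from it, and returns both, so a causal discovery method's output could be compared against the true graph.

```python
import numpy as np


def generate_linear_var(n_vars=3, n_lags=2, n_samples=500,
                        edge_prob=0.3, noise_std=1.0, seed=0):
    """Simulate a linear VAR process with a known lagged causal graph.

    Hypothetical illustration only; not the API of the paper's framework.
    Returns (data, adjacency), where adjacency[k, i, j] is True when
    variable j at time t-(k+1) is a direct cause of variable i at time t.
    """
    rng = np.random.default_rng(seed)

    # Sample the ground-truth lagged graph and linear edge weights.
    adjacency = rng.random((n_lags, n_vars, n_vars)) < edge_prob
    weights = rng.uniform(-0.5, 0.5, size=adjacency.shape) * adjacency

    # Keep the process stable: bound the total absolute incoming weight
    # of every variable below 1 (a sufficient, not necessary, condition).
    max_incoming = np.abs(weights).sum(axis=(0, 2)).max()
    if max_incoming >= 0.9:
        weights *= 0.9 / max_incoming

    # Simulate with a burn-in period so the zero initial state is forgotten.
    burn_in = 100
    total = n_samples + burn_in
    noise = rng.normal(0.0, noise_std, size=(total, n_vars))
    data = np.zeros((total, n_vars))
    for t in range(n_lags, total):
        data[t] = noise[t]
        for k in range(n_lags):
            data[t] += weights[k] @ data[t - k - 1]

    return data[burn_in:], adjacency


if __name__ == "__main__":
    series, graph = generate_linear_var(seed=42)
    print(series.shape, graph.shape)
```

In a sketch like this, violating an assumption of a discovery method could be modelled by changing one ingredient, for example replacing the Gaussian noise with a heavy-tailed distribution or the linear update with a nonlinear one, while the returned graph remains the reference for scoring.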

Probabilistic structure discovery in time series data

David Janz, Brooks Paige, Tom Rainforth (2016)

Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.

Machine Learning

Generating Synthetic Text Data to Evaluate Causal Inference Methods

Zach Wood-Doughty, Ilya Shpitser, Mark Dredze (2021)

Drawing causal conclusions from observational data requires making assumptions about the true data-generating process. Causal inference research typically considers low-dimensional data, such as categorical or numerical fields in structured medical records. High-dimensional and unstructured data such as natural language complicates the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects. Models for natural language generation have been widely studied and perform well empirically. However, existing methods are not immediately applicable to producing synthetic datasets for causal evaluations, as they do not allow for quantifying a causal effect on the text itself. In this work, we develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects. We use this framework to perform an empirical comparison of four recently proposed methods for estimating causal effects from text data. We release our code and synthetic datasets.

Computation and Language

Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data

Sindy Lowe, David Madras, Richard Zemel (2020)

Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information - for instance, the dynamics describing the effects of causal relations - which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding.

Machine Learning

COT-GAN: Generating Sequential Data via Causal Optimal Transport

Tianlin Xu, Li K. Wenliang, Michael Munn (2020)

We introduce COT-GAN, an adversarial algorithm to train implicit generative models optimized for producing sequential data. The loss function of this algorithm is formulated using ideas from Causal Optimal Transport (COT), which combines classic optimal transport methods with an additional temporal causality constraint. Remarkably, we find that this causality condition provides a natural framework to parameterize the cost function that is learned by the discriminator as a robust (worst-case) distance, and an ideal mechanism for learning time-dependent data distributions. Following Genevay et al. (2018), we also include an entropic penalization term which allows for the use of the Sinkhorn algorithm when computing the optimal transport cost. Our experiments show the effectiveness and stability of COT-GAN when generating both low- and high-dimensional time series data. The success of the algorithm also relies on a new, improved version of the Sinkhorn divergence which demonstrates less bias in learning.

Machine Learning

Orthogonal Structure Search for Efficient Causal Discovery from Observational Data

Anant Raj, Luigi Gresele, Michel Besserve (2019)

The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work exploits stability of regression coefficients or invariance properties of models across different experimental conditions for reconstructing the full causal graph. These approaches generally do not scale well with the number of explanatory variables and are difficult to extend to nonlinear relationships. Contrary to existing work, we propose an approach which works even for observational data alone, while still offering theoretical guarantees, including for the case of partially nonlinear relationships. Our algorithm requires only one estimation for each variable, and in our experiments we apply our causal discovery algorithm even to large graphs, demonstrating significant improvements compared to well-established approaches.

Machine Learning
