
A conditional independence framework for coherent modularized inference

Added by Manuele Leonelli
Publication date: 2018
Language: English





Inference in current domains of application is often complex and requires us to integrate the expertise of a variety of disparate panels of experts and models coherently. In this paper we develop a formal statistical methodology to guide the networking together of a diverse collection of probabilistic models. In particular, we derive sufficient conditions that ensure inference remains coherent across the composite before and after accommodating relevant evidence.
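As a toy illustration of the kind of coherence at stake (a minimal sketch of our own, not the paper's formal construction): two panels each own one factor of a composite model, and evidence can be accommodated module by module without changing the answer a joint analysis would give, because the factorization already encodes the required conditional independence.

```python
# Minimal sketch: panel A owns a prior p(theta), panel B owns an
# observation model p(y | theta).  Evidence on y can be folded in by
# panel B alone and passed back as a likelihood message; the composite
# posterior agrees with a joint update.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.01, 0.99, 99)      # discretized parameter space

# Panel A's expertise: a prior over theta (a Beta(2, 2) shape, for concreteness).
prior = theta * (1 - theta)
prior /= prior.sum()

# Panel B's expertise: observation model y | theta ~ Binomial(n, theta).
n, y_obs = 20, 14
likelihood = binom.pmf(y_obs, n, theta)

# Joint update: posterior computed on the full composite model.
joint = prior * likelihood
joint /= joint.sum()

# Modular update: panel B summarizes the evidence as a likelihood message,
# which panel A multiplies into its prior.
message = likelihood
modular = prior * message
modular /= modular.sum()

assert np.allclose(joint, modular)       # the two routes of inference agree
```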



Related research

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of $X \mid Z$. While the test of Candès et al. (2018) uses this estimate to draw new $X$ values, our test uses this approximation to design an appropriate non-uniform distribution on permutations of the $X$ values already seen in the true data. We provide an efficient Markov chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of $X \mid Z$, finding that, for the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set.
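A rough sketch of the mechanics, under the simplifying assumption of a Gaussian working model for $X \mid Z$ fitted by least squares; the function names and the choice of $|\mathrm{corr}(X, Y)|$ as test statistic are ours, not the paper's:

```python
# Hedged sketch of the conditional permutation test idea, assuming
# X | Z ~ N(Z @ beta, sigma^2) as the fitted working model.
import numpy as np

def log_q(x, z, beta, sigma):
    """Log-density of the fitted Gaussian model for X | Z."""
    return -0.5 * ((x - z @ beta) / sigma) ** 2

def sample_permutation(x, z, beta, sigma, n_swaps, rng):
    """Pairwise-swap Metropolis sampler over permutations of x,
    targeting probabilities proportional to prod_i q(x_pi(i) | z_i)."""
    x = x.copy()
    for _ in range(n_swaps):
        i, j = rng.choice(len(x), size=2, replace=False)
        delta = (log_q(x[j], z[i], beta, sigma) + log_q(x[i], z[j], beta, sigma)
                 - log_q(x[i], z[i], beta, sigma) - log_q(x[j], z[j], beta, sigma))
        if np.log(rng.uniform()) < delta:      # Metropolis accept/reject
            x[i], x[j] = x[j], x[i]
    return x

def cpt_pvalue(x, y, z, n_perms=500, n_swaps=2000, seed=0):
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(z, x, rcond=None)   # fit the X | Z model
    sigma = np.std(x - z @ beta)
    stat = abs(np.corrcoef(x, y)[0, 1])            # simplistic test statistic
    null = [abs(np.corrcoef(sample_permutation(x, z, beta, sigma,
                                               n_swaps, rng), y)[0, 1])
            for _ in range(n_perms)]
    return (1 + sum(s >= stat for s in null)) / (1 + n_perms)
```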
We propose the conditional predictive impact (CPI), a consistent and unbiased estimator of the association between one or several features and a given outcome, conditional on a reduced feature set. Building on the knockoff framework of Candès et al. (2018), we develop a novel testing procedure that works in conjunction with any valid knockoff sampler, supervised learning algorithm, and loss function. The CPI can be efficiently computed for high-dimensional data without any sparsity constraints. We demonstrate convergence criteria for the CPI and develop statistical inference procedures for evaluating its magnitude, significance, and precision. These tests aid in feature and model selection, extending traditional frequentist and Bayesian techniques to general supervised learning tasks. The CPI may also be applied in causal discovery to identify underlying multivariate graph structures. We test our method using various algorithms, including linear regression, neural networks, random forests, and support vector machines. Empirical results show that the CPI compares favorably to alternative variable importance measures and other nonparametric tests of conditional independence on a diverse array of real and simulated datasets. Simulations confirm that our inference procedures successfully control Type I error and achieve nominal coverage probability. Our method has been implemented in an R package, cpi, which can be downloaded from https://github.com/dswatson/cpi.
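The released implementation is the R package cpi; the following Python sketch is our own simplification of the recipe, with a crude residual-permutation stand-in where a valid knockoff sampler would go:

```python
# Illustrative CPI-style check for one feature j: substitute a knockoff
# copy, measure the per-sample increase in loss, and t-test the increase.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def cpi_score(X, y, j, rng):
    model = LinearRegression().fit(X, y)
    base_loss = (y - model.predict(X)) ** 2        # per-sample squared loss

    # Crude "knockoff": regress X_j on the other features and resample the
    # residuals -- a placeholder for a valid knockoff sampler.
    others = np.delete(X, j, axis=1)
    xj_hat = LinearRegression().fit(others, X[:, j]).predict(others)
    knock = xj_hat + rng.permutation(X[:, j] - xj_hat)

    X_tilde = X.copy()
    X_tilde[:, j] = knock
    knock_loss = (y - model.predict(X_tilde)) ** 2

    delta = knock_loss - base_loss                 # per-sample CPI contributions
    t, p = stats.ttest_1samp(delta, 0.0, alternative='greater')
    return delta.mean(), p
```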
It will be recalled that the classical bivariate normal distributions have normal marginals and normal conditionals. It is natural to ask whether a similar phenomenon can be encountered involving Poisson marginals and conditionals. Reference to the book by Arnold, Castillo and Sarabia (1999) on conditionally specified models will confirm that Poisson marginals will be encountered, together with both conditionals being of the Poisson form, only in the case in which the variables are independent. Instead, in the present article we will be focusing on bivariate distributions with one marginal and the other family of conditionals being of the Poisson form. Such distributions are called Pseudo-Poisson distributions. We discuss distributional features of such models, explore inferential aspects and include an example of application of the Pseudo-Poisson model to sets of over-dispersed data.
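A minimal simulation of a Pseudo-Poisson pair, under one simple illustrative choice of link (the linear rate below is our assumption; the article treats the family more generally):

```python
# X ~ Poisson(lam1) marginally, and Y | X = x ~ Poisson(lam2 + b*x):
# the conditionals of Y are Poisson, but its marginal is not.
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, b = 2.0, 1.0, 0.8
x = rng.poisson(lam1, size=100_000)
y = rng.poisson(lam2 + b * x)

# Over-dispersion of the Y margin: Var(Y) = lam2 + b*lam1 + b^2*lam1,
# which exceeds the mean lam2 + b*lam1 whenever b > 0.
print(y.mean(), y.var())    # roughly 2.6 vs roughly 3.9
```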
There has been growing interest in the AI community for precise uncertainty quantification. Conditional density models $f(y|x)$, where $x$ represents potentially high-dimensional features, are an integral part of uncertainty quantification in prediction and Bayesian inference. However, it is challenging to assess conditional density estimates and gain insight into modes of failure. While existing diagnostic tools can determine whether an approximated conditional density is compatible overall with a data sample, they lack a principled framework for identifying, locating, and interpreting the nature of statistically significant discrepancies over the entire feature space. In this paper, we present rigorous and easy-to-interpret diagnostics such as (i) the Local Coverage Test (LCT), which distinguishes an arbitrarily misspecified model from the true conditional density of the sample, and (ii) Amortized Local P-P plots (ALP) which can quickly provide interpretable graphical summaries of distributional differences at any location $x$ in the feature space. Our validation procedures scale to high dimensions and can potentially adapt to any type of data at hand. We demonstrate the effectiveness of LCT and ALP through a simulated experiment and applications to prediction and parameter inference for image data.
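A sketch of the local-coverage idea in the simplest terms (our simplified reading, not the exact LCT statistic): probability integral transform (PIT) values under a correct model are uniform, so checking their uniformity within a neighborhood of a query point localizes the diagnostic.

```python
# Local PIT diagnostic: restrict PIT values to a neighborhood of x0 and
# test their uniformity.  `local_pit_diagnostic` and its kernel-free
# windowing are our own illustrative choices.
import numpy as np
from scipy.stats import norm, kstest

def local_pit_diagnostic(x, y, cond_cdf, x0, bandwidth=0.5):
    """cond_cdf(y, x) is the model's conditional CDF F(y | x)."""
    pit = cond_cdf(y, x)                       # Uniform(0,1) if model is correct
    local = pit[np.abs(x - x0) < bandwidth]    # neighborhood of x0
    return kstest(local, 'uniform')            # uniformity check of local PITs

# Example: data truly N(x, 1); a model that ignores x fails locally.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 5000)
y = rng.normal(x, 1.0)
bad_model = lambda yy, xx: norm.cdf(yy, loc=0.0, scale=2.0)  # misspecified
print(local_pit_diagnostic(x, y, bad_model, x0=2.0))         # small p-value
```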
Sudeepa Roy, Babak Salimi (2017)
The study of causality or causal inference - how much a given treatment causally affects a given outcome in a population - goes way beyond correlation or association analysis of variables, and is critical in making sound data-driven decisions and policies in a multitude of applications. The gold standard in causal inference is performing controlled experiments, which often is not possible due to logistical or ethical reasons. As an alternative, inferring causality on observational data based on the Neyman-Rubin potential outcome model has been extensively used in statistics, economics, and social sciences over several decades. In this paper, we present a formal framework for sound causal analysis on observational datasets that are given as multiple relations and where the population under study is obtained by joining these base relations. We study a crucial condition for inferring causality from observational data, called the strong ignorability assumption (the treatment and outcome variables should be independent in the joined relation given the observed covariates), using known conditional independences that hold in the base relations. We also discuss how the structure of the conditional independences in base relations given as graphical models help infer new conditional independences in the joined relation. The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today's big data world.
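A toy illustration of the setting (our own construction, not the paper's framework): the study population is a join of base relations, and the ignorability-style condition "treatment independent of outcome given the covariate" is probed stratum by stratum on the joined relation.

```python
# Join two base relations on a key, then run stratified chi-squared tests
# of treatment vs. outcome within each covariate level.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 2000
base = pd.DataFrame({'pid': np.arange(n),
                     'covariate': rng.integers(0, 2, n)})
base['treatment'] = rng.binomial(1, 0.3 + 0.4 * base['covariate'])
out = base[['pid', 'covariate']].copy()
out['outcome'] = rng.binomial(1, 0.2 + 0.5 * out['covariate'])

joined = base.merge(out[['pid', 'outcome']], on='pid')   # population = join

# Both treatment and outcome depend only on the covariate here, so within
# each stratum they are independent and p-values are typically large.
for level, s in joined.groupby('covariate'):
    _, p, *_ = chi2_contingency(pd.crosstab(s['treatment'], s['outcome']))
    print('covariate =', level, 'p =', round(p, 3))
```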