
Graph Independence Testing

Publication date: 2019
Language: English





Identifying statistically significant dependency between variables is a key step in scientific discoveries. Many recent methods, such as distance and kernel tests, have been proposed for valid and consistent independence testing and can be applied to data in Euclidean and non-Euclidean spaces. However, in those works, $n$ pairs of points in $\mathcal{X} \times \mathcal{Y}$ are observed. Here, we consider the setting where a pair of $n \times n$ graphs are observed, and the corresponding adjacency matrices are treated as kernel matrices. Under a $\rho$-correlated stochastic block model, we demonstrate that a naive test (permutation and Pearson's) for a conditional dependency graph model is invalid. Instead, we propose a block-permutation procedure. We prove that our procedure is valid and consistent, even when the two graphs have different marginal distributions, are weighted or unweighted, and the latent vertex assignments are unknown, and we provide sufficient conditions for the tests to estimate $\rho$. Simulations corroborate these results on both binary and weighted graphs. Applying these tests to the whole-organism, single-cell-resolution structural connectomes of C. elegans, we identify strong statistical dependency between the chemical synapse connectome and the gap junction connectome.
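The abstract does not spell out the block-permutation procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: the block labels are taken as known (the paper also covers the unknown case, which would require estimating them first), the test statistic is the Pearson correlation of off-diagonal adjacency entries, and the p-value is one-sided. The function names `sample_sbm`, `offdiag_pearson`, and `block_permutation_test` are hypothetical and not from the paper.

```python
import numpy as np


def sample_sbm(labels, p_within, p_between, rng):
    """Sample a symmetric, unweighted SBM adjacency matrix (illustrative helper)."""
    labels = np.asarray(labels)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_within, p_between)
    # Draw the strict upper triangle, then mirror it so the graph is undirected.
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float)


def offdiag_pearson(A, B):
    """Pearson correlation between the off-diagonal entries of A and B."""
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]


def block_permutation_test(A, B, labels, n_perms=500, seed=0):
    """One-sided permutation test that shuffles vertices only within blocks.

    Assumption: `labels` gives the (known) block of each vertex. Shuffling
    within blocks keeps the block structure of B intact, so the null
    distribution reflects the correlation induced by shared communities alone.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = offdiag_pearson(A, B)
    blocks = [np.flatnonzero(labels == k) for k in np.unique(labels)]

    null = np.empty(n_perms)
    for i in range(n_perms):
        perm = np.arange(len(labels))
        for idx in blocks:
            perm[idx] = rng.permutation(idx)  # permute vertices inside one block
        null[i] = offdiag_pearson(A, B[np.ix_(perm, perm)])

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perms)
    return observed, p_value
```

A naive test would instead permute all vertices freely, which destroys the block structure of the second graph; that is the kind of procedure the abstract reports as invalid in this setting.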




Read More

We propose the conditional predictive impact (CPI), a consistent and unbiased estimator of the association between one or several features and a given outcome, conditional on a reduced feature set. Building on the knockoff framework of Candès et al. (2018), we develop a novel testing procedure that works in conjunction with any valid knockoff sampler, supervised learning algorithm, and loss function. The CPI can be efficiently computed for high-dimensional data without any sparsity constraints. We demonstrate convergence criteria for the CPI and develop statistical inference procedures for evaluating its magnitude, significance, and precision. These tests aid in feature and model selection, extending traditional frequentist and Bayesian techniques to general supervised learning tasks. The CPI may also be applied in causal discovery to identify underlying multivariate graph structures. We test our method using various algorithms, including linear regression, neural networks, random forests, and support vector machines. Empirical results show that the CPI compares favorably to alternative variable importance measures and other nonparametric tests of conditional independence on a diverse array of real and simulated datasets. Simulations confirm that our inference procedures successfully control Type I error and achieve nominal coverage probability. Our method has been implemented in an R package, cpi, which can be downloaded from https://github.com/dswatson/cpi.
Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations - in the form of non-stationary time-series measured on the same temporal grid - can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application.
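As a rough illustration of the idea in the preceding abstract, the sketch below treats each realisation of a process, sampled on a shared temporal grid, as one high-dimensional vector and runs a permutation two-sample test with a (biased) MMD statistic. It rests on assumptions not taken from that work: a Gaussian kernel whose bandwidth is set by the median heuristic (the abstract instead describes tuning the kernel to maximise estimated test power), and the hypothetical function names `mmd2_biased` and `mmd_permutation_test`.

```python
import numpy as np


def mmd2_biased(K, ix, iy):
    """Biased (V-statistic) squared MMD from a precomputed Gram matrix K."""
    kxx = K[np.ix_(ix, ix)].mean()
    kyy = K[np.ix_(iy, iy)].mean()
    kxy = K[np.ix_(ix, iy)].mean()
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(X, Y, n_perms=500, seed=0):
    """Two-sample permutation test; rows of X and Y are whole realisations.

    Assumption: every realisation is observed on the same temporal grid, so
    each row can be treated as one i.i.d. draw from a multivariate law.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = np.vstack([X, Y])

    # Pairwise squared Euclidean distances between realisations.
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

    # Median heuristic for the bandwidth (a stand-in, not power maximisation).
    bandwidth = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    K = np.exp(-d2 / (2.0 * bandwidth**2))

    observed = mmd2_biased(K, np.arange(n), np.arange(n, len(Z)))
    null = np.empty(n_perms)
    for i in range(n_perms):
        perm = rng.permutation(len(Z))  # reassign realisations to the two groups
        null[i] = mmd2_biased(K, perm[:n], perm[n:])

    p_value = (1 + np.sum(null >= observed)) / (1 + n_perms)
    return observed, p_value
```

Computing the Gram matrix once and permuting its indices avoids recomputing kernels for every permutation, which matters when each realisation is a long time series.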
Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be ill-posed. The general method proceeds in a fully data-driven and adaptive way from large to small groups or singletons of covariates, depending on the signal strength and the correlation structure of the design matrix. We propose a novel hierarchical multiple testing adjustment that can be used in combination with any significance test for a group of covariates to perform hierarchical inference. Our adjustment passes on the significance level of certain hypotheses that could not be rejected and is shown to guarantee strong control of the familywise error rate. Our method is at least as powerful as a so-called depth-wise hierarchical Bonferroni adjustment. It provides a substantial gain in power over other previously proposed inheritance hierarchical procedures if the underlying alternative hypotheses occur sparsely along a few branches in the tree-structured hierarchy.
Manufacturers are required to demonstrate products meet reliability targets. A typical way to achieve this is with reliability demonstration tests (RDTs), in which a number of products are put on test and the test is passed if a target reliability is achieved. There are various methods for determining the sample size for RDTs, typically based on the power of a hypothesis test following the RDT or risk criteria. Bayesian risk criteria approaches can conflate the choice of sample size and the analysis to be undertaken once the test has been conducted and rely on the specification of somewhat artificial acceptable and rejectable reliability levels. In this paper we offer an alternative approach to sample size determination based on the idea of assurance. This approach chooses the sample size to provide a certain probability that the RDT will result in a successful outcome. It separates the design and analysis of the RDT, allowing different priors for each. We develop the assurance approach for sample size calculations in RDTs for binomial and Weibull likelihoods and propose appropriate prior distributions for the design and analysis of the test. In each case, we illustrate the approach with an example based on real data.
Rank-order relational data, in which each actor ranks the others according to some criterion, often arise from sociometric measurements of judgment (e.g., self-reported interpersonal interaction) or preference (e.g., relative liking). We propose a class of exponential-family models for rank-order relational data and derive a new class of sufficient statistics for such data, which assume no more than within-subject ordinal properties. Application of MCMC MLE to this family allows us to estimate effects for a variety of plausible mechanisms governing rank structure in cross-sectional context, and to model the evolution of such structures over time. We apply this framework to model the evolution of relative liking judgments in an acquaintance process, and to model recall of relative volume of interpersonal interaction among members of a technology education program.