
Valid Two-Sample Graph Testing via Optimal Transport Procrustes and Multiscale Graph Correlation with Applications in Connectomics

Published by Jaewon Chung
Publication date: 2019
Research field: Mathematical Statistics
Research language: English





Testing whether two graphs come from the same distribution is of interest in many real-world scenarios, including brain network analysis. Under the random dot product graph model, the nonparametric hypothesis testing framework consists of embedding the graphs using the adjacency spectral embedding (ASE), followed by aligning the embeddings using the median flip heuristic, and finally applying the nonparametric maximum mean discrepancy (MMD) test to obtain a p-value. Using synthetic data generated from Drosophila brain networks, we show that the median flip heuristic results in an invalid test, and demonstrate that optimal transport Procrustes (OTP) for alignment resolves the invalidity. We further demonstrate that substituting the MMD test with the multiscale graph correlation (MGC) test leads to a more powerful test both in synthetic and in simulated data. Lastly, we apply this powerful test to the right and left hemispheres of the larval Drosophila mushroom body brain networks, and conclude that there is not sufficient evidence to reject the null hypothesis that the two hemispheres are equally distributed.
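The baseline pipeline named in the abstract (ASE, then median flip alignment, then an MMD test) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`ase`, `median_flip`, `mmd2`, `permutation_pvalue`, `sample_rdpg`), the Gaussian kernel, the median bandwidth rule, and the permutation scheme are all choices made here for the example. It does not implement OTP or MGC, and it shows the alignment step the abstract reports as leading to an invalid test.

```python
import numpy as np


def ase(adj, d):
    """Adjacency spectral embedding: top-d eigenpairs by magnitude,
    with eigenvectors scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(adj)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def median_flip(x, y):
    """Median flip heuristic (as assumed here): flip the sign of each
    column of y whose median has the opposite sign to that column of x."""
    sx = np.sign(np.median(x, axis=0))
    sy = np.sign(np.median(y, axis=0))
    flips = np.where(sx * sy < 0, -1.0, 1.0)
    return y * flips


def mmd2(x, y, bandwidth):
    """Biased estimate of the squared MMD with a Gaussian kernel."""
    z = np.vstack([x, y])
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * bandwidth ** 2))
    n = len(x)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the MMD statistic on pooled embeddings."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    # Assumption: bandwidth from the median of nonzero squared distances.
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    bandwidth = np.sqrt(np.median(sq[sq > 0]) / 2.0)
    observed = mmd2(x, y, bandwidth)
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(z))
        if mmd2(z[perm[:n]], z[perm[n:]], bandwidth) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def sample_rdpg(latent, rng):
    """Sample an undirected, loop-free graph from a random dot product graph."""
    p = np.clip(latent @ latent.T, 0.0, 1.0)
    upper = np.triu(rng.random(p.shape) < p, k=1)
    return (upper + upper.T).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    latent = rng.uniform(0.2, 0.7, size=(40, 1))  # hypothetical latent positions
    a, b = sample_rdpg(latent, rng), sample_rdpg(latent, rng)
    xa, xb = ase(a, 1), ase(b, 1)
    print(permutation_pvalue(xa, median_flip(xa, xb)))
```

Sign flips are only one part of the orthogonal non-identifiability of ASE, which is one way to see why a per-dimension sign heuristic may not be enough once the embedding dimension exceeds one.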




Read also

Identifying statistically significant dependency between variables is a key step in scientific discoveries. Many recent methods, such as distance and kernel tests, have been proposed for valid and consistent independence testing and can be applied to data in Euclidean and non-Euclidean spaces. However, in those works, $n$ pairs of points in $\mathcal{X} \times \mathcal{Y}$ are observed. Here, we consider the setting where a pair of $n \times n$ graphs are observed, and the corresponding adjacency matrices are treated as kernel matrices. Under a $\rho$-correlated stochastic block model, we demonstrate that a naive test (permutation and Pearson's) for a conditional dependency graph model is invalid. Instead, we propose a block-permutation procedure. We prove that our procedure is valid and consistent -- even when the two graphs have different marginal distributions, are weighted or unweighted, and the latent vertex assignments are unknown -- and provide sufficient conditions for the tests to estimate $\rho$. Simulations corroborate these results on both binary and weighted graphs. Applying these tests to the whole-organism, single-cell-resolution structural connectomes of C. elegans, we identify strong statistical dependency between the chemical synapse connectome and the gap junction connectome.
Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation --- a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments --- to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods.
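For orientation, the sample distance correlation that MGC generalizes can be computed from double-centered pairwise distance matrices. The sketch below is the standard biased sample estimator, not MGC itself: it omits the nearest-neighbor truncation and the search over local scales described in the abstract above. The function names are hypothetical.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances, double-centered (row, column, grand mean)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Biased sample distance correlation between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x2 = (a * a).mean()  # squared sample distance variance of x
    dvar_y2 = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x2 * dvar_y2)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 51)
    print(distance_correlation(x, 2 * x + 1))  # linear dependence
    print(distance_correlation(x, x ** 2))     # nonlinear dependence
    print(np.corrcoef(x, x ** 2)[0, 1])        # Pearson for comparison
```

On the quadratic example, Pearson correlation is essentially zero because the relationship is symmetric, while the distance correlation stays clearly positive; this is the kind of general dependency the abstract refers to.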
Though the multiscale graph learning techniques have enabled advanced feature extraction frameworks, the classic ensemble strategy may show inferior performance while encountering the high homogeneity of the learnt representation, which is caused by the nature of existing graph pooling methods. To cope with this issue, we propose a diversified multiscale graph learning model equipped with two core ingredients: a graph self-correction (GSC) mechanism to generate informative embedded graphs, and a diversity boosting regularizer (DBR) to achieve a comprehensive characterization of the input graph. The proposed GSC mechanism compensates the pooled graph with the lost information during the graph pooling process by feeding back the estimated residual graph, which serves as a plug-in component for popular graph pooling methods. Meanwhile, pooling methods enhanced with the GSC procedure encourage the discrepancy of node embeddings, and thus it contributes to the success of ensemble learning strategy. The proposed DBR instead enhances the ensemble diversity at the graph-level embeddings by leveraging the interaction among individual classifiers. Extensive experiments on popular graph classification benchmarks show that the proposed GSC mechanism leads to significant improvements over state-of-the-art graph pooling methods. Moreover, the ensemble multiscale graph learning models achieve superior enhancement by combining both GSC and DBR.
A connectome is a map of the structural and/or functional connections in the brain. This information-rich representation has the potential to transform our understanding of the relationship between patterns in brain connectivity and neurological processes, disorders, and diseases. However, existing computational techniques used to analyze connectomes are often insufficient for interrogating multi-subject connectomics datasets. Several methods are either solely designed to analyze single connectomes, or leverage heuristic graph invariants that ignore the complete topology of connections between brain regions. To enable more rigorous comparative connectomics analysis, we introduce robust and interpretable statistical methods motivated by recent theoretical advances in random graph models. These methods enable simultaneous analysis of multiple connectomes across different scales of network topology, facilitating the discovery of hierarchical brain structures that vary in relation with phenotypic profiles. We validated these methods through extensive simulation studies, as well as synthetic and real-data experiments. Using a set of high-resolution connectomes obtained from genetically distinct mouse strains (including the BTBR mouse -- a standard model of autism -- and three behavioral wild-types), we show that these methods uncover valuable latent information in multi-subject connectomics data and yield novel insights into the connective correlates of neurological phenotypes.
Ilmun Kim, Ann B. Lee, Jing Lei (2018)
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. Our proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.