A semiparametric two-sample hypothesis testing problem for random dot product graphs

358 0 0.0 ( 0 )

Download Cite

Added by Minh Tang

Publication date 2014

fields Mathematical Statistics

and research's language is English

Authors Minh Tang - Avanti Athreya - Daniel L. Sussman

Methodology

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Two-sample hypothesis testing for random graphs arises naturally in neuroscience, social networks, and machine learning. In this paper, we consider a semiparametric problem of two-sample hypothesis testing for a class of latent position random graphs. We formulate a notion of consistency in this context and propose a valid test for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions or have generating latent positions that are scaled or diagonal transformations of one another. Our test statistic is a function of a spectral decomposition of the adjacency matrix for each graph and our test procedure is consistent across a broad range of alternatives. We apply our test procedure to real biological data: in a test-retest data set of neural connectome graphs, we are able to distinguish between scans from different subjects; and in the {em C.elegans} connectome, we are able to distinguish between chemical and electrical networks. The latter example is a concrete demonstration that our test can have power even for small sample sizes. We conclude by discussing the relationship between our test procedure and generalized likelihood ratio tests.

rate research

Missing at Random or Not: A Semiparametric Testing Approach

103 - Rui Duan , C. Jason Liang , Pamela Shaw 2020

Practical problems with missing data are common, and statistical methods have been developed concerning the validity and/or efficiency of statistical procedures. On a central focus, there have been longstanding interests on the mechanism governing data missingness, and correctly deciding the appropriate mechanism is crucially relevant for conducting proper practical investigations. The conventional notions include the three common potential classes -- missing completely at random, missing at random, and missing not at random. In this paper, we present a new hypothesis testing approach for deciding between missing at random and missing not at random. Since the potential alternatives of missing at random are broad, we focus our investigation on a general class of models with instrumental variables for data missing not at random. Our setting is broadly applicable, thanks to that the model concerning the missing data is nonparametric, requiring no explicit model specification for the data missingness. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties significantly differ only when missing at random does not hold. We show that our new hypothesis testing approach achieves an objective data oriented choice between missing at random or not. We demonstrate the feasibility, validity, and efficacy of the new test by theoretical analysis, simulation studies, and a real data analysis.

Methodology Econometrics

Optimal Bayesian Estimation for Random Dot Product Graphs

80 - Fangzheng Xie , Yanxun Xu 2019

We propose a Bayesian approach, called the posterior spectral embedding, for estimating the latent positions in random dot product graphs, and prove its optimality. Unlike the classical spectral-based adjacency/Laplacian spectral embedding, the posterior spectral embedding is a fully-likelihood based graph estimation method taking advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax-lower bound for estimating the latent positions, and show that the posterior spectral embedding achieves this lower bound since it both results in a minimax-optimal posterior contraction rate, and yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models, the result of which strengthens an existing result concerning the number of mis-clustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogy of the adjacency spectral embedding, but the resulting posterior contraction rate is sub-optimal with an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of a Wikipedia graph data.

Statistics Theory Methodology Statistics Theory

Efficient Estimation for Random Dot Product Graphs via a One-step Procedure

62 - Fangzheng Xie , Yanxun Xu 2019

We propose a one-step procedure to estimate the latent positions in random dot product graphs efficiently. Unlike the classical spectral-based methods such as the adjacency and Laplacian spectral embedding, the proposed one-step procedure takes advantage of both the low-rank structure of the expected adjacency matrix and the Bernoulli likelihood information of the sampling model simultaneously. We show that for each vertex, the corresponding row of the one-step estimator converges to a multivariate normal distribution after proper scaling and centering up to an orthogonal transformation, with an efficient covariance matrix. The initial estimator for the one-step procedure needs to satisfy the so-called approximate linearization property. The one-step estimator improves the commonly-adopted spectral embedding methods in the following sense: Globally for all vertices, it yields an asymptotic sum of squares error no greater than those of the spectral methods, and locally for each vertex, the asymptotic covariance matrix of the corresponding row of the one-step estimator dominates those of the spectral embeddings in spectra. The usefulness of the proposed one-step procedure is demonstrated via numerical examples and the analysis of a real-world Wikipedia graph dataset.

Statistics Theory Methodology Statistics Theory

Combined Hypothesis Testing on Graphs with Applications to Gene Set Enrichment Analysis

115 - Shulei Wang , Ming Yuan 2016

Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing multivariate means. A new testing procedure is proposed within this framework. We show that the test is optimal in that it can consistently detect departure from the collective null at a rate that no other test could improve, for almost all graphs. We also provide general performance bounds for the proposed test under any specific graph, and illustrate their utility through several common types of graphs. Numerical experiments are presented to further demonstrate the merits of our approach.

Methodology Statistics Theory Applications

A Unified Approach to Hypothesis Testing for Functional Linear Models

102 - Yinan Lin , Zhenhua Lin 2021

We develop a unified approach to hypothesis testing for various types of widely used functional linear models, such as scalar-on-function, function-on-function and function-on-scalar models. In addition, the proposed test applies to models of mixed types, such as models with both functional and scalar predictors. In contrast with most existing methods that rest on the large-sample distributions of test statistics, the proposed method leverages the technique of bootstrapping max statistics and exploits the variance decay property that is an inherent feature of functional data, to improve the empirical power of tests especially when the sample size is limited and the signal is relatively weak. Theoretical guarantees on the validity and consistency of the proposed test are provided uniformly for a class of test statistics.

Methodology Statistics Theory Statistics Theory