In this paper, we consider data consisting of multiple networks, each comprising a different edge set on a common set of nodes. Many models have been proposed for the analysis of such multi-view network data under the assumption that the data views are closely related. We provide tools for evaluating this assumption. In particular, we ask: given two networks that each follow a stochastic block model, is there an association between the latent community memberships of the nodes in the two networks? To answer this question, we extend the stochastic block model for a single network view to the two-view setting, and develop a new hypothesis test for the null hypothesis that the latent community memberships in the two data views are independent. We apply our test to protein-protein interaction data from the HINT database (Das and Yu, 2012). We find evidence of a weak association between the latent community memberships of proteins defined with respect to binary interaction data and the latent community memberships of proteins defined with respect to co-complex association data. We also extend this proposal to the setting of a network with node covariates.
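To make the question concrete, the following is a minimal sketch of the setting, not the test developed in the paper. It assumes two communities per view, simulates two networks on a common node set from stochastic block models, estimates each view's memberships with a crude spectral split, and then runs a naive permutation test of independence on the estimated labels. All function names, the connectivity matrix `B`, and the 10% label-flip rate are hypothetical choices made for illustration. A plug-in test of this kind ignores the uncertainty in the estimated memberships, so it should be read only as a rough stand-in.

```python
import numpy as np

def sample_sbm(z, B, rng):
    """Sample a symmetric, loop-free adjacency matrix from an SBM with labels z."""
    P = B[np.ix_(z, z)]                          # edge probability for every node pair
    upper = np.triu(rng.random(P.shape) < P, k=1)  # draw each pair once (upper triangle)
    return (upper | upper.T).astype(int)

def spectral_two_way(A):
    """Crude two-community estimate: sign of the eigenvector of the 2nd-largest eigenvalue."""
    _, vecs = np.linalg.eigh(A.astype(float))    # eigenvalues come back in ascending order
    return (vecs[:, -2] > 0).astype(int)

def g_statistic(a, b, k=2):
    """Likelihood-ratio (G) statistic for independence of two label vectors."""
    table = np.zeros((k, k))
    np.add.at(table, (a, b), 1)                  # k-by-k contingency table of label pairs
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    mask = table > 0                             # empty cells contribute zero
    return 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))

def permutation_pvalue(a, b, n_perm=999, rng=None):
    """Permutation p-value: shuffle one label vector to mimic independent views."""
    rng = np.random.default_rng() if rng is None else rng
    observed = g_statistic(a, b)
    perms = np.array([g_statistic(a, rng.permutation(b)) for _ in range(n_perm)])
    return (1 + np.sum(perms >= observed)) / (1 + n_perm)

rng = np.random.default_rng(0)
n = 200
B = np.array([[0.30, 0.05],                      # assumed within/between-community
              [0.05, 0.30]])                     # edge probabilities (illustrative)

z1 = rng.integers(0, 2, size=n)                  # memberships in view 1
flip = rng.random(n) < 0.1
z2_dep = np.where(flip, 1 - z1, z1)              # view 2 mostly agrees with view 1
z2_ind = rng.integers(0, 2, size=n)              # view 2 drawn independently of view 1

est1 = spectral_two_way(sample_sbm(z1, B, rng))
est2_dep = spectral_two_way(sample_sbm(z2_dep, B, rng))
est2_ind = spectral_two_way(sample_sbm(z2_ind, B, rng))

p_dep = permutation_pvalue(est1, est2_dep, rng=rng)
p_ind = permutation_pvalue(est1, est2_ind, rng=rng)
print(f"dependent views:   p = {p_dep:.3f}")
print(f"independent views: p = {p_ind:.3f}")
```

The G statistic is unchanged when the community labels within a view are swapped, so the arbitrary sign of the eigenvector does not affect the result.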
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of images of a g
The practice of pooling several individual test statistics to form aggregate tests is common in many statistical applications where individual tests may be underpowered. While selection by aggregate tests can serve to increase power, the selection pro
HIV-1C is the most prevalent subtype of HIV-1 and accounts for over half of HIV-1 infections worldwide. Host genetic influence on HIV infection has been previously studied in HIV-1B, but little attention has been paid to the more prevalent subtype C.
In genome-wide association studies (GWAS), penalization is an important approach for identifying genetic markers associated with a trait, while the mixed model is successful in accounting for a complicated dependence structure among samples. Therefore, pena
Consider the online testing of a stream of hypotheses where a real-time decision must be made before the next data point arrives. The error rate is required to be controlled at all decision points. Conventional simultaneous testing rules are