Graphs representing real-world systems may be studied through their underlying community structure. A community in a network is an intuitive idea for which there is no consensus on an objective mathematical definition. The most widely used metric for detecting communities is modularity, although many disadvantages of this parameter have already been noted in the literature. In this work, we present a new approach based on a different metric: surprise. Moreover, the biases of different community detection algorithms and benchmark networks are thoroughly studied, identified and discussed.
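The abstract above does not define surprise, so the following is only an illustrative sketch. It assumes the commonly cited hypergeometric form, in which surprise is the negative logarithm of the probability of seeing at least the observed number of intra-community links in a random graph with the same number of nodes and links. The function name `surprise`, its arguments, and the choice of the natural logarithm are assumptions made for this example.

```python
from math import comb, log

def surprise(n_nodes, edges, community_of):
    """Surprise of a partition (assumed hypergeometric definition).

    n_nodes      -- number of nodes in the graph
    edges        -- list of undirected edges (u, v), no duplicates
    community_of -- dict mapping each node to a community label
    """
    total_pairs = n_nodes * (n_nodes - 1) // 2          # F: possible links
    n_links = len(edges)                                # n: actual links
    sizes = {}
    for label in community_of.values():
        sizes[label] = sizes.get(label, 0) + 1
    max_intra = sum(s * (s - 1) // 2 for s in sizes.values())   # M
    intra = sum(1 for u, v in edges
                if community_of[u] == community_of[v])          # p

    # Upper tail of the hypergeometric distribution, in exact integers.
    tail = sum(comb(max_intra, j) * comb(total_pairs - max_intra, n_links - j)
               for j in range(intra, min(max_intra, n_links) + 1))
    return -log(tail / comb(total_pairs, n_links))

# Toy graph: two triangles joined by a single bridge edge (2, 3).
demo_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
singletons = {i: i for i in range(6)}

s_good = surprise(6, demo_edges, two_triangles)
s_trivial = surprise(6, demo_edges, singletons)
```

On this toy graph the two-triangle partition scores higher than the trivial partition into singletons, whose surprise is zero. For large graphs the tail probability can underflow to zero in floating point, so a practical implementation would work with logarithms throughout.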
Spectral analysis has been successfully applied to the detection of community structure in networks, based variously on the adjacency matrix, the standard Laplacian matrix, the normalized Laplacian matrix, the modularity matrix, the correlation matrix, and several variants of these matrices. However, comparisons between these spectral methods are rarely reported. More importantly, it is still unclear which matrix is most appropriate for detecting community structure. This paper answers the question by evaluating the effectiveness of these five matrices on benchmark networks with heterogeneous distributions of node degree and community size. Test results demonstrate that the normalized Laplacian matrix and the correlation matrix significantly outperform the other three matrices at identifying the community structure of networks. This indicates that it is crucial to take into account the heterogeneous distribution of node degree when using spectral analysis to detect community structure. In addition, to our surprise, the modularity matrix performs very similarly to the adjacency matrix, which indicates that the modularity matrix does not gain the expected benefit from using the configuration model as a reference network, even though that model accounts for node degree heterogeneity.
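To make the matrices named above concrete, here is a minimal NumPy sketch that builds four of them from an adjacency matrix using their standard textbook definitions. The correlation matrix is left out because its definition is not given in the abstract. The helper names `spectral_matrices` and `fiedler_split` and the toy graph are assumptions for illustration, not the evaluation procedure of the paper.

```python
import numpy as np

def spectral_matrices(A):
    """Return standard matrices used in spectral community detection.

    A is a symmetric 0/1 adjacency matrix with no isolated nodes
    (isolated nodes would cause a division by zero below).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1)                  # node degrees
    two_m = k.sum()                    # twice the number of links
    laplacian = np.diag(k) - A         # L = D - A
    d_inv_sqrt = 1.0 / np.sqrt(k)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    normalized = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Modularity matrix with the configuration model as null model
    modularity = A - np.outer(k, k) / two_m
    return {"adjacency": A, "laplacian": laplacian,
            "normalized_laplacian": normalized, "modularity": modularity}

def fiedler_split(A):
    """Bisect a connected graph by the sign of the Fiedler vector of L."""
    L = spectral_matrices(A)["laplacian"]
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = vecs[:, 1]               # second-smallest eigenvalue
    left = [i for i in range(len(fiedler)) if fiedler[i] >= 0]
    right = [i for i in range(len(fiedler)) if fiedler[i] < 0]
    return left, right

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A_demo = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_demo[u, v] = A_demo[v, u] = 1.0

groups = fiedler_split(A_demo)
```

The sign of an eigenvector is arbitrary, so which triangle ends up in `left` may vary; only the grouping itself is meaningful.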
A distinguishing property of communities in networks is that cycles are more prevalent within communities than across communities. Thus, the detection of these communities may be aided through the incorporation of measures of the local richness of the cyclic structure. In this paper, we introduce renewal non-backtracking random walks (RNBRW) as a way of quantifying this structure. RNBRW gives a weight to each edge equal to the probability that a non-backtracking random walk completes a cycle with that edge. Hence, edges with larger weights may be thought of as more important to the formation of cycles. Of note, since separate random walks can be performed in parallel, RNBRW weights can be estimated very quickly, even for large graphs. We give simulation results showing that pre-weighting edges through RNBRW may substantially improve the performance of common community detection algorithms. Our results suggest that RNBRW is especially efficient for the challenging case of detecting communities in sparse graphs.
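The weighting idea can be sketched as a small Monte Carlo simulation. This is a simplified reading of the description above, not the authors' implementation: each walk starts on a random directed edge, never steps straight back along the edge it just used, and stops as soon as it reaches a node it has already visited; the edge that closes the cycle gets one count. The function name `rnbrw_weights`, the restart rule at dead ends, and the normalization by the number of completed walks are assumptions.

```python
import random
from collections import defaultdict

def rnbrw_weights(adj, n_walks=5000, seed=0):
    """Estimate, per edge, the probability that a walk closes a cycle on it.

    adj maps each node to a list of its neighbours (undirected graph).
    Returns a dict {(u, v): weight} with u < v; edges that never close
    a cycle are absent, which corresponds to an estimated weight of 0.
    """
    rng = random.Random(seed)
    directed_edges = [(u, v) for u in adj for v in adj[u]]
    counts = defaultdict(int)
    completed = 0
    for _ in range(n_walks):              # walks are independent of each other
        prev, cur = rng.choice(directed_edges)
        visited = {prev, cur}
        while True:
            # Non-backtracking step: exclude the node we just came from.
            choices = [w for w in adj[cur] if w != prev]
            if not choices:               # dead end: abandon this walk
                break
            nxt = rng.choice(choices)
            if nxt in visited:            # this edge completes a cycle
                counts[tuple(sorted((cur, nxt)))] += 1
                completed += 1
                break
            visited.add(nxt)
            prev, cur = cur, nxt
    if completed == 0:
        return {}
    return {edge: c / completed for edge, c in counts.items()}

# Toy graph: two triangles joined by the bridge edge (2, 3).
demo_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
            3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
weights = rnbrw_weights(demo_adj)
```

In this toy graph the bridge `(2, 3)` lies on no cycle, so it never receives a count, while the triangle edges share all of the weight. Because the walks are independent, the loop over `n_walks` could be split across workers and the counts summed, which is the parallelism the abstract refers to.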
In this paper we propose a network methodology to infer prognostic cancer biomarkers based on the epigenetic pattern of DNA methylation. Epigenetic processes such as DNA methylation reflect environmental risk factors, and are increasingly recognised for their fundamental role in diseases such as cancer. DNA methylation is a gene-regulatory pattern, and hence provides a means by which to assess genomic regulatory interactions. Network models are a natural way to represent and analyse groups of such interactions. The utility of network models also increases as the quantity of data and the number of variables increase, making them increasingly relevant to large-scale genomic studies. We propose a methodology to infer prognostic genomic networks from a DNA methylation-based measure of genomic interaction and association. We then show how to identify prognostic biomarkers from such networks, which we term 'network community oncomarkers'. We illustrate the power of our proposed methodology in the context of a large publicly available breast cancer dataset.
Network growth as described by the Duplication-Divergence model offers a simple, general idea for the evolutionary dynamics of natural networks. In particular, it is an alternative to the well-known Barabasi-Albert model when applied to protein-protein interaction networks. In this work we derive a master equation for the node degree distribution of networks growing via Duplication and Divergence, and we obtain expressions for the total number of links and for the degree distribution as functions of the number of nodes. Using algebraic tools, we investigate the asymptotic behavior of the degree distribution. Analytic results show that the average node degree converges if the total mutation rate is greater than 0.5 and diverges otherwise. Treating the mutation rates of the original and duplicated nodes as independent parameters has no effect on this result. However, a difference between these parameters results in a slower rate of convergence and in different degree distributions. The more these parameters differ, the denser the tail of the distribution. We compare the solutions obtained with simulated networks. The simulation results are in good agreement with the values expected from the derived expressions. The method developed is a robust tool for investigating other models of network growth dynamics.
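A simulation of this kind of growth can be sketched as follows. The abstract does not spell out the exact update rule, so this sketch assumes a simple variant: at each step a uniformly chosen node is duplicated with all its links, and then each link of the original is removed with probability `delta_orig` and each link of the duplicate with probability `delta_dup`, independently. The function names, the two-node seed graph, and the parameter names are assumptions for illustration.

```python
import random

def grow_duplication_divergence(n_nodes, delta_orig, delta_dup, seed=0):
    """Grow a graph by duplication and divergence (assumed simple variant).

    Returns an adjacency dict {node: set of neighbours}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                    # seed graph: one link
    while len(adj) < n_nodes:
        orig = rng.randrange(len(adj))        # node to duplicate
        dup = len(adj)                        # label of the new node
        neighbours = list(adj[orig])
        # Duplication: the copy inherits every link of the original.
        adj[dup] = set(neighbours)
        for w in neighbours:
            adj[w].add(dup)
        # Divergence: links of original and copy are lost independently.
        for w in neighbours:
            if rng.random() < delta_orig:
                adj[orig].discard(w)
                adj[w].discard(orig)
            if rng.random() < delta_dup:
                adj[dup].discard(w)
                adj[w].discard(dup)
    return adj

def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(nb) for nb in adj.values()) / len(adj)

# Compare a low and a high total mutation rate on small graphs.
g_low = grow_duplication_divergence(200, 0.1, 0.1)
g_high = grow_duplication_divergence(200, 0.4, 0.4)
avg_low, avg_high = average_degree(g_low), average_degree(g_high)
```

Under this assumed rule, each step adds on average (1 - delta) times the mean degree in links, where delta = `delta_orig` + `delta_dup`. A mean-field estimate therefore gives a mean degree that scales roughly as N^(1 - 2 delta), which is consistent with the 0.5 threshold stated above. Comparing `avg_low` and `avg_high` over increasing `n_nodes` is one way to check this numerically.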