As a fundamental challenge across many disciplines, link prediction aims to identify potential links in a network from incomplete observed information, with broad applications ranging from uncovering missing protein-protein interactions to predicting the evolution of networks. One of the most influential classes of methods relies on similarity indices characterized by common neighbors or variations thereof. We construct a hidden space by mapping a network into Euclidean space based solely on the connection structure of the network. Compared with the real geographical locations of nodes, our reconstructed locations are in conformity with the real ones. The distances between nodes in our hidden space can serve as a novel similarity metric in link prediction. In addition, we hybridize our hidden space method with other state-of-the-art similarity methods, and the hybrid substantially outperforms the existing methods in prediction accuracy. Hence, our hidden space reconstruction model provides a fresh perspective for understanding network structure and, in particular, casts new light on link prediction.
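For readers unfamiliar with the common-neighbors index mentioned above, the following is a minimal sketch of that baseline only, not of the hidden space method. The function name `common_neighbor_scores` and the toy edge list are assumptions introduced purely for illustration.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every currently unlinked node pair by its number of common neighbors."""
    # Build an undirected adjacency map: node -> set of neighbors.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:  # only score pairs that are not already linked
            scores[(u, v)] = len(neighbors[u] & neighbors[v])
    return scores


# Hypothetical toy network, used only to exercise the function.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbor_scores(toy_edges)

# Rank candidate links from most to fewest common neighbors (ties broken by name).
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for pair, score in ranked:
    print(pair, score)
```

A distance-based similarity, such as the one the abstract proposes, would replace the set-intersection score with some decreasing function of the distance between the two nodes in the embedding.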
Recent progress towards unraveling the hidden geometric organization of real multiplexes has revealed significant correlations across the hyperbolic node coordinates in different network layers, which has facilitated applications such as trans-layer link prediction and mutual navigation. But are geometric correlations alone sufficient to explain the topological relation between the layers of real systems? Here we answer this question in the negative. We show that connections in real systems tend to persist from one layer to another irrespective of their hyperbolic distances. This suggests that, in addition to purely geometric aspects, the explicit link formation process in one layer impacts the topology of other layers. Based on this finding, we present a simple modification to the recently developed Geometric Multiplex Model to account for this effect, and show that the extended model can reproduce the behavior observed in real systems. We also find that link persistence is significant in all considered multiplexes and can explain their layers' high edge overlap, which cannot be explained by coordinate correlations alone. Furthermore, by taking both link persistence and hyperbolic distance correlations into account, we can improve trans-layer link prediction. These findings guide the development of multiplex embedding methods, suggesting that such methods should account for both coordinate correlations and link persistence across layers.
Community detection and link prediction are both of great significance in network analysis, providing valuable insights into the topological structure of a network from different perspectives. In this paper, we propose a novel community detection algorithm that incorporates link prediction, motivated by the question of whether link prediction can be used to improve the accuracy of community partition. For link prediction, we propose two novel indices to compute the similarity between each pair of nodes, one of which aims to add missing links, while the other aims to remove spurious edges. Extensive experiments are conducted on benchmark data sets, and the results of our proposed algorithm are compared with two classes of baselines. In conclusion, our proposed algorithm is competitive, revealing that link prediction does improve the precision of community detection.
Many real networks that are inferred or collected from data are incomplete due to missing edges. Missing edges can be inherent to the dataset (Facebook friend links will never be complete) or the result of sampling (one may only have access to a portion of the data). The consequence is that downstream analyses that consume the network will often yield less accurate results than if the edges were complete. Community detection algorithms, in particular, often suffer when critical intra-community edges are missing. We propose a novel consensus clustering algorithm to enhance community detection on incomplete networks. Our framework utilizes existing community detection algorithms that process networks imputed by our link prediction based algorithm. The framework then merges their multiple outputs into a final consensus output. On average our method boosts performance of existing algorithms by 7% on artificial data and 17% on ego networks collected from Facebook.
Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing approaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to $k$-partite networks. This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results. We demonstrate this model's ability to efficiently and accurately find community structure in synthetic bipartite networks with known structure and in real-world bipartite networks with unknown structure, and we characterize its performance in practical contexts.
Community definitions usually focus on edges, both inside and between communities. However, the high density of edges within a community determines correlations between nodes that go beyond nearest neighbours and that are indicated by the presence of motifs. We show how motifs can be used to define general classes of nodes, including communities, by extending the mathematical expression of Newman-Girvan modularity. We then construct a general framework and apply it to some synthetic and real networks.
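As a point of reference, the standard edge-based Newman-Girvan modularity that the abstract above takes as its starting point can be sketched as follows. This is the usual form Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where L_c is the number of edges inside c, d_c is the total degree of its nodes, and m is the number of edges. It is not the motif-based extension itself. The function name `modularity` and the toy graph are assumptions made for illustration, and the sketch assumes an undirected, unweighted graph without self-loops.

```python
def modularity(edges, community):
    """Edge-based Newman-Girvan modularity of a partition.

    edges:     list of undirected (u, v) pairs, no self-loops
    community: dict mapping each node to its community label
    """
    m = len(edges)
    internal = {}    # community label -> number of edges inside it (L_c)
    degree_sum = {}  # community label -> summed degree of its nodes (d_c)

    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge contributes one unit of degree to each endpoint's community.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1

    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives a positive score, while placing every node in one community gives zero, which is the behaviour the edge-based definition is designed to capture. A motif-based variant would count motif occurrences in place of single edges.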