Link prediction is a paradigmatic problem in network science with a variety of applications. In latent space network models, this problem boils down to ranking pairs of nodes in order of increasing latent distance between them. The network model with hyperbolic latent spaces has a number of attractive properties suggesting that it should be a powerful tool for predicting links, but past work in this direction has reported mixed results. Here we perform a systematic investigation of the utility of latent hyperbolic geometry for link prediction in networks. We first show that some measures of link prediction accuracy are extremely sensitive to inaccuracies in the inference of the latent hyperbolic coordinates of nodes, so we develop a new coordinate inference method that maximizes the accuracy of such inference. Applying this method to synthetic and real networks, we then find that many competitive methods exist for predicting obvious, easy-to-predict links, and among them hyperbolic link prediction is rarely the best but often competitive. However, it is the best, often by far, when the task is to predict less obvious missing links that are genuinely hard to predict. These include missing links in incomplete networks with large fractions of missing links, missing links between nodes that have no common neighbors, and missing links between dissimilar nodes at large latent distances. Overall, these results suggest that the harder a specific link prediction task is, the more seriously one should consider using hyperbolic geometry.
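A minimal Python sketch of the ranking step follows. It assumes that each node's latent coordinates $(r, \theta)$, in the native polar representation of the hyperbolic plane with curvature $-1$, have already been inferred; the coordinate inference method itself is not shown. Distances are computed from $\cosh d = \cosh r_1 \cosh r_2 - \sinh r_1 \sinh r_2 \cos\Delta\theta$, and unconnected pairs are returned closest first. The function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane (curvature -1) in polar coordinates."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))  # angular separation in [0, pi]
    arg = math.cosh(r1) * math.cosh(r2) - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta)
    return math.acosh(max(arg, 1.0))  # clamp guards against rounding just below 1


def rank_missing_links(coords, edges):
    """Return unconnected node pairs ordered by increasing latent distance (likeliest first)."""
    present = {frozenset(e) for e in edges}
    scored = [
        (hyperbolic_distance(*coords[u], *coords[v]), u, v)
        for u, v in combinations(sorted(coords), 2)
        if frozenset((u, v)) not in present
    ]
    scored.sort()
    return [(u, v) for _, u, v in scored]


# Hypothetical toy coordinates: node -> (radius, angle).
coords = {0: (1.0, 0.0), 1: (1.0, 0.1), 2: (3.0, 0.05), 3: (3.0, math.pi)}
print(rank_missing_links(coords, edges=[(0, 2)]))  # -> [(0, 1), (1, 2), (1, 3), (0, 3), (2, 3)]
```

Because any error in the inferred coordinates feeds directly into this ordering, the sketch also illustrates why ranking-based accuracy measures can be so sensitive to coordinate inference.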
Inspired by traditional link prediction and by the problem of recommending friends in social networks, we introduce personalized link prediction, in which each individual receives the same number of diverse predictions. Many classical algorithms perform poorly under this framework, so new algorithms are needed. Motivated by previous research in other fields, we generalize the heat conduction process to the framework of personalized link prediction and find that this method outperforms many classical similarity-based algorithms, especially in terms of diversity. In addition, we demonstrate that adding a single ground node connected to all nodes in the system greatly improves the performance of heat conduction. Finally, we propose hybrid algorithms that combine local random walk and heat conduction and perform better still. Numerical results show that the hybrid algorithms can outperform other algorithms simultaneously on all four adopted metrics: AUC, precision, recall, and Hamming distance. In short, this work may shed some light on the effect of physical processes in personalized link prediction.
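The following Python sketch illustrates one heat-conduction step for a single target node under an assumed formulation: the target's neighbors start at temperature 1, and every node then takes the mean temperature of its own neighbors, $f' = D^{-1} A f$. The number of steps, the normalization, the ground-node wiring, and all function names are assumptions for illustration, not the authors' exact scheme; the hybrid with local random walk is not shown.

```python
import numpy as np


def heat_conduction_scores(adj, target, ground_node=False):
    """One heat-conduction step from `target` (assumed formulation, f' = D^{-1} A f)."""
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    if ground_node:
        # Hypothetical ground node: one extra node linked to every original node.
        a = np.pad(a, ((0, 1), (0, 1)))
        a[n, :n] = 1.0
        a[:n, n] = 1.0
    degree = a.sum(axis=1)
    heat = a[target].copy()  # the target's neighbours start at temperature 1
    total = a @ heat         # summed temperature around each node
    scores = np.divide(total, degree, out=np.zeros_like(total), where=degree > 0)
    return scores[:n]        # drop the ground node's own score, if present


def personalized_predictions(adj, target, top_l, ground_node=False):
    """Return the same number (top_l) of predicted new neighbours for any target node."""
    a = np.asarray(adj)
    scores = heat_conduction_scores(adj, target, ground_node)
    candidates = [j for j in range(a.shape[0]) if j != target and a[target, j] == 0]
    candidates.sort(key=lambda j: (-scores[j], j))
    return candidates[:top_l]


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
adj = np.zeros((5, 5))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
print(personalized_predictions(adj, target=0, top_l=2))  # -> [3, 4]
```

Dividing by the receiving node's own degree, rather than the sender's, is what tends to favor less-connected nodes in heat-conduction schemes, which is one plausible source of the diversity gain.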
Many real-world complex phenomena have underlying structures of evolving networks, in which nodes and links are added and removed over time. A central scientific challenge is describing and explaining network dynamics, and a key test is predicting short- and long-term changes. For short-term link prediction, existing methods attempt to identify neighborhood metrics that correlate with the appearance of a link in the next observation period. Recent work has suggested that incorporating topological features and node attributes can improve link prediction. We predict future links by applying the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to optimize the weights of a linear combination of sixteen neighborhood and node similarity indices. We examine a large dynamic social network with over $10^6$ nodes (Twitter reciprocal reply networks), both as a test of our general method and as a problem of scientific interest in itself. Our method converges quickly and achieves high precision for the top twenty predicted links. Based on our findings, we suggest possible factors that may be driving the evolution of Twitter reciprocal reply networks.
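The Python sketch below shows the overall shape of the approach: each candidate pair gets a score that is a weighted sum of similarity indices, and the weights are tuned to maximize precision among the top-$k$ predictions. Two simplifications are assumptions made here: only four example indices are used instead of sixteen, and a (1+1) evolution strategy with the 1/5th success rule stands in for CMA-ES (it adapts a single step size rather than a full covariance matrix). All names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pair_features(neighbors, u, v):
    """Four example similarity indices for (u, v); the method described above combines sixteen."""
    nu, nv = neighbors[u], neighbors[v]
    common, union = nu & nv, nu | nv
    return [
        float(len(common)),                                      # common neighbours
        len(common) / len(union) if union else 0.0,              # Jaccard coefficient
        sum(1.0 / math.log(len(neighbors[z])) for z in common),  # Adamic-Adar (each z has degree >= 2)
        float(len(nu) * len(nv)),                                # preferential attachment
    ]


def precision_at_k(weights, features, labels, k):
    """Fraction of the k highest-scoring pairs that actually appear in the next period."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(labels[i] for i in top) / k


def one_plus_one_es(objective, dim, iterations=200, sigma=0.5, seed=0):
    """(1+1)-ES with the 1/5th success rule: a simple stand-in for CMA-ES."""
    rng = random.Random(seed)
    best = [1.0] * dim
    best_val = objective(best)
    for _ in range(iterations):
        child = [w + rng.gauss(0.0, sigma) for w in best]
        val = objective(child)
        improved = val > best_val
        if val >= best_val:  # accept ties so the search can drift across plateaus
            best, best_val = child, val
        sigma *= 1.5 if improved else 1.5 ** -0.25  # widen on success, shrink on failure
    return best, best_val


# Toy data (hypothetical): current edges and the links that appear in the next period.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
neighbors = {i: set() for i in range(7)}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)
pairs = [(u, v) for u, v in combinations(range(7), 2) if v not in neighbors[u]]
feats = [pair_features(neighbors, u, v) for u, v in pairs]
labels = [1 if p in {(0, 3), (4, 6)} else 0 for p in pairs]
weights, prec = one_plus_one_es(lambda w: precision_at_k(w, feats, labels, 2), dim=4)
print(weights, prec)
```

Because precision at $k$ is piecewise constant in the weights and has no useful gradient, a derivative-free evolution strategy is a natural fit for this kind of objective.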
Online social network (OSN) applications provide different experiences: for example, posting short texts on Twitter and sharing photographs on Instagram. Multiple OSNs together constitute a multiplex network. For privacy protection and usage purposes, accounts belonging to the same user in different OSNs may have different usernames, photographs, and introductions. Interlayer link prediction in a multiplex network aims to identify whether accounts in different OSNs belong to the same person, which can aid tasks such as cybercriminal behavior modeling and customer interest analysis. Many real-world OSNs exhibit a scale-free degree distribution; thus, neighbors with different degrees may exert different influences on the matching degree of node pairs across OSNs. We developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. First, we proposed a degree penalty principle that assigns a greater weight to a common matched neighbor with fewer connections. Second, we applied node adjacency matrix multiplication to efficiently obtain the matching degree of all unmatched node pairs. Third, we used the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results were inserted into the a priori interlayer node pair set, and the above steps were repeated until all unmatched nodes in one layer were matched or all matching degrees of the unmatched node pairs were equal to 0. Experiments demonstrated that our IDP algorithm significantly outperforms current network-structure-based methods when the average degree and node overlapping rate of the multiplex network are low.
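A minimal Python sketch of the iterative loop follows, under stated assumptions. The penalty $1/\log(2 + k_a + k_b)$ for a matched pair $(a, b)$ is a hypothetical choice that merely satisfies "fewer connections, greater weight", and the "approved maximum value method" is interpreted here as keeping pairs that are the maximum of both their row and their column; neither is taken from the paper. The matching degree of all pairs is computed at once as $S = A_1 W A_2^\top$.

```python
import numpy as np


def idp_match(a1, a2, seeds):
    """Iteratively predict interlayer links between two layers (a sketch of the IDP idea).

    a1, a2: adjacency matrices of layers 1 and 2; seeds: known (u, v) interlayer pairs.
    Returns a dict mapping layer-1 nodes to their predicted layer-2 counterparts.
    """
    a1, a2 = np.asarray(a1, dtype=float), np.asarray(a2, dtype=float)
    deg1, deg2 = a1.sum(axis=1), a2.sum(axis=1)
    matched = dict(seeds)
    while True:
        # Degree penalty (hypothetical form): matched pairs with many links get less weight.
        w = np.zeros((a1.shape[0], a2.shape[0]))
        for a, b in matched.items():
            w[a, b] = 1.0 / np.log(2.0 + deg1[a] + deg2[b])
        # Matching degree of all pairs at once: S[u, v] = sum_ab A1[u, a] W[a, b] A2[v, b].
        s = a1 @ w @ a2.T
        s[list(matched.keys()), :] = 0.0    # exclude already-matched layer-1 nodes
        s[:, list(matched.values())] = 0.0  # exclude already-matched layer-2 nodes
        new_pairs = []
        for u in range(s.shape[0]):
            v = int(np.argmax(s[u]))
            # Assumed selection rule: keep (u, v) if it is the maximum of its row and column.
            if s[u, v] > 0 and int(np.argmax(s[:, v])) == u:
                new_pairs.append((u, v))
        if not new_pairs:  # everything matched, or all remaining matching degrees are 0
            return matched
        matched.update(new_pairs)  # feed predictions back in as prior pairs


# Toy example: layer 2 is a path graph with its nodes relabelled by `perm`.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
perm = [2, 0, 3, 1]  # layer-1 node i is layer-2 node perm[i]
layer2 = np.zeros_like(path)
for i in range(4):
    for j in range(4):
        layer2[perm[i], perm[j]] = path[i, j]
result = idp_match(path, layer2, seeds=[(0, perm[0])])
print(result)  # -> {0: 2, 1: 0, 2: 3, 3: 1}
```

Starting from a single seed pair, each pass uncovers the next node along the path, which mirrors how the prior pair set grows from one iteration to the next.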
Many real-world complex systems are well represented as multilayer networks, and predicting interactions in those systems is one of the most pressing problems in predictive network science. To address this challenge, we introduce two stochastic block models for multilayer and temporal networks: one uses nodes as its fundamental unit, whereas the other focuses on links. We also develop scalable algorithms for inferring the parameters of these models. Because our models describe all layers simultaneously, our approach takes full advantage of the information contained in the whole network when making predictions about any particular layer. We illustrate the potential of our approach by analyzing two empirical datasets: a temporal network of email communications and a network of drug interactions for treating different cancer types. We find that modeling all layers simultaneously generally results in more accurate link prediction. However, the most predictive model depends on the dataset under consideration: the node-based model is more appropriate for predicting drug interactions, whereas the link-based model is more appropriate for predicting email communication.
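The Python sketch below shows how a fitted node-based model could score links in every layer. It assumes a mixed-membership parametrization in which group memberships $\theta$ are shared across layers and each layer $\ell$ has its own group-to-group matrix $p^\ell$, so that $P^\ell_{ij} = \sum_{\alpha\beta} \theta_{i\alpha}\, p^\ell_{\alpha\beta}\, \theta_{j\beta}$. This form, the function name, and the toy parameters are assumptions; parameter inference and the link-based model are not shown.

```python
import numpy as np


def link_probabilities(theta, p_layers):
    """Per-layer link probabilities under an assumed mixed-membership, node-based multilayer SBM.

    theta:    (n, K) group memberships shared by all layers, each row summing to 1.
    p_layers: (L, K, K) group-to-group connection probabilities, one matrix per layer.
    Returns an (L, n, n) array with P[l, i, j] = sum_ab theta[i, a] * p_layers[l, a, b] * theta[j, b].
    """
    theta = np.asarray(theta, dtype=float)
    p_layers = np.asarray(p_layers, dtype=float)
    return np.einsum("ia,lab,jb->lij", theta, p_layers, theta)


# Hypothetical fitted parameters: 3 nodes, 2 groups, 2 layers.
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
p_layers = [[[0.9, 0.1], [0.1, 0.9]],   # layer 0: assortative
            [[0.2, 0.7], [0.7, 0.2]]]   # layer 1: disassortative
probs = link_probabilities(theta, p_layers)
print(probs[0, 0, 1], probs[1, 0, 1])
```

Because $\theta$ is shared, links observed in any layer constrain the memberships used to score every other layer, which is one way to read the claim that modeling all layers together uses the information in the whole network.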
Online users are typically active on multiple social media networks (SMNs), which together constitute a multiplex social network. Determining whether given accounts on different SMNs belong to the same user is increasingly challenging; this can be expressed as an interlayer link prediction problem in a multiplex network. To predict interlayer links, existing approaches leverage feature or structure information. Existing methods that use network embedding techniques focus on learning a mapping function that unifies all nodes into a common latent representation space for prediction, but they do not exploit the positional relationships between unmatched nodes and their common matched neighbors (CMNs). Furthermore, the layers are often modeled as unweighted graphs, ignoring the strengths of the relationships between nodes. To address these limitations, we propose a framework based on multiple types of consistency between embedding vectors (MulCEV). In MulCEV, a traditional embedding-based method is applied to obtain the degree of consistency between the vectors representing unmatched nodes, and a proposed distance consistency index, based on the positions of nodes in each latent space, provides additional clues for prediction. By combining these two types of consistency, MulCEV makes full use of the information in the latent spaces. Additionally, MulCEV models the layers as weighted graphs to obtain better representations, so that the stronger the relationship between two nodes, the more similar their embedding vectors in the latent representation space. Experiments on several real-world datasets show that MulCEV markedly outperforms current embedding-based methods, especially when the number of training iterations is small.
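To illustrate how two kinds of consistency might be combined, here is a Python sketch with hypothetical definitions that are not taken from the paper. Vector consistency is taken to be the cosine similarity between $M x_u$ and $y_v$ for an assumed learned linear mapping $M$. Distance consistency compares the scale-normalized profile of distances from $u$ to its CMNs in layer 1 with the corresponding profile from $v$ in layer 2. The two are blended with an assumed weight `alpha`. The embeddings are taken as given, so the weighted-graph embedding step is not shown.

```python
import numpy as np


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def distance_consistency(x_u, y_v, x_cmn, y_cmn):
    """Hypothetical index: do u and v sit at similar relative distances from their CMNs?

    Row i of x_cmn / y_cmn holds the i-th common matched neighbour's embedding in layer 1 / 2.
    Returns 1 for identical (scale-normalised) distance profiles and 0 for maximally different ones.
    """
    d1 = np.linalg.norm(x_cmn - x_u, axis=1)
    d2 = np.linalg.norm(y_cmn - y_v, axis=1)
    if d1.sum() == 0 or d2.sum() == 0:
        return 0.0
    d1, d2 = d1 / d1.sum(), d2 / d2.sum()  # remove layer-specific scale
    return 1.0 - 0.5 * float(np.abs(d1 - d2).sum())


def matching_score(x_u, y_v, x_cmn, y_cmn, mapping, alpha=0.5):
    """Blend vector consistency (after an assumed learned linear mapping) with distance consistency."""
    vector_consistency = cosine(mapping @ x_u, y_v)
    return alpha * vector_consistency + (1 - alpha) * distance_consistency(x_u, y_v, x_cmn, y_cmn)


# Toy check: layer-2 embeddings are an orthogonal rotation of layer 1, scaled by 2.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
x_u, x_cmn = rng.normal(size=4), rng.normal(size=(3, 4))
y_v, y_cmn = 2 * q @ x_u, 2 * x_cmn @ q.T
print(matching_score(x_u, y_v, x_cmn, y_cmn, mapping=q))
```

In this toy case both consistencies equal 1 up to rounding, because rotation and uniform scaling preserve both the direction of the mapped vector and the relative distances to the CMNs.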