No Arabic abstract
Link and sign prediction in complex networks bring great help to decision-making and recommender systems, such as in predicting potential relationships or relative status levels. Many previous studies focused on designing the special algorithms to perform either link prediction or sign prediction. In this work, we propose an effective model integration algorithm consisting of network embedding, network feature engineering, and an integrated classifier, which can perform the link and sign prediction in the same framework. Network embedding can accurately represent the characteristics of topological structures and cooperate with the powerful network feature engineering and integrated classifier can achieve better prediction. Experiments on several datasets show that the proposed model can achieve state-of-the-art or competitive performance for both link and sign prediction in spite of its generality. Interestingly, we find that using only very low network embedding dimension can generate high prediction performance, which can significantly reduce the computational overhead during training and prediction. This study offers a powerful methodology for multi-task prediction in complex networks.
We propose here two new recommendation methods, based on the appropriate normalization of already existing similarity measures, and on the convex combination of the recommendation scores derived from similarity between users and between objects. We validate the proposed measures on three relevant data sets, and we compare their performance with several recommendation systems recently proposed in the literature. We show that the proposed similarity measures allow to attain an improvement of performances of up to 20% with respect to existing non-parametric methods, and that the accuracy of a recommendation can vary widely from one specific bipartite network to another, which suggests that a careful choice of the most suitable method is highly relevant for an effective recommendation on a given system. Finally, we studied how an increasing presence of random links in the network affects the recommendation scores, and we found that one of the two recommendation algorithms introduced here can systematically outperform the others in noisy data sets.
Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speedup the collection of network data and improve the validity of network models. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 548 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity via meta-learning to construct a series of stacked models that combine predictors into a single algorithm. Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy. Applied to real-world networks, stacked models are also superior, but their accuracy varies strongly by domain, suggesting that link prediction may be fundamentally easier in social networks than in biological or technological networks. These results indicate that the state-of-the-art for link prediction comes from combining individual algorithms, which achieves nearly optimal predictions. We close with a brief discussion of limitations and opportunities for further improvement of these results.
Many real world, complex phenomena have underlying structures of evolving networks where nodes and links are added and removed over time. A central scientific challenge is the description and explanation of network dynamics, with a key test being the prediction of short and long term changes. For the problem of short-term link prediction, existing methods attempt to determine neighborhood metrics that correlate with the appearance of a link in the next observation period. Recent work has suggested that the incorporation of topological features and node attributes can improve link prediction. We provide an approach to predicting future links by applying the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to optimize weights which are used in a linear combination of sixteen neighborhood and node similarity indices. We examine a large dynamic social network with over $10^6$ nodes (Twitter reciprocal reply networks), both as a test of our general method and as a problem of scientific interest in itself. Our method exhibits fast convergence and high levels of precision for the top twenty predicted links. Based on our findings, we suggest possible factors which may be driving the evolution of Twitter reciprocal reply networks.
Online social network (OSN) applications provide different experiences; for example, posting a short text on Twitter and sharing photographs on Instagram. Multiple OSNs constitute a multiplex network. For privacy protection and usage purposes, accounts belonging to the same user in different OSNs may have different usernames, photographs, and introductions. Interlayer link prediction in multiplex network aims at identifying whether the accounts in different OSNs belong to the same person, which can aid in tasks including cybercriminal behavior modeling and customer interest analysis. Many real-world OSNs exhibit a scale-free degree distribution; thus, neighbors with different degrees may exert different influences on the node matching degrees across different OSNs. We developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. First, we proposed a degree penalty principle that assigns a greater weight to a common matched neighbor with fewer connections. Second, we applied node adjacency matrix multiplication for efficiently obtaining the matching degree of all unmatched node pairs. Thereafter, we used the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results were inserted into the priori interlayer node pair set and the above processes were performed iteratively until all unmatched nodes in one layer were matched or all matching degrees of the unmatched node pairs were equal to 0. Experiments demonstrated that our advanced IDP algorithm significantly outperforms current network structure-based methods when the multiplex network average degree and node overlapping rate are low.
Inspired by traditional link prediction and to solve the problem of recommending friends in social networks, we introduce the personalized link prediction in this paper, in which each individual will get equal number of diversiform predictions. While the performances of many classical algorithms are not satisfactory under this framework, thus new algorithms are in urgent need. Motivated by previous researches in other fields, we generalize heat conduction process to the framework of personalized link prediction and find that this method outperforms many classical similarity-based algorithms, especially in the performance of diversity. In addition, we demonstrate that adding one ground node who is supposed to connect all the nodes in the system will greatly benefit the performance of heat conduction. Finally, better hybrid algorithms composed of local random walk and heat conduction have been proposed. Numerical results show that the hybrid algorithms can outperform other algorithms simultaneously in all four adopted metrics: AUC, precision, recall and hamming distance. In a word, this work may shed some light on the in-depth understanding of the effect of physical processes in personalized link prediction.