No Arabic abstract
In the era of big data, graph sampling is indispensable in many settings. Existing sampling methods are mostly designed for static graphs, and aim to preserve basic structural properties of the original graph (such as degree distribution, clustering coefficient etc.) in the sample. We argue that for any sampling method it is impossible to produce an universal representative sample which can preserve all the properties of the original graph; rather sampling should be application specific (such as preserving hubs - needed for information diffusion). Here we consider community detection as an application scenario. We propose ComPAS, a novel sampling strategy that unlike previous methods, is not only designed for streaming graphs (which is a more realistic representation of a real-world scenario) but also preserves the community structure of the original graph in the sample. Empirical results on both synthetic and different real-world graphs show that ComPAS is the best to preserve the underlying community structure with average performance reaching 73.2% of the most informed algorithm for static graphs.
Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of large-scale bipartite graphs, also known as two-way graphs or two-mode networks. We propose two generative models that can be easily tuned to reproduce the characteristics of real-world networks, not just qualitatively, but quantitatively. The characteristics we consider are the degree distributions and the metamorphosis coefficient. The metamorphosis coefficient, a bipartite analogue of the clustering coefficient, is the proportion of length-three paths that participate in length-four cycles. Having a high metamorphosis coefficient is a necessary condition for close-knit community structure. We define edge, node, and degreewise metamorphosis coefficients, enabling a more detailed understanding of the bipartite connectivity that is not explained by degree distribution alone. Our first model, bipartite Chung-Lu (CL), is able to reproduce real-world degree distributions, and our second model, bipartite block two-level Erdos-Renyi (BTER), reproduces both the degree distributions as well as the degreewise metamorphosis coefficients. We demonstrate the effectiveness of these models on several real-world data sets.
Understanding the network structure, and finding out the influential nodes is a challenging issue in the large networks. Identifying the most influential nodes in the network can be useful in many applications like immunization of nodes in case of epidemic spreading, during intentional attacks on complex networks. A lot of research is done to devise centrality measures which could efficiently identify the most influential nodes in the network. There are two major approaches to the problem: On one hand, deterministic strategies that exploit knowledge about the overall network topology in order to find the influential nodes, while on the other end, random strategies are completely agnostic about the network structure. Centrality measures that can deal with a limited knowledge of the network structure are required. Indeed, in practice, information about the global structure of the overall network is rarely available or hard to acquire. Even if available, the structure of the network might be too large that it is too much computationally expensive to calculate global centrality measures. To that end, a centrality measure is proposed that requires information only at the community level to identify the influential nodes in the network. Indeed, most of the real-world networks exhibit a community structure that can be exploited efficiently to discover the influential nodes. We performed a comparative evaluation of prominent global deterministic strategies together with stochastic strategies with an available and the proposed deterministic community-based strategy. Effectiveness of the proposed method is evaluated by performing experiments on synthetic and real-world networks with community structure in the case of immunization of nodes for epidemic control.
Understanding the epidemic dynamics, and finding out efficient techniques to control it, is a challenging issue. A lot of research has been done on targeted immunization strategies, exploiting various global network topological properties. However, in practice, information about the global structure of the contact network may not be available. Therefore, immunization strategies that can deal with a limited knowledge of the network structure are required. In this paper, we propose targeted immunization strategies that require information only at the community level. Results of our investigations on the SIR epidemiological model, using a realistic synthetic benchmark with controlled community structure, show that the community structure plays an important role in the epidemic dynamics. An extensive comparative evaluation demonstrates that the proposed strategies are as efficient as the most influential global centrality based immunization strategies, despite the fact that they use a limited amount of information. Furthermore, they outperform alternative local strategies, which are agnostic about the network structure, and make decisions based on random walks.
The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications.
Neural node embeddings have recently emerged as a powerful representation for supervised learning tasks involving graph-structured data. We leverage this recent advance to develop a novel algorithm for unsupervised community discovery in graphs. Through extensive experimental studies on simulated and real-world data, we demonstrate that the proposed approach consistently improves over the current state-of-the-art. Specifically, our approach empirically attains the information-theoretic limits for community recovery under the benchmark Stochastic Block Models for graph generation and exhibits better stability and accuracy over both Spectral Clustering and Acyclic Belief Propagation in the community recovery limits.