No Arabic abstract
Community structure is one of the most important features of real networks and reveals the internal organization of the nodes. Many algorithms have been proposed but the crucial issue of testing, i.e. the question of how good an algorithm is, with respect to others, is still open. Standard tests include the analysis of simple artificial graphs with a built-in community structure, that the algorithm has to recover. However, the special graphs adopted in actual tests have a structure that does not reflect the real properties of nodes and communities found in real networks. Here we introduce a new class of benchmark graphs, that account for the heterogeneity in the distributions of node degrees and of community sizes. We use this new benchmark to test two popular methods of community detection, modularity optimization and Potts model clustering. The results show that the new benchmark poses a much more severe test to algorithms than standard benchmarks, revealing limits that may not be apparent at a first analysis.
Community detection is a fundamental and important problem in network science, as community structures often reveal both topological and functional relationships between different components of the complex system. In this paper, we first propose a gradient descent framework of modularity optimization called vector-label propagation algorithm (VLPA), where a node is associated with a vector of continuous community labels instead of one label. Retaining weak structural information in vector-label, VLPA outperforms some well-known community detection methods, and particularly improves the performance in networks with weak community structures. Further, we incorporate stochastic gradient strategies into VLPA to avoid stuck in the local optima, leading to the stochastic vector-label propagation algorithm (sVLPA). We show that sVLPA performs better than Louvain Method, a widely used community detection algorithm, on both artificial benchmarks and real-world networks. Our theoretical scheme based on vector-label propagation can be directly applied to high-dimensional networks where each node has multiple features, and can also be used for optimizing other partition measures such as modularity with resolution parameters.
Graph vertices are often organized into groups that seem to live fairly independently of the rest of the graph, with which they share but a few edges, whereas the relationships between group members are stronger, as shown by the large number of mutual connections. Such groups of vertices, or communities, can be considered as independent compartments of a graph. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. The task is very hard, though, both conceptually, due to the ambiguity in the definition of community and in the discrimination of different partitions and practically, because algorithms must find ``good partitions among an exponentially large number of them. Other complications are represented by the possible occurrence of hierarchies, i.e. communities which are nested inside larger communities, and by the existence of overlaps between communities, due to the presence of nodes belonging to more groups. All these aspects are dealt with in some detail and many methods are described, from traditional approaches used in computer science and sociology to recent techniques developed mostly within statistical physics.
Research into detection of dense communities has recently attracted increasing attention within network science, various metrics for detection of such communities have been proposed. The most popular metric -- Modularity -- is based on the so-called rule that the links within communities are denser than external links among communities, has become the default. However, this default metric suffers from ambiguity, and worse, all augmentations of modularity and based on a narrow intuition of what it means to form a community. We argue that in specific, but quite common systems, links within a community are not necessarily more common than links between communities. Instead we propose that the defining characteristic of a community is that links are more predictable within a community rather than between communities. In this paper, based on the effect of communities on link prediction, we propose a novel metric for the community detection based directly on this feature. We find that our metric is more robustness than traditional modularity. Consequently, we can achieve an evaluation of algorithm stability for the same detection algorithm in different networks. Our metric also can directly uncover the false community detection, and infer more statistical characteristics for detection algorithms.
Many systems exhibit complex temporal dynamics due to the presence of different processes taking place simultaneously. Temporal networks provide a framework to describe the time-resolve interactions between components of a system. An important task when investigating such systems is to extract a simplified view of the temporal network, which can be done via dynamic community detection or clustering. Several works have generalized existing community detection methods for static networks to temporal networks, but they usually rely on temporal aggregation over time windows, the assumption of an underlying stationary process, or sequences of different stationary epochs. Here, we derive a method based on a dynamical process evolving on the temporal network and restricted by its activation pattern that allows to consider the full temporal information of the system. Our method allows dynamics that do not necessarily reach a steady state, or follow a sequence of stationary states. Our framework encompasses several well-known heuristics as special cases. We show that our method provides a natural way to disentangle the different natural dynamical scales present in a system. We demonstrate our method abilities on synthetic and real-world examples.
Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the systems dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a networks nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the systems intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.