No Arabic abstract
Community detection, aiming to group nodes based on their connections, plays an important role in network analysis, since communities, treated as meta-nodes, allow us to create a large-scale map of a network to simplify its analysis. However, for privacy reasons, we may want to prevent communities from being discovered in certain cases, leading to the topics on community deception. In this paper, we formalize this community detection attack problem in three scales, including global attack (macroscale), target community attack (mesoscale) and target node attack (microscale). We treat this as an optimization problem and further propose a novel Evolutionary Perturbation Attack (EPA) method, where we generate adversarial networks to realize the community detection attack. Numerical experiments validate that our EPA can successfully attack network community algorithms in all three scales, i.e., hide target nodes or communities and further disturb the community structure of the whole network by only changing a small fraction of links. By comparison, our EPA behaves better than a number of baseline attack methods on six synthetic networks and three real-world networks. More interestingly, although our EPA is based on the louvain algorithm, it is also effective on attacking other community detection algorithms, validating its good transferability.
Community detection plays an important role in social networks, since it can help to naturally divide the network into smaller parts so as to simplify network analysis. However, on the other hand, it arises the concern that individual information may be over-mined, and the concept community deception thus is proposed to protect individual privacy on social networks. Here, we introduce and formalize the problem of community detection attack and develop efficient strategies to attack community detection algorithms by rewiring a small number of connections, leading to individual privacy protection. In particular, we first give two heuristic attack strategies, i.e., Community Detection Attack (CDA) and Degree Based Attack (DBA), as baselines, utilizing the information of detected community structure and node degree, respectively. And then we propose a Genetic Algorithm (GA) based Q-Attack, where the modularity Q is used to design the fitness function. We launch community detection attack based on the above three strategies against three modularity based community detection algorithms on two social networks. By comparison, our Q-Attack method achieves much better attack effects than CDA and DBA, in terms of the larger reduction of both modularity Q and Normalized Mutual Information (NMI). Besides, we find that the adversarial networks obtained by Q-Attack on a specific community detection algorithm can be still effective on others, no matter whether they are modularity based or not, indicating its strong transferability.
We apply spectral clustering and multislice modularity optimization to a Los Angeles Police Department field interview card data set. To detect communities (i.e., cohesive groups of vertices), we use both geographic and social information about stops involving street gang members in the LAPD district of Hollenbeck. We then compare the algorithmically detected communities with known gang identifications and argue that discrepancies are due to sparsity of social connections in the data as well as complex underlying sociological factors that blur distinctions between communities.
Community structures are critical towards understanding not only the network topology but also how the network functions. However, how to evaluate the quality of detected community structures is still challenging and remains unsolved. The most widely used metric, normalized mutual information (NMI), was proved to have finite size effect, and its improved form relative normalized mutual information (rNMI) has reverse finite size effect. Corrected normalized mutual information (cNMI) was thus proposed and has neither finite size effect nor reverse finite size effect. However, in this paper we show that cNMI violates the so-called proportionality assumption. In addition, NMI-type metrics have the problem of ignoring importance of small communities. Finally, they cannot be used to evaluate a single community of interest. In this paper, we map the computed community labels to the ground-truth ones through integer linear programming, then use kappa index and F-score to evaluate the detected community structures. Experimental results demonstrate the advantages of our method.
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, they are extremely hard to be uncovered. We call the weak communities the hidden community structure. We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously. Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration or career position of the faculties or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. In the Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
Hidden community is a new graph-theoretical concept recently proposed [4], in which the authors also propose a meta-approach called HICODE (Hidden Community Detection) for detecting hidden communities. HICODE is demonstrated through experiments that it is able to uncover previously overshadowed weak layers and uncover both weak and strong layers at a higher accuracy. However, the authors provide no theoretical guarantee for the performance. In this work, we focus on the theoretical analysis of HICODE on synthetic two-layer networks, where layers are independent of each other and each layer is generated by stochastic block model. We bridge their gap through two-layer stochastic block model networks in the following aspects: 1) we show that partitions that locally optimize modularity correspond to grounded layers, indicating modularity-optimizing algorithms can detect strong layers; 2) we prove that when reducing found layers, HICODE increases absolute modularities of all unreduced layers, showing its layer reduction step makes weak layers more detectable. Our work builds a solid theoretical base for HICODE, demonstrating that it is promising in uncovering both weak and strong layers of communities in two-layer networks.