No Arabic abstract
Seeding then expanding is a commonly used scheme to discover overlapping communities in a network. Most seeding methods are either too complex to scale to large networks or too simple to select high-quality seeds, and the non-principled functions used by most expanding methods lead to poor performance when applied to diverse networks. This paper proposes a new method that transforms a network into a corpus where each edge is treated as a document, and all nodes of the network are treated as terms of the corpus. An effective seeding method is also proposed that selects seeds as a training set, then a principled expanding method based on semi-supervised learning is applied to classify edges. We compare our new algorithm with four other community detection algorithms on a wide range of synthetic and empirical networks. Experimental results show that the new algorithm can significantly improve clustering performance in most cases. Furthermore, the time complexity of the new algorithm is linear to the number of edges, and this low complexity makes the new algorithm scalable to large networks.
Community structure is a typical property of many real-world networks, and has become a key to understand the dynamics of the networked systems. In these networks most nodes apparently lie in a community while there often exists a few nodes straddling several communities. An ideal algorithm for community detection is preferable which can identify the overlapping communities in such networks. To represent an overlapping division we develop a encoding schema composed of two segments, the first one represents a disjoint partition and the second one represents a extension of the partition that allows of multiple memberships. We give a measure for the informativeness of a node, and present an evolutionary method for detecting the overlapping communities in a network.
In network science, assortativity refers to the tendency of links to exist between nodes with similar attributes. In social networks, for example, links tend to exist between individuals of similar age, nationality, location, race, income, educational level, religious belief, and language. Thus, various attributes jointly affect the network topology. An interesting problem is to detect community structure beyond some specific assortativity-related attributes $rho$, i.e., to take out the effect of $rho$ on network topology and reveal the hidden community structure which are due to other attributes. An approach to this problem is to redefine the null model of the modularity measure, so as to simulate the effect of $rho$ on network topology. However, a challenge is that we do not know to what extent the network topology is affected by $rho$ and by other attributes. In this paper, we propose Dist-Modularity which allows us to freely choose any suitable function to simulate the effect of $rho$. Such freedom can help us probe the effect of $rho$ and detect the hidden communities which are due to other attributes. We test the effectiveness of Dist-Modularity on synthetic benchmarks and two real-world networks.
Degree distribution of nodes, especially a power law degree distribution, has been regarded as one of the most significant structural characteristics of social and information networks. Node degree, however, only discloses the first-order structure of a network. Higher-order structures such as the edge embeddedness and the size of communities may play more important roles in many online social networks. In this paper, we provide empirical evidence on the existence of rich higherorder structural characteristics in online social networks, develop mathematical models to interpret and model these characteristics, and discuss their various applications in practice. In particular, 1) We show that the embeddedness distribution of social links in many social networks has interesting and rich behavior that cannot be captured by well-known network models. We also provide empirical results showing a clear correlation between the embeddedness distribution and the average number of messages communicated between pairs of social network nodes. 2) We formally prove that random k-tree, a recent model for complex networks, has a power law embeddedness distribution, and show empirically that the random k-tree model can be used to capture the rich behavior of higherorder structures we observed in real-world social networks. 3) Going beyond the embeddedness, we show that a variant of the random k-tree model can be used to capture the power law distribution of the size of communities of overlapping cliques discovered recently.
Hypergraph-based machine learning methods are now widely recognized as important for modeling and using higher-order and multiway relationships between data objects. Local hypergraph clustering and semi-supervised learning specifically involve finding a well-connected set of nodes near a given set of labeled vertices. Although many methods for local clustering exist for graphs, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. To tackle these issues, this paper proposes a new diffusion-based hypergraph clustering algorithm that solves a quadratic hypergraph cut based objective akin to a hypergraph analog of Andersen-Chung-Lang personalized PageRank clustering for graphs. We prove that, for graphs with fixed maximum hyperedge size, this method is strongly local, meaning that its runtime only depends on the size of the output instead of the size of the hypergraph and is highly scalable. Moreover, our method enables us to compute with a wide variety of cardinality-based hypergraph cut functions. We also prove that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee. We demonstrate that on large real-world hypergraphs our new method finds better clusters and runs much faster than existing approaches. Specifically, it runs in few seconds for hypergraphs with a few million hyperedges compared with minutes for flow-based technique. We furthermore show that our framework is general enough that can also be used to solve other p-norm based cut objectives on hypergraphs. Our code is available url{github.com/MengLiuPurdue/LHQD}.
The conventional notion of community that favors a high ratio of internal edges to outbound edges becomes invalid when each vertex participates in multiple communities. Such a behavior is commonplace in social networks. The significant overlaps among communities make most existing community detection algorithms ineffective. The lack of effective and efficient tools resulted in very few empirical studies on large-scale detection and analyses of overlapping community structure in real social networks. We developed recently a scalable and accurate method called the Partial Community Merger Algorithm (PCMA) with linear complexity and demonstrated its effectiveness by analyzing two online social networks, Sina Weibo and Friendster, with 79.4 and 65.6 million vertices, respectively. Here, we report in-depth analyses of the 2.9 million communities detected by PCMA to uncover their complex overlapping structure. Each community usually overlaps with a significant number of other communities and has far more outbound edges than internal edges. Yet, the communities remain well separated from each other. Most vertices in a community are multi-membership vertices, and they can be at the core or the peripheral. Almost half of the entire network can be accounted for by an extremely dense network of communities, with the communities being the vertices and the overlaps being the edges. The empirical findings ask for rethinking the notion of community, especially the boundary of a community. Realizing that it is how the edges are organized that matters, the f-core is suggested as a suitable concept for overlapping community in social networks. The results shed new light on the understanding of overlapping community.