Given a graph with millions of nodes, what patterns exist in the distributions of node characteristics, and how can we detect them and separate anomalous nodes in a way similar to human vision? In this paper, we propose a vision-guided algorithm, EagleMine, to summarize micro-cluster patterns in two-dimensional histogram plots constructed from node features in a large graph. EagleMine uses a water-level tree to capture cluster structures at multiple resolutions, following vision-based intuition. EagleMine traverses the water-level tree from the root, applies statistical hypothesis tests to determine the optimal clusters to fit along the path, and summarizes each cluster with a truncated Gaussian distribution. Experiments on real data show that our method can find truncated and overlapping elliptical clusters, even where some baseline methods split one visual cluster into several pieces fitted with Gaussian spheres. To identify potentially anomalous micro-clusters, EagleMine also defines a score that measures the suspiciousness of outlier groups (i.e., node clusters) and outlier nodes, and it detects bots and anomalous users with high accuracy in real Microblog data.
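The water-level idea can be illustrated with a simplified sketch. Assuming the 2D histogram is given as a list of lists of cell counts and the water levels are chosen by hand (both assumptions; EagleMine's hypothesis tests and truncated Gaussian fitting are not reproduced), the code below raises the water level step by step, finds the 4-connected "islands" of cells whose count is at or above each level, and links every island to the island at the previous level that contains it. The helper names `islands` and `water_level_tree` are hypothetical.

```python
from collections import deque


def islands(hist, level):
    """Return the 4-connected components (as frozensets of (row, col))
    of histogram cells whose count is >= level."""
    rows, cols = len(hist), len(hist[0])
    seen = set()
    comps = []
    for r in range(rows):
        for c in range(cols):
            if hist[r][c] >= level and (r, c) not in seen:
                # Breadth-first flood fill from an unvisited "dry" cell.
                comp = set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and (ny, nx) not in seen
                                and hist[ny][nx] >= level):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                comps.append(frozenset(comp))
    return comps


def water_level_tree(hist, levels):
    """Hypothetical simplified water-level tree.

    Returns a list of (level, island, parent_index) tuples. Because islands
    at a higher level are subsets of islands at a lower level, each island
    has exactly one parent at the previous level (None at the lowest level).
    """
    tree = []
    prev = []  # indices into `tree` of the islands at the previous level
    for level in sorted(levels):
        current = []
        for isl in islands(hist, level):
            parent = next((i for i in prev if isl <= tree[i][1]), None)
            tree.append((level, isl, parent))
            current.append(len(tree) - 1)
        prev = current
    return tree
```

For example, a histogram with two dense peaks that merge only at a low count yields one root island at the low level with two child islands at the higher level; in this sketch, the branching points of the tree are where separate visual clusters appear.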
Temporal graphs are ubiquitous. Mining communities that burst within a period of time is essential for detecting emergency events in temporal graphs. Unfortunately, most previous studies of community mining in temporal networks ignore the bursting patterns of communities. In this paper, we are the first to study the problem of finding bursting communities in a temporal graph. We propose a novel model, called the (l, δ)-maximal dense core, to represent a bursting community in a temporal graph. Specifically, an (l, δ)-maximal dense core is a temporal subgraph in which each node has an average degree no less than δ over a time segment of length no less than l. To compute the (l, δ)-maximal dense core, we first develop a novel dynamic programming algorithm that calculates the segment density efficiently. We then propose an improved algorithm with several novel pruning techniques that further improve efficiency. In addition, we develop an efficient algorithm to enumerate all (l, δ)-maximal dense cores that are not dominated by others in terms of the parameters l and δ. Extensive experiments on 9 real-life datasets demonstrate the effectiveness, efficiency and scalability of our algorithms.
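As a minimal sketch of the segment-density test for a single node, assume its degree at each timestamp is given as a list (an assumption; this is neither the authors' dynamic programming algorithm nor the full core computation, where degrees are counted only within the candidate subgraph). A segment of length at least l has average degree at least δ exactly when the prefix sums of (degree - δ) do not decrease across it, so one pass that tracks the minimum admissible prefix sum suffices. The function name `has_dense_segment` is hypothetical.

```python
def has_dense_segment(degrees, l, delta):
    """Hypothetical check: return (start, end) such that degrees[start:end]
    has length >= l and average >= delta, or None if no such segment exists.

    With P[k] = sum(degrees[:k]) - k * delta, a segment [i, j) qualifies iff
    j - i >= l and P[j] - P[i] >= 0, so for each end j we only need the
    smallest prefix P[i] among start positions i <= j - l.
    """
    prefix = [0.0]
    for d in degrees:
        prefix.append(prefix[-1] + d - delta)
    best_i = None  # index of the minimum prefix among admissible starts
    for j in range(l, len(degrees) + 1):
        i = j - l  # start position that becomes admissible for this end j
        if best_i is None or prefix[i] < prefix[best_i]:
            best_i = i
        if prefix[j] - prefix[best_i] >= 0:
            return best_i, j
    return None
```

For instance, `has_dense_segment([0, 1, 4, 5, 3, 0, 0], l=3, delta=3)` returns `(1, 4)`, since the segment `[1, 4, 5]` has average 10/3. The check runs in time linear in the number of timestamps.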
We investigate social networks of characters found in cultural works such as novels and films. These character networks exhibit many of the properties of complex networks, such as skewed degree distributions and community structure, but may be of relatively small order with a high multiplicity of edges. Building on recent work of Beveridge, we consider graph extraction, visualization, and network statistics for three novels: Twilight by Stephenie Meyer, Stephen King's The Stand, and J.K. Rowling's Harry Potter and the Goblet of Fire. Together with 800 character networks from films found in the http://moviegalaxies.com/ database, we compare the data sets to simulations from various stochastic complex network models, including random graphs with given expected degrees (also known as the Chung-Lu model), the configuration model, and the preferential attachment model. Using machine learning techniques based on motif (or small subgraph) counts, we determine that the Chung-Lu model best fits character networks, and we conjecture why this may be the case.
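To make the Chung-Lu model concrete, here is a sketch of its simple-graph variant: given target expected degrees w, each pair {i, j} is joined independently with probability min(1, w_i w_j / Σw). Self-loops are omitted (a simplification of the model), and the triangle counter stands in for the richer motif counts mentioned above; the function names are hypothetical, and this is not the authors' pipeline.

```python
import random


def chung_lu(weights, seed=None):
    """Sample a simple Chung-Lu graph (no self-loops) on nodes 0..n-1.

    Each pair i < j becomes an edge independently with probability
    min(1, w_i * w_j / sum(w)), so node i's expected degree is roughly w_i.
    """
    rng = random.Random(seed)
    total = sum(weights)
    n = len(weights)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.add((i, j))
    return edges


def triangle_count(n, edges):
    """Count triangles, the simplest non-trivial motif."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[i] & adj[j]) for i, j in edges) // 3
```

Comparing such motif counts between an observed character network and graphs sampled with weights equal to its degree sequence is one simple way to see how closely the model reproduces local structure.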
Graph partitioning problems emerge in a wide variety of complex systems, ranging from biology to finance, but can be rigorously analyzed and solved only for a few graph ensembles. Here we study an ensemble of equitable graphs, i.e., random graphs with a block-regular structure, for which analytical results can be obtained. In particular, the spectral density of this ensemble is computed exactly for modular and bipartite structures. The Kesten-McKay law for random regular graphs is shown analytically to hold also for modular and bipartite structures when the blocks are homogeneous. An exact solution to graph partitioning for two equal-sized communities is proposed and verified numerically, and a conjecture is put forward on the absence of an efficient-recovery detectability transition in equitable graphs. A final discussion summarizes the results and outlines their relevance for solving graph partitioning problems in other graph ensembles, in particular for the study of detectability thresholds and resolution limits in stochastic block models.
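For reference, the standard Kesten-McKay density for the eigenvalues of a large random d-regular graph is ρ(x) = d·sqrt(4(d-1) - x²) / (2π(d² - x²)) on |x| < 2·sqrt(d-1). The sketch below evaluates it and checks numerically that it integrates to 1. It covers only the single-block case (d ≥ 3 is assumed so the midpoint rule converges quickly), not the block-structured generalization derived in the paper; the function names are hypothetical.

```python
import math


def kesten_mckay_density(x, d):
    """Kesten-McKay spectral density of a large random d-regular graph.

    Supported on |x| < 2*sqrt(d - 1) and zero outside.
    """
    edge = 2.0 * math.sqrt(d - 1)
    if abs(x) >= edge:
        return 0.0
    return d * math.sqrt(4.0 * (d - 1) - x * x) / (2.0 * math.pi * (d * d - x * x))


def total_mass(d, steps=100_000):
    """Midpoint-rule integral of the density over its support (should be ~1)."""
    edge = 2.0 * math.sqrt(d - 1)
    h = 2.0 * edge / steps
    return h * sum(kesten_mckay_density(-edge + (k + 0.5) * h, d)
                   for k in range(steps))
```

The normalization can also be checked by hand: integrating sqrt(R² - x²)/(d² - x²) over [-R, R] with R² = 4(d-1) gives π(1 - (d-2)/d), and multiplying by d/(2π) yields exactly 1.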
Core-periphery structure is an emergent property of a wide range of complex systems and indicates the presence of a group of actors with many connections among themselves and fewer connections to a sparsely connected periphery. The dynamics of a complex system interacting on a given graph structure are closely tied to the spectral properties of the graph itself; nevertheless, it is generally extremely hard to obtain analytic results that hold for arbitrarily large systems. Recently, a statistical ensemble of random graphs with a regular block structure, the ensemble of equitable graphs, has been introduced, and analytic results have been derived in the computationally hard context of graph partitioning and community detection. In this paper, we present a general analytic result for an ensemble of equitable core-periphery graphs, yielding a new explicit formula for the spectral density of networks with core-periphery structure.
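One elementary fact about equitable graphs complements the spectral density. Suppose every core node has d_cc core neighbors and d_cp periphery neighbors, and every periphery node has d_pc core neighbors and d_pp periphery neighbors. Then the eigenvalues of the 2×2 quotient matrix [[d_cc, d_cp], [d_pc, d_pp]] are also eigenvalues of the full adjacency matrix. The sketch below computes them in closed form and checks the edge-counting constraint n_core·d_cp = n_periphery·d_pc. The parameter and function names are hypothetical, and the paper's density formula for the remaining eigenvalues is not reproduced.

```python
import math


def quotient_eigenvalues(d_cc, d_cp, d_pc, d_pp):
    """Eigenvalues (largest first) of the quotient matrix
    [[d_cc, d_cp], [d_pc, d_pp]] of an equitable core-periphery partition.

    The discriminant equals (d_cc - d_pp)**2 + 4 * d_cp * d_pc >= 0,
    so both eigenvalues are real.
    """
    trace = d_cc + d_pp
    det = d_cc * d_pp - d_cp * d_pc
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def is_consistent(n_core, n_periphery, d_cp, d_pc):
    """Core-periphery edges counted from either side must agree."""
    return n_core * d_cp == n_periphery * d_pc
```

With d_cc = 3, d_cp = 2, d_pc = 1 and d_pp = 0 (an empty periphery), the two eigenvalues are (3 ± sqrt(17))/2, and such a partition is realizable, for example, with twice as many periphery nodes as core nodes.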
In the era of big data, graph sampling is indispensable in many settings. Existing sampling methods are mostly designed for static graphs and aim to preserve basic structural properties of the original graph (such as the degree distribution and clustering coefficient) in the sample. We argue that no sampling method can produce a universal representative sample that preserves all the properties of the original graph; rather, sampling should be application-specific (for example, preserving hubs, which is needed for information diffusion). Here we consider community detection as the application scenario. We propose ComPAS, a novel sampling strategy that, unlike previous methods, is not only designed for streaming graphs (a more realistic representation of real-world scenarios) but also preserves the community structure of the original graph in the sample. Empirical results on both synthetic and diverse real-world graphs show that ComPAS best preserves the underlying community structure, with average performance reaching 73.2% of that of the most informed algorithm for static graphs.
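For contrast with ComPAS, the sketch below shows a generic streaming baseline, reservoir sampling (Algorithm R) over an edge stream. It keeps a uniform random sample of k edges in a single pass without knowing the stream length, but it makes no attempt to preserve community structure. It is not the ComPAS algorithm, whose details are not given here; the function name is hypothetical.

```python
import random


def reservoir_edge_sample(edge_stream, k, seed=None):
    """Uniform sample of k edges from a stream of unknown length (Algorithm R).

    A generic, community-agnostic baseline: every edge seen so far has the
    same probability k / t of being in the sample after t edges.
    """
    rng = random.Random(seed)
    sample = []
    for t, edge in enumerate(edge_stream):
        if t < k:
            sample.append(edge)  # fill the reservoir first
        else:
            j = rng.randint(0, t)  # uniform over 0..t, both ends inclusive
            if j < k:
                sample[j] = edge  # replace a random reservoir slot
    return sample
```

Because every edge is treated identically, intra-community and inter-community edges are kept at the same rate, which is exactly the behavior a community-preserving sampler has to improve on.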