No Arabic abstract
Graph comparison is a fundamental operation in data mining and information retrieval. Due to the combinatorial nature of graphs, it is hard to balance the expressiveness of the similarity measure and its scalability. Spectral analysis provides quintessential tools for studying the multi-scale structure of graphs and is a well-suited foundation for reasoning about differences between graphs. However, computing full spectrum of large graphs is computationally prohibitive; thus, spectral graph comparison methods often rely on rough approximation techniques with weak error guarantees. In this work, we propose SLaQ, an efficient and effective approximation technique for computing spectral distances between graphs with billions of nodes and edges. We derive the corresponding error bounds and demonstrate that accurate computation is possible in time linear in the number of graph edges. In a thorough experimental evaluation, we show that SLaQ outperforms existing methods, oftentimes by several orders of magnitude in approximation accuracy, and maintains comparable performance, allowing to compare million-scale graphs in a matter of minutes on a single machine.
Distance oracles are data structures that provide fast (possibly approximate) answers to shortest-path and distance queries in graphs. The tradeoff between the space requirements and the query time of distance oracles is of particular interest and the main focus of this paper. In FOCS01, Thorup introduced approximate distance oracles for planar graphs. He proved that, for any eps>0 and for any planar graph on n nodes, there exists a (1+eps)-approximate distance oracle using space O(n eps^{-1} log n) such that approximate distance queries can be answered in time O(1/eps). Ten years later, we give the first improvements on the space-querytime tradeoff for planar graphs. * We give the first oracle having a space-time product with subquadratic dependency on 1/eps. For space ~O(n log n) we obtain query time ~O(1/eps) (assuming polynomial edge weights). The space shows a doubly logarithmic dependency on 1/eps only. We believe that the dependency on eps may be almost optimal. * For the case of moderate edge weights (average bounded by polylog(n), which appears to be the case for many real-world road networks), we hit a sweet spot, improving upon Thorups oracle both in terms of eps and n. Our oracle uses space ~O(n log log n) and it has query time ~O(log log log n + 1/eps). (Asymptotic notation in this abstract hides low-degree polynomials in log(1/eps) and log*(n).)
We introduce the emph{idemetric} property, which formalises the idea that most nodes in a graph have similar distances between them, and which turns out to be quite standard amongst small-world network models. Modulo reasonable sparsity assumptions, we are then able to show that a strong form of idemetricity is actually equivalent to a very weak expander condition (PUMP). This provides a direct way of providing short proofs that small-world network models such as the Watts-Strogatz model are strongly idemetric (for a wide range of parameters), and also provides further evidence that being idemetric is a common property. We then consider how satisfaction of the idemetric property is relevant to algorithm design. For idemetric graphs we observe, for example, that a single breadth-first search provides a solution to the all-pairs shortest paths problem, so long as one is prepared to accept paths which are of stretch close to 2 with high probability. Since we are able to show that Kleinbergs model is idemetric, these results contrast nicely with the well known negative results of Kleinberg concerning efficient decentralised algorithms for finding short paths: for precisely the same model as Kleinbergs negative results hold, we are able to show that very efficient (and decentralised) algorithms exist if one allows for reasonable preprocessing. For deterministic distributed routing algorithms we are also able to obtain results proving that less routing information is required for idemetric graphs than the worst case in order to achieve stretch less than 3 with high probability: while $Omega(n^2)$ routing information is required in the worst case for stretch strictly less than 3 on almost all pairs, for idemetric graphs the total routing information required is $O(nlog(n))$.
Graph partitioning problems emerge in a wide variety of complex systems, ranging from biology to finance, but can be rigorously analyzed and solved only for a few graph ensembles. Here, an ensemble of equitable graphs, i.e. random graphs with a block-regular structure, is studied, for which analytical results can be obtained. In particular, the spectral density of this ensemble is computed exactly for a modular and bipartite structure. Kesten-McKays law for random regular graphs is found analytically to apply also for modular and bipartite structures when blocks are homogeneous. Exact solution to graph partitioning for two equal-sized communities is proposed and verified numerically, and a conjecture on the absence of an efficient recovery detectability transition in equitable graphs is suggested. Final discussion summarizes results and outlines their relevance for the solution of graph partitioning problems in other graph ensembles, in particular for the study of detectability thresholds and resolution limits in stochastic block models.
Core-periphery structure is an emerging property of a wide range of complex systems and indicate the presence of group of actors in the system with an higher number of connections among them and a lower number of connections with a sparsely connected periphery. The dynamics of a complex system which is interacting on a given graph structure is strictly connected with the spectral properties of the graph itself, nevertheless it is generally extremely hard to obtain analytic results which will hold for arbitrary large systems. Recently a statistical ensemble of random graphs with a regular block structure, i.e. the ensemble of equitable graphs, has been introduced and analytic results have been derived in the computationally-hard context of graph partitioning and community detection. In this paper, we present a general analytic result for a ensemble of equitable core-periphery graphs, yielding a new explicit formula for the spectral density of networks with core-periphery structure.
This paper proposes a novel model for predicting subgraphs in dynamic graphs, an extension of traditional link prediction. This proposed end-to-end model learns a mapping from the subgraph structures in the current snapshot to the subgraph structures in the next snapshot directly, i.e., edge existence among multiple nodes in the subgraph. A new mechanism named cross-attention with a twin-tower module is designed to integrate node attribute information and topology information collaboratively for learning subgraph evolution. We compare our model with several state-of-the-art methods for subgraph prediction and subgraph pattern prediction in multiple real-world homogeneous and heterogeneous dynamic graphs, respectively. Experimental results demonstrate that our model outperforms other models in these two tasks, with a gain increase from 5.02% to 10.88%.