No Arabic abstract
With a view on graph clustering, we present a definition of vertex-to-vertex distance which is based on shared connectivity. We argue that vertices sharing more connections are closer to each other than vertices sharing fewer connections. Our thesis is centered on the widely accepted notion that strong clusters are formed by high levels of induced subgraph density, where subgraphs represent clusters. We argue these clusters are formed by grouping vertices deemed to be similar in their connectivity. At the cluster level (induced subgraph level), our thesis translates into low mean intra-cluster distances. Our definition differs from the usual shortest-path geodesic distance. In this article, we compare three distance measures from the literature. Our benchmark is the accuracy of each measures reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs generated using the planted partition model, where clusters and intra-cluster density are known in advance. We examine correlations between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.
Exponential family Random Graph Models (ERGMs) can be viewed as expressing a probability distribution on graphs arising from the action of competing social forces that make ties more or less likely, depending on the state of the rest of the graph. Such forces often lead to a complex pattern of dependence among edges, with non-trivial large-scale structures emerging from relatively simple local mechanisms. While this provides a powerful tool for probing macro-micro connections, much remains to be understood about how local forces shape global outcomes. One simple question of this type is that of the conditions needed for social forces to stabilize a particular structure. We refer to this property as local stability and seek a general means of identifying the set of parameters under which a target graph is locally stable with respect to a set of alternatives. Here, we provide a complete characterization of the region of the parameter space inducing local stability, showing it to be the interior of a convex cone whose faces can be derived from the change-scores of the sufficient statistics vis-a-vis the alternative structures. As we show, local stability is a necessary but not sufficient condition for more general notions of stability, the latter of which can be explored more efficiently by using the ``stable cone within the parameter space as a starting point. In addition, we show how local stability can be used to determine whether a fitted model implies that an observed structure would be expected to arise primarily from the action of social forces, versus by merit of the model permitting a large number of high probability structures, of which the observed structure is one. We also use our approach to identify the dyads within a given structure that are the least stable, and hence predicted to have the highest probability of changing over time.
Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. In this paper, we study unsupervised training of GNN pooling in terms of their clustering capabilities. We start by drawing a connection between graph clustering and graph pooling: intuitively, a good graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show that this is not true for state-of-the-art pooling methods, such as MinCut pooling. To address these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles recovery of the challenging clustering structure of real-world graphs. In order to clarify the regimes where existing methods fail, we carefully design a set of experiments on synthetic data which show that DMoN is able to jointly leverage the signal from the graph structure and node attributes. Similarly, on real-world data, we show that DMoN produces high quality clusters which correlate strongly with ground truth labels, achieving state-of-the-art results.
This paper aims at comparing two coupling approaches as basic layers for building clustering criteria, suited for modularizing and clustering very large networks. We briefly use optimal transport theory as a starting point, and a way as well, to derive two canonical couplings: statistical independence and logical indetermination. A symmetric list of properties is provided and notably the so called Monges properties, applied to contingency matrices, and justifying the $otimes$ versus $oplus$ notation. A study is proposed, highlighting logical indetermination, because it is, by far, lesser known. Eventually we estimate the average difference between both couplings as the key explanation of their usually close results in network clustering.
Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, nodes ranking and feature inference. Prior work on local graph clustering mostly falls into two categories with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with p-norm network flow for $pin (1,infty)$. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $prightarrowinfty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $pge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate our approach compares favorably with existing methods.
A majority of real life networks are weighted and sparse. The present article aims at characterization of weighted networks based on sparsity, as a measure of inherent diversity, of different network parameters. It utilizes sparsity index defined on ordered degree sequence of simple networks and derives further properties of this index. The range of possible values of sparsity index of any connected network, with edge-count in specific intervals, is worked out analytically in terms of node-count; a pattern is uncovered in corresponding degree sequences to produce highest sparsities. Given the edge-weight frequency distribution of a network, we have formulated an expression of the sparsity index of edge-weights. Its properties are analyzed under different distributions of edge-weights. For example, the upper and lower bounds of sparsity index of edge-weights of a network, having all distinct edge-weights, is determined in terms of its node-count and edge density. The article highlights that this summary index with low computational cost, computed on different network parameters, is useful to reveal different structural and organizational aspects of networks for performing analysis. An application of this index has been demonstrated through overlapping community detection of networks. The results validated on artificial and real-world networks show its efficacy.