No Arabic abstract
This paper aims at comparing two coupling approaches as basic layers for building clustering criteria, suited for modularizing and clustering very large networks. We briefly use optimal transport theory as a starting point, and a way as well, to derive two canonical couplings: statistical independence and logical indetermination. A symmetric list of properties is provided and notably the so called Monges properties, applied to contingency matrices, and justifying the $otimes$ versus $oplus$ notation. A study is proposed, highlighting logical indetermination, because it is, by far, lesser known. Eventually we estimate the average difference between both couplings as the key explanation of their usually close results in network clustering.
In this note, we prove a tight lower bound on the joint entropy of $n$ unbiased Bernoulli random variables which are $n/2$-wise independent. For general $k$-wise independence, we give new lower bounds by adapting Navon and Samorodnitskys Fourier proof of the `LP bound on error correcting codes. This counts as partial progress on a problem asked by Gavinsky and Pudlak.
With a view on graph clustering, we present a definition of vertex-to-vertex distance which is based on shared connectivity. We argue that vertices sharing more connections are closer to each other than vertices sharing fewer connections. Our thesis is centered on the widely accepted notion that strong clusters are formed by high levels of induced subgraph density, where subgraphs represent clusters. We argue these clusters are formed by grouping vertices deemed to be similar in their connectivity. At the cluster level (induced subgraph level), our thesis translates into low mean intra-cluster distances. Our definition differs from the usual shortest-path geodesic distance. In this article, we compare three distance measures from the literature. Our benchmark is the accuracy of each measures reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs generated using the planted partition model, where clusters and intra-cluster density are known in advance. We examine correlations between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.
We consider the classic Moran process modeling the spread of genetic mutations, as extended to structured populations by Lieberman et al. (Nature, 2005). In this process, individuals are the vertices of a connected graph $G$. Initially, there is a single mutant vertex, chosen uniformly at random. In each step, a random vertex is selected for reproduction with a probability proportional to its fitness: mutants have fitness $r>1$, while non-mutants have fitness 1. The vertex chosen to reproduce places a copy of itself to a uniformly random neighbor in $G$, replacing the individual that was there. The process ends when the mutation either reaches fixation (i.e., all vertices are mutants), or gets extinct. The principal quantity of interest is the probability with which each of the two outcomes occurs. A problem that has received significant attention recently concerns the existence of families of graphs, called strong amplifiers of selection, for which the fixation probability tends to 1 as the order $n$ of the graph increases, and the existence of strong suppressors of selection, for which this probability tends to 0. For the case of directed graphs, it is known that both strong amplifiers and suppressors exist. For the case of undirected graphs, however, the problem has remained open, and the general belief has been that neither strong amplifiers nor suppressors exist. In this paper we disprove this belief, by providing the first examples of such graphs. The strong amplifier we present has fixation probability $1-tilde O(n^{-1/3})$, and the strong suppressor has fixation probability $tilde O(n^{-1/4})$. Both graph constructions are surprisingly simple. We also prove a general upper bound of $1-tilde Omega(n^{-1/3})$ on the fixation probability of any undirected graph. Hence, our strong amplifier is existentially optimal.
We apply the power-of-two-choices paradigm to a random walk on a graph: rather than moving to a uniform random neighbour at each step, a controller is allowed to choose from two independent uniform random neighbours. We prove that this allows the controller to significantly accelerate the hitting and cover times in several natural graph classes. In particular, we show that the cover time becomes linear in the number $n$ of vertices on discrete tori and bounded degree trees, of order $mathcal{O}(n log log n)$ on bounded degree expanders, and of order $mathcal{O}(n (log log n)^2)$ on the ErdH{o}s-R{e}nyi random graph in a certain sparsely connected regime. We also consider the algorithmic question of computing an optimal strategy, and prove a dichotomy in efficiency between computing strategies for hitting and cover times.
Network models with latent geometry have been used successfully in many applications in network science and other disciplines, yet it is usually impossible to tell if a given real network is geometric, meaning if it is a typical element in an ensemble of random geometric graphs. Here we identify structural properties of networks that guarantee that random graphs having these properties are geometric. Specifically we show that random graphs in which expected degree and clustering of every node are fixed to some constants are equivalent to random geometric graphs on the real line, if clustering is sufficiently strong. Large numbers of triangles, homogeneously distributed across all nodes as in real networks, are thus a consequence of network geometricity. The methods we use to prove this are quite general and applicable to other network ensembles, geometric or not, and to certain problems in quantum gravity.