No Arabic abstract
We consider the problem of testing graph cluster structure: given access to a graph $G=(V, E)$, can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This is a generalization of the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i.e. being a good single cluster) and the graph having a sparse cut (i.e. being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC15) gave an ingenious sublinear time algorithm for testing $k$-clusterability in time $tilde{O}(n^{1/2} text{poly}(k))$: their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of Euclidean distances between the points. This yields a very efficient testing algorithm, but only works if the cluster structure is very strong: it is necessary to assume that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph $G$. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also achieve new lower bounds for the problem of approximating MAX-CUT value in sublinear time.
We present new lower bounds that show that a polynomial number of passes are necessary for solving some fundamental graph problems in the streaming model of computation. For instance, we show that any streaming algorithm that finds a weighted minimum $s$-$t$ cut in an $n$-vertex undirected graph requires $n^{2-o(1)}$ space unless it makes $n^{Omega(1)}$ passes over the stream. To prove our lower bounds, we introduce and analyze a new four-player communication problem that we refer to as the hidden-pointer chasing problem. This is a problem in spirit of the standard pointer chasing problem with the key difference that the pointers in this problem are hidden to players and finding each one of them requires solving another communication problem, namely the set intersection problem. Our lower bounds for graph problems are then obtained by reductions from the hidden-pointer chasing problem. Our hidden-pointer chasing problem appears flexible enough to find other applications and is therefore interesting in its own right. To showcase this, we further present an interesting application of this problem beyond streaming algorithms. Using a reduction from hidden-pointer chasing, we prove that any algorithm for submodular function minimization needs to make $n^{2-o(1)}$ value queries to the function unless it has a polynomial degree of adaptivity.
We prove that any two-pass graph streaming algorithm for the $s$-$t$ reachability problem in $n$-vertex directed graphs requires near-quadratic space of $n^{2-o(1)}$ bits. As a corollary, we also obtain near-quadratic space lower bounds for several other fundamental problems including maximum bipartite matching and (approximate) shortest path in undirected graphs. Our results collectively imply that a wide range of graph problems admit essentially no non-trivial streaming algorithm even when two passes over the input is allowed. Prior to our work, such impossibility results were only known for single-pass streaming algorithms, and the best two-pass lower bounds only ruled out $o(n^{7/6})$ space algorithms, leaving open a large gap between (trivial) upper bounds and lower bounds.
We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems such as estimating the size of maximum matchings and maximum cuts, weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all the above, obtaining a $(1+epsilon)$-approximation requires either $n^{Omega(1)}$ space or $Omega(1/epsilon)$ passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For multiple of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal. Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bounds proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020] rules out sublinear-space algorithms with exponentially smaller $o(log{(1/epsilon)})$ passes for these problems. One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a $p$-pass $s$-space streaming algorithm can only solve a decision problem with advantage $delta > 0$ over random guessing, then it cannot solve XOR of $ell$ independent copies of the problem with advantage much better than $delta^{ell}$. This result can be of independent interest and useful for other streaming lower bounds as well.
We study the space complexity of sketching cuts and Laplacian quadratic forms of graphs. We show that any data structure which approximately stores the sizes of all cuts in an undirected graph on $n$ vertices up to a $1+epsilon$ error must use $Omega(nlog n/epsilon^2)$ bits of space in the worst case, improving the $Omega(n/epsilon^2)$ bound of Andoni et al. and matching the best known upper bound achieved by spectral sparsifiers. Our proof is based on a rigidity phenomenon for cut (and spectral) approximation which may be of independent interest: any two $d-$regular graphs which approximate each others cuts significantly better than a random graph approximates the complete graph must overlap in a constant fraction of their edges.
Classic dynamic data structure problems maintain a data structure subject to a sequence S of updates and they answer queries using the latest version of the data structure, i.e., the data structure after processing the whole sequence. To handle operations that change the sequence S of updates, Demaine et al. (TALG 2007) introduced retroactive data structures. A retroactive operation modifies the update sequence S in a given position t, called time, and either creates or cancels an update in S at time t. A partially retroactive data structure restricts queries to be executed exclusively in the latest version of the data structure. A fully retroactive data structure supports queries at any time t: a query at time t is answered using only the updates of S up to time t. If the sequence S only consists of insertions, the resulting data structure is an incremental retroactive data structure. While efficient retroactive data structures have been proposed for classic data structures, e.g., stack, priority queue and binary search tree, the retroactive version of graph problems are rarely studied. In this paper we study retroactive graph problems including connectivity, minimum spanning forest (MSF), maximum degree, etc. We provide fully retroactive data structures for maintaining the maximum degree, connectivity and MSF in $tilde{O}(n)$ time per operation. We also give an algorithm for the incremental fully retroactive connectivity with $tilde{O}(1)$ time per operation. We compliment our algorithms with almost tight hardness results. We show that under the OMv conjecture (proposed by Henzinger et al. (STOC 2015)), there does not exist fully retroactive data structures maintaining connectivity or MSF, or incremental fully retroactive data structure maintaining the maximum degree with $O(n^{1-epsilon})$ time per operation, for any constant $epsilon > 0$.