No Arabic abstract
We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph G, our algorithm maintains a randomized linear sketch of the incidence matrix of G into dimension O((1/epsilon^2) n polylog(n)). Using this sketch, at any point, the algorithm can output a (1 +/- epsilon) spectral sparsifier for G with high probability. While O((1/epsilon^2) n polylog(n)) space algorithms are known for computing cut sparsifiers in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in insertion-only streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension Omega((1/epsilon^2) n^(5/3)) [AGM14]. To achieve our result, we show that, using a coarse sparsifier of G and a linear sketch of Gs incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of ell_2/ell_2 sparse recovery, a natural extension of the ell_0 methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix A^T A given a stream of updates to rows in A.
We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph $G$ on $n$ vertices described by a binary string of length $N$, an integer $kleq log n$, and an error parameter $epsilon > 0$, our algorithm runs in space $tilde{O}(klog (Ncdot w_{mathrm{max}}/w_{mathrm{min}}))$ where $w_{mathrm{max}}$ and $w_{mathrm{min}}$ are the maximum and minimum edge weights in $G$, and produces a weighted graph $H$ with $tilde{O}(n^{1+2/k}/epsilon^2)$ edges that spectrally approximates $G$, in the sense of Spielmen and Teng [ST04], up to an error of $epsilon$. Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastavas effective resistance based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by $k$ above, and the resulting sparsity that can be achieved.
We consider a fundamental algorithmic question in spectral graph theory: Compute a spectral sparsifier of random-walk matrix-polynomial $$L_alpha(G)=D-sum_{r=1}^dalpha_rD(D^{-1}A)^r$$ where $A$ is the adjacency matrix of a weighted, undirected graph, $D$ is the diagonal matrix of weighted degrees, and $alpha=(alpha_1...alpha_d)$ are nonnegative coefficients with $sum_{r=1}^dalpha_r=1$. Recall that $D^{-1}A$ is the transition matrix of random walks on the graph. The sparsification of $L_alpha(G)$ appears to be algorithmically challenging as the matrix power $(D^{-1}A)^r$ is defined by all paths of length $r$, whose precise calculation would be prohibitively expensive. In this paper, we develop the first nearly linear time algorithm for this sparsification problem: For any $G$ with $n$ vertices and $m$ edges, $d$ coefficients $alpha$, and $epsilon > 0$, our algorithm runs in time $O(d^2mlog^2n/epsilon^{2})$ to construct a Laplacian matrix $tilde{L}=D-tilde{A}$ with $O(nlog n/epsilon^{2})$ non-zeros such that $tilde{L}approx_{epsilon}L_alpha(G)$. Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical solutions of matrix equations. Our work is particularly motivated by the algorithmic problems for speeding up the classic Newtons method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $qgeq1$) in a time-reversible Markov model. The key algorithmic step for both applications is the construction of a spectral sparsifier of a constant degree random-walk matrix-polynomials introduced by Newtons method. Our algorithm can also be used to build efficient data structures for effective resistances for multi-step time-reversible Markov models, and we anticipate that it could be useful for other tasks in network analysis.
The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster approximate methods. The algorithm of choice used to quickly compute the Jaccard index $frac{vert A cap B vert}{vert Acup Bvert}$ of two item sets $A$ and $B$ is usually a form of min-hashing. Most min-hashing schemes are maintainable in data streams processing only additions, but none are known to work when facing item-wise deletions. In this paper, we investigate scalable approximation algorithms for rational set similarities, a broad class of similarity measures including Jaccard. Motivated by a result of Chierichetti and Kumar [J. ACM 2015] who showed any rational set similarity $S$ admits a locality sensitive hashing (LSH) scheme if and only if the corresponding distance $1-S$ is a metric, we can show that there exists a space efficient summary maintaining a $(1pm varepsilon)$ multiplicative approximation to $1-S$ in dynamic data streams. This in turn also yields a $varepsilon$ additive approximation of the similarity. The existence of these approximations hints at, but does not directly imply a LSH scheme in dynamic data streams. Our second and main contribution now lies in the design of such a LSH scheme maintainable in dynamic data streams. The scheme is space efficient, easy to implement and to the best of our knowledge the first of its kind able to process deletions.
Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first result is a polynomial-time algorithm that, given a hypergraph on $n$ vertices with maximum hyperedge size $r$, outputs an $epsilon$-spectral sparsifier with $O^*(nr)$ hyperedges, where $O^*$ suppresses $(epsilon^{-1} log n)^{O(1)}$ factors. This size bound improves the two previous bounds: $O^*(n^3)$ [Soma and Yoshida, SODA19] and $O^*(nr^3)$ [Bansal, Svensson and Trevisan, FOCS19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders. We complement this with lower bounds on the bit complexity of any compression scheme that $(1+epsilon)$-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every $epsilon$-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemeredi graphs, and a particular instantiation yields an $Omega(nr)$ lower bound on the bit complexity even for fixed constant $epsilon$. This is tight up to polylogarithmic factors in $n$, due to recent hypergraph cut sparsifiers of [Chen, Khanna and Nagda, FOCS20]. Finally, for directed hypergraphs, we present an algorithm that computes an $epsilon$-spectral sparsifier with $O^*(n^2r^3)$ hyperarcs, where $r$ is the maximum size of a hyperarc. For small $r$, this improves over $O^*(n^3)$ known from [Soma and Yoshida, SODA19], and is getting close to the trivial lower bound of $Omega(n^2)$ hyperarcs.
We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space ${1, 2, ldots Delta}^d$. Our algorithms use $k epsilon^{-2} poly(d log Delta)$ space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of $k$ centers the cost of the coreset $(1+epsilon)$-approximates the cost of the streamed point set. We also provide algorithms that guarantee only positive weights in the coreset with additional logarithmic factors in the space and time complexities. We can use this positively-weighted coreset to compute a $(1+epsilon)$-approximation for the $k$-median problem by any efficient offline $k$-median algorithm. All previous algorithms for computing a $(1+epsilon)$-approximation for the $k$-median problem over dynamic data streams required space and time exponential in $d$. Our algorithms can be generalized to metric spaces of bounded doubling dimension.