Sublinear Estimation of Weighted Matchings in Dynamic Data Streams

332 0 0.0 ( 0 )

Download Cite

Added by Marc Bury

Publication date 2015

fields Informatics Engineering

and research's language is English

Authors Marc Bury - Chris Schwiegelshohn

Data Structures and Algorithms

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper presents an algorithm for estimating the weight of a maximum weighted matching by augmenting any estimation routine for the size of an unweighted matching. The algorithm is implementable in any streaming model including dynamic graph streams. We also give the first constant estimation for the maximum matching size in a dynamic graph stream for planar graphs (or any graph with bounded arboricity) using $tilde{O}(n^{4/5})$ space which also extends to weighted matching. Using previous results by Kapralov, Khanna, and Sudan (2014) we obtain a $mathrm{polylog}(n)$ approximation for general graphs using $mathrm{polylog}(n)$ space in random order streams, respectively. In addition, we give a space lower bound of $Omega(n^{1-varepsilon})$ for any randomized algorithm estimating the size of a maximum matching up to a $1+O(varepsilon)$ factor for adversarial streams.

rate research

109 - Marc Bury , Chris Schwiegelshohn , Mara Sorella 2016

The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster approximate methods. The algorithm of choice used to quickly compute the Jaccard index $frac{vert A cap B vert}{vert Acup Bvert}$ of two item sets $A$ and $B$ is usually a form of min-hashing. Most min-hashing schemes are maintainable in data streams processing only additions, but none are known to work when facing item-wise deletions. In this paper, we investigate scalable approximation algorithms for rational set similarities, a broad class of similarity measures including Jaccard. Motivated by a result of Chierichetti and Kumar [J. ACM 2015] who showed any rational set similarity $S$ admits a locality sensitive hashing (LSH) scheme if and only if the corresponding distance $1-S$ is a metric, we can show that there exists a space efficient summary maintaining a $(1pm varepsilon)$ multiplicative approximation to $1-S$ in dynamic data streams. This in turn also yields a $varepsilon$ additive approximation of the similarity. The existence of these approximations hints at, but does not directly imply a LSH scheme in dynamic data streams. Our second and main contribution now lies in the design of such a LSH scheme maintainable in dynamic data streams. The scheme is space efficient, easy to implement and to the best of our knowledge the first of its kind able to process deletions.

Data Structures and Algorithms

Clustering High Dimensional Dynamic Data Streams

138 - Vladimir Braverman , Gereon Frahling , Harry Lang 2017

We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space ${1, 2, ldots Delta}^d$. Our algorithms use $k epsilon^{-2} poly(d log Delta)$ space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of $k$ centers the cost of the coreset $(1+epsilon)$-approximates the cost of the streamed point set. We also provide algorithms that guarantee only positive weights in the coreset with additional logarithmic factors in the space and time complexities. We can use this positively-weighted coreset to compute a $(1+epsilon)$-approximation for the $k$-median problem by any efficient offline $k$-median algorithm. All previous algorithms for computing a $(1+epsilon)$-approximation for the $k$-median problem over dynamic data streams required space and time exponential in $d$. Our algorithms can be generalized to metric spaces of bounded doubling dimension.

Data Structures and Algorithms

Deterministic Rounding of Dynamic Fractional Matchings

55 - Sayan Bhattacharya , Peter Kiss 2021

We present a framework for deterministically rounding a dynamic fractional matching. Applying our framework in a black-box manner on top of existing fractional matching algorithms, we derive the following new results: (1) The first deterministic algorithm for maintaining a $(2-delta)$-approximate maximum matching in a fully dynamic bipartite graph, in arbitrarily small polynomial update time. (2) The first deterministic algorithm for maintaining a $(1+delta)$-approximate maximum matching in a decremental bipartite graph, in polylogarithmic update time. (3) The first deterministic algorithm for maintaining a $(2+delta)$-approximate maximum matching in a fully dynamic general graph, in small polylogarithmic (specifically, $O(log^4 n)$) update time. These results are respectively obtained by applying our framework on top of the fractional matching algorithms of Bhattacharya et al. [STOC16], Bernstein et al. [FOCS20], and Bhattacharya and Kulkarni [SODA19]. Prior to our work, there were two known general-purpose rounding schemes for dynamic fractional matchings. Both these schemes, by Arar et al. [ICALP18] and Wajc [STOC20], were randomized. Our rounding scheme works by maintaining a good {em matching-sparsifier} with bounded arboricity, and then applying the algorithm of Peleg and Solomon [SODA16] to maintain a near-optimal matching in this low arboricity graph. To the best of our knowledge, this is the first dynamic matching algorithm that works on general graphs by using an algorithm for low-arboricity graphs as a black-box subroutine. This feature of our rounding scheme might be of independent interest.

Data Structures and Algorithms

Single Pass Spectral Sparsification in Dynamic Streams

436 - Michael Kapralov , Yin Tat Lee , Cameron Musco 2014

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph G, our algorithm maintains a randomized linear sketch of the incidence matrix of G into dimension O((1/epsilon^2) n polylog(n)). Using this sketch, at any point, the algorithm can output a (1 +/- epsilon) spectral sparsifier for G with high probability. While O((1/epsilon^2) n polylog(n)) space algorithms are known for computing cut sparsifiers in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in insertion-only streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension Omega((1/epsilon^2) n^(5/3)) [AGM14]. To achieve our result, we show that, using a coarse sparsifier of G and a linear sketch of Gs incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of ell_2/ell_2 sparse recovery, a natural extension of the ell_0 methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix A^T A given a stream of updates to rows in A.

Data Structures and Algorithms

Correlation Clustering in Data Streams

128 - Kook Jin Ahn , Graham Cormode , Sudipto Guha 2018

Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms need to be both time and and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on $n$ nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, $O(ncdot mbox{polylog}~n)$-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the quality of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in $O(ncdot mbox{polylog}~n)$-space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.

Data Structures and Algorithms