We study the problem of parameterized matching in a stream where we want to output matches between a pattern of length m and the last m symbols of the stream before the next symbol arrives. Parameterized matching is a natural generalisation of exact matching where an arbitrary one-to-one relabelling of pattern symbols is allowed. We show how this problem can be solved in constant time per arriving stream symbol and sublinear, near optimal space with high probability. Our results are surprising and important: it has been shown that almost no streaming pattern matching problems can be solved (not even randomised) in less than $\Theta(m)$ space, with exact matching as the only known problem to have a sublinear, near optimal space solution. Here we demonstrate that a similar sublinear, near optimal space solution is achievable for an even more challenging problem. The proof is considerably more complex than that for exact matching.
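To make the definition of parameterized matching concrete, the following is a minimal baseline sketch, not the sublinear-space algorithm the abstract describes. It keeps the last m symbols in memory and checks for a one-to-one relabelling directly, so it uses space linear in m and time linear in m per arriving symbol. The function names `parameterized_match` and `stream_matches` are hypothetical and chosen only for illustration.

```python
from collections import deque


def parameterized_match(pattern, window):
    """Return True if a one-to-one relabelling of pattern symbols yields window."""
    if len(pattern) != len(window):
        return False
    forward = {}   # pattern symbol -> window symbol
    backward = {}  # window symbol -> pattern symbol
    for p, w in zip(pattern, window):
        # setdefault records the first pairing and returns it on later visits,
        # so any conflicting pairing is detected in either direction.
        if forward.setdefault(p, w) != w:
            return False
        if backward.setdefault(w, p) != p:
            return False
    return True


def stream_matches(pattern, stream):
    """Yield each stream index at which the last m symbols match the pattern."""
    m = len(pattern)
    window = deque(maxlen=m)  # naive buffer: linear in m, unlike the paper's result
    for i, symbol in enumerate(stream):
        window.append(symbol)
        if len(window) == m and parameterized_match(pattern, window):
            yield i
```

For example, the pattern `aab` matches the windows `xxy` and `zzx` of the stream `xxyzzx`, but not `xyz`, because `a` would have to map to two different symbols.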
In this paper, we study linear programming based approaches to the maximum matching problem in the semi-streaming model. The semi-streaming model has gained attention as a model for processing massive graphs as the importance of such graphs has increased. In this model, edges are streamed in an adversarial order and we are allowed space proportional to the number of vertices in the graph. In recent years, there have been several new results in the semi-streaming model. However, broad techniques such as linear programming have not been adapted to this model. We present several techniques to adapt and optimize linear programming based approaches in the semi-streaming model with an application to the maximum matching problem. As a consequence, we improve (almost) all previous results on this problem, and also prove new results on interesting variants.
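As an illustration of the semi-streaming setting only, the sketch below shows the classic one-pass greedy maximal matching, which is a different and much simpler technique than the linear programming approaches the abstract refers to. It stores only the set of matched vertices and the chosen edges, so its memory is proportional to the number of vertices, and a maximal matching is known to contain at least half as many edges as a maximum matching. The name `greedy_matching` and the edge representation as pairs are assumptions made for this example.

```python
def greedy_matching(edge_stream):
    """One pass over the edges; keep an edge if both endpoints are still free."""
    matched = set()   # vertices already covered by the matching
    matching = []     # edges kept so far
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching
```

The result depends on arrival order: on the path 1-2-3-4, the order `(1, 2), (2, 3), (3, 4)` yields two edges, while the order `(2, 3), (1, 2), (3, 4)` yields only one.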
In this paper, we study the non-bipartite maximum matching problem in the semi-streaming model. The maximum matching problem in the semi-streaming model has received a significant amount of attention lately. While the problem has been somewhat well solved for bipartite graphs, the known algorithms for non-bipartite graphs use $2^{\frac{1}{\epsilon}}$ passes or $n^{\frac{1}{\epsilon}}$ time to compute a $(1-\epsilon)$ approximation. In this paper we provide the first FPTAS (polynomial in $n,\frac{1}{\epsilon}$) for the problem which is efficient in both the running time and the number of passes. We also show that we can estimate the size of the matching in $O(\frac{1}{\epsilon})$ passes using slightly superlinear space. To achieve both results, we use the structural properties of the matching polytope such as the laminarity of the tight sets and total dual integrality. The algorithms are iterative, and are based on the fractional packing and covering framework. However the formulations herein require exponentially many variables or constraints. We use laminarity, metric embeddings and graph sparsification to reduce the space required by the algorithms in between and across the iterations. This is the first use of these ideas in the semi-streaming model to solve a combinatorial optimization problem.
In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $?$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streaming model variant of the pattern matching with $d$ wildcards problem the text $T$ arrives one character at a time and the goal is to report, before the next character arrives, if the last $m$ characters match $P$ while using only $o(m)$ words of space. In this paper we introduce two new algorithms for the $d$ wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant $0\leq \delta \leq 1$. This algorithm uses $\tilde{O}(d^{1-\delta})$ amortized time per character and $\tilde{O}(d^{1+\delta})$ words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses $O(d+\log m)$ worst-case time per character and $O(d\log m)$ words of space.
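The problem statement above can be illustrated with a naive reference sketch. It is not either of the two algorithms the abstract introduces: it buffers the last $m$ characters, so it uses $\Theta(m)$ space rather than $o(m)$, and it spends $O(m)$ time per character. The names `matches_with_wildcards` and `stream_wildcard_matches` are hypothetical.

```python
from collections import deque

WILDCARD = "?"  # the special wildcard symbol from the problem statement


def matches_with_wildcards(pattern, window):
    """Return True if window equals pattern at every non-wildcard position."""
    return len(pattern) == len(window) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, window)
    )


def stream_wildcard_matches(pattern, text_stream):
    """Yield each index at which the last m characters of the stream match."""
    m = len(pattern)
    window = deque(maxlen=m)  # naive buffer of the last m characters
    for i, c in enumerate(text_stream):
        window.append(c)
        if len(window) == m and matches_with_wildcards(pattern, window):
            yield i
```

With the pattern `a?c` and the text `abcaxcac`, the windows `abc` and `axc` match, while `cac` does not because its first character differs from `a`.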
We study streaming submodular maximization subject to matching/$b$-matching constraints (MSM/MSbM), and present improved upper and lower bounds for these problems. On the upper bounds front, we give primal-dual algorithms achieving the following approximation ratios. $\bullet$ $3+2\sqrt{2}\approx 5.828$ for monotone MSM, improving the previous best ratio of $7.75$. $\bullet$ $4+2\sqrt{3}\approx 7.464$ for non-monotone MSM, improving the previous best ratio of $9.899$. $\bullet$ $3+\epsilon$ for maximum weight $b$-matching, improving the previous best ratio of $4+\epsilon$. On the lower bounds front, we improve on the previous best lower bound of $\frac{e}{e-1}\approx 1.582$ for MSM, and show ETH-based lower bounds of $\approx 1.914$ for polytime monotone MSM streaming algorithms. Our most substantial contributions are our algorithmic techniques. We show that the (randomized) primal-dual method, which originated in the study of maximum weight matching (MWM), is also useful in the context of MSM. To our knowledge, this is the first use of primal-dual based analysis for streaming submodular optimization. We also show how to reinterpret previous algorithms for MSM in our framework; hence, we hope our work is a step towards unifying old and new techniques for streaming submodular maximization, and that it paves the way for further new results.
We present a deterministic $(1+\varepsilon)$-approximate maximum matching algorithm in $\mathsf{poly}(1/\varepsilon)$ passes in the semi-streaming model, solving the long-standing open problem of breaking the exponential barrier in the dependence on $1/\varepsilon$. Our algorithm exponentially improves on the well-known randomized $(1/\varepsilon)^{O(1/\varepsilon)}$-pass algorithm from the seminal work by McGregor [APPROX05], the recent deterministic algorithm by Tirodkar with the same pass complexity [FSTTCS18], as well as the deterministic $\log n \cdot \mathsf{poly}(1/\varepsilon)$-pass algorithm by Ahn and Guha [ICALP11].