No Arabic abstract
We study streaming submodular maximization subject to matching/$b$-matching constraints (MSM/MSbM), and present improved upper and lower bounds for these problems. On the upper bounds front, we give primal-dual algorithms achieving the following approximation ratios. $bullet$ $3+2sqrt{2}approx 5.828$ for monotone MSM, improving the previous best ratio of $7.75$. $bullet$ $4+3sqrt{2}approx 7.464$ for non-monotone MSM, improving the previous best ratio of $9.899$. $bullet$ $3+epsilon$ for maximum weight b-matching, improving the previous best ratio of $4+epsilon$. On the lower bounds front, we improve on the previous best lower bound of $frac{e}{e-1}approx 1.582$ for MSM, and show ETH-based lower bounds of $approx 1.914$ for polytime monotone MSM streaming algorithms. Our most substantial contributions are our algorithmic techniques. We show that the (randomized) primal-dual method, which originated in the study of maximum weight matching (MWM), is also useful in the context of MSM. To our knowledge, this is the first use of primal-dual based analysis for streaming submodular optimization. We also show how to reinterpret previous algorithms for MSM in our framework; hence, we hope our work is a step towards unifying old and new techniques for streaming submodular maximization, and that it paves the way for further new results.
We study the problem of parameterized matching in a stream where we want to output matches between a pattern of length m and the last m symbols of the stream before the next symbol arrives. Parameterized matching is a natural generalisation of exact matching where an arbitrary one-to-one relabelling of pattern symbols is allowed. We show how this problem can be solved in constant time per arriving stream symbol and sublinear, near optimal space with high probability. Our results are surprising and important: it has been shown that almost no streaming pattern matching problems can be solved (not even randomised) in less than Theta(m) space, with exact matching as the only known problem to have a sublinear, near optimal space solution. Here we demonstrate that a similar sublinear, near optimal space solution is achievable for an even more challenging problem. The proof is considerably more complex than that for exact matching.
In this paper we consider graph algorithms in models of computation where the space usage (random accessible storage, in addition to the read only input) is sublinear in the number of edges $m$ and the access to input data is constrained. These questions arises in many natural settings, and in particular in the analysis of MapReduce or similar algorithms that model constrained parallelism with sublinear central processing. In SPAA 2011, Lattanzi etal. provided a $O(1)$ approximation of maximum matching using $O(p)$ rounds of iterative filtering via mapreduce and $O(n^{1+1/p})$ space of central processing for a graph with $n$ nodes and $m$ edges. We focus on weighted nonbipartite maximum matching in this paper. For any constant $p>1$, we provide an iterative sampling based algorithm for computing a $(1-epsilon)$-approximation of the weighted nonbipartite maximum matching that uses $O(p/epsilon)$ rounds of sampling, and $O(n^{1+1/p})$ space. The results extends to $b$-Matching with small changes. This paper combines adaptive sketching literature and fast primal-dual algorithms based on relaxed Dantzig-Wolfe decision procedures. Each round of sampling is implemented through linear sketches and executed in a single round of MapReduce. The paper also proves that nonstandard linear relaxations of a problem, in particular penalty based formulations, are helpful in mapreduce and similar settings in reducing the adaptive dependence of the iterations.
We study the problem of maximizing a non-monotone submodular function subject to a cardinality constraint in the streaming model. Our main contribution is a single-pass (semi-)streaming algorithm that uses roughly $O(k / varepsilon^2)$ memory, where $k$ is the size constraint. At the end of the stream, our algorithm post-processes its data structure using any offline algorithm for submodular maximization, and obtains a solution whose approximation guarantee is $frac{alpha}{1+alpha}-varepsilon$, where $alpha$ is the approximation of the offline algorithm. If we use an exact (exponential time) post-processing algorithm, this leads to $frac{1}{2}-varepsilon$ approximation (which is nearly optimal). If we post-process with the algorithm of Buchbinder and Feldman (Math of OR 2019), that achieves the state-of-the-art offline approximation guarantee of $alpha=0.385$, we obtain $0.2779$-approximation in polynomial time, improving over the previously best polynomial-time approximation of $0.1715$ due to Feldman et al. (NeurIPS 2018). It is also worth mentioning that our algorithm is combinatorial and deterministic, which is rare for an algorithm for non-monotone submodular maximization, and enjoys a fast update time of $O(frac{log k + log (1/alpha)}{varepsilon^2})$ per element.
The need for real time analysis of rapidly producing data streams (e.g., video and image streams) motivated the design of streaming algorithms that can efficiently extract and summarize useful information from massive data on the fly. Such problems can often be reduced to maximizing a submodular set function subject to various constraints. While efficient streaming methods have been recently developed for monotone submodular maximization, in a wide range of applications, such as video summarization, the underlying utility function is non-monotone, and there are often various constraints imposed on the optimization problem to consider privacy or personalization. We develop the first efficient single pass streaming algorithm, Streaming Local Search, that for any streaming monotone submodular maximization algorithm with approximation guarantee $alpha$ under a collection of independence systems ${cal I}$, provides a constant $1/big(1+2/sqrt{alpha}+1/alpha +2d(1+sqrt{alpha})big)$ approximation guarantee for maximizing a non-monotone submodular function under the intersection of ${cal I}$ and $d$ knapsack constraints. Our experiments show that for video summarization, our method runs more than 1700 times faster than previous work, while maintaining practically the same performance.
In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $?$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streaming model variant of the pattern matching with $d$ wildcards problem the text $T$ arrives one character at a time and the goal is to report, before the next character arrives, if the last $m$ characters match $P$ while using only $o(m)$ words of space. In this paper we introduce two new algorithms for the $d$ wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant $0leq delta leq 1$. This algorithm uses $tilde{O}(d^{1-delta})$ amortized time per character and $tilde{O}(d^{1+delta})$ words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses $O(d+log m)$ worst-case time per character and $O(dlog m)$ words of space.