An ordering constraint satisfaction problem (OCSP) is given by a positive integer $k$ and a constraint predicate $\Pi$ mapping permutations on $\{1,\ldots,k\}$ to $\{0,1\}$. Given an instance of OCSP$(\Pi)$ on $n$ variables and $m$ constraints, the goal is to find an ordering of the $n$ variables that maximizes the number of satisfied constraints. Here a constraint specifies a sequence of $k$ distinct variables, and it is satisfied by an ordering on the $n$ variables if the ordering induced on the $k$ variables in the constraint satisfies $\Pi$. OCSPs capture natural problems including Maximum Acyclic Subgraph (MAS) and Betweenness. In this work we consider the task of approximating the maximum number of satisfiable constraints in the (single-pass) streaming setting, where an instance is presented as a stream of constraints. We show that for every $\Pi$, OCSP$(\Pi)$ is approximation-resistant to $o(n)$-space streaming algorithms. This space bound is tight up to polylogarithmic factors. In the case of MAS our result shows that for every $\epsilon>0$, MAS is not $(1/2+\epsilon)$-approximable in $o(n)$ space. The previous best inapproximability result only ruled out a $3/4$-approximation in $o(\sqrt{n})$ space. Our results build on recent works of Chou, Golovnev, Sudan, Velingker, and Velusamy, who show tight, linear-space inapproximability results for a broad class of (non-ordering) constraint satisfaction problems over arbitrary (finite) alphabets. We design a family of appropriate CSPs (one for every $q$) from any given OCSP, and apply their work to this family of CSPs. We show that the hard instances from this earlier work have a particular small-set expansion property. By exploiting this combinatorial property, in combination with the hardness results of the resulting families of CSPs, we give optimal inapproximability results for all OCSPs.
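The OCSP definition above can be made concrete with a small sketch. It is an illustration only, not the streaming setting the abstract studies: the function names, the 0-indexed rank encoding of the induced ordering, and the orientation conventions chosen for MAS and Betweenness are all assumptions made here.

```python
from itertools import permutations


def ocsp_value(ordering, constraints, predicate):
    """Count the constraints satisfied by `ordering` (variables listed first to last)."""
    position = {var: i for i, var in enumerate(ordering)}
    satisfied = 0
    for constraint in constraints:
        pos = [position[var] for var in constraint]
        # Ordering induced on the k constraint variables, encoded as ranks 0..k-1.
        ranks = tuple(sorted(pos).index(p) for p in pos)
        satisfied += predicate(ranks)
    return satisfied


def mas_predicate(ranks):
    # Assumed convention: edge (u, v) is satisfied when u comes before v.
    return int(ranks[0] < ranks[1])


def betweenness_predicate(ranks):
    # Assumed convention: (a, b, c) is satisfied when b lies between a and c.
    a, b, c = ranks
    return int(a < b < c or c < b < a)


def brute_force_opt(variables, constraints, predicate):
    """Exact optimum by trying all n! orderings; only feasible for tiny n."""
    return max(
        ocsp_value(order, constraints, predicate)
        for order in permutations(variables)
    )


def better_of_order_and_reverse(ordering, constraints):
    """MAS only: an ordering or its reverse satisfies at least half of the edges."""
    forward = ocsp_value(ordering, constraints, mas_predicate)
    backward = ocsp_value(ordering[::-1], constraints, mas_predicate)
    return max(forward, backward)


# A directed 3-cycle: no ordering can satisfy all three edges.
cycle = [(0, 1), (1, 2), (2, 0)]
print(brute_force_opt([0, 1, 2], cycle, mas_predicate))
```

The last helper shows the trivial factor of $1/2$ for MAS: each edge is satisfied by exactly one of an ordering and its reverse. This is the baseline that, according to the abstract, $o(n)$-space streaming algorithms cannot beat by any $\epsilon>0$.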
The minimum linear ordering problem (MLOP) seeks to minimize an aggregated cost $f(\cdot)$ due to an ordering $\sigma$ of the items (say $[n]$), i.e., $\min_{\sigma} \sum_{i\in [n]} f(E_{i,\sigma})$, where $E_{i,\sigma}$ is the set of items that are mapped by $\sigma$ to indices at most $i$. This problem has been studied in the literature for various special cases of the cost function $f$, and in a general setting for a submodular or supermodular cost $f$ [ITT2012]. Though MLOP was known to be NP-hard for general submodular functions, it was unknown whether the special case of graphic matroid MLOP (with $f$ being the matroid rank function of a graph) was polynomial-time solvable. Following this motivation, we explore related classes of linear ordering problems, including symmetric submodular MLOP, minimum latency vertex cover (MLVC), and minimum sum vertex cover. We show that the most special cases of our problem, graphic matroid MLOP and minimum latency vertex cover, are both NP-hard. We further expand the toolkit for approximating MLOP variants: using the theory of principal partitions, we show a $\left(2-\frac{1+\ell_{f}}{1+|E|}\right)$-approximation to monotone submodular MLOP, where $\ell_{f}=\frac{f(E)}{\max_{x\in E}f(\{x\})}$ satisfies $1 \leq \ell_f \leq |E|$. Thus our result improves upon the best known bound of $2-\frac{2}{1+|E|}$ by Iwata, Tetali, and Tripathi [ITT2012]. This leads to a $\left(2-\frac{1+r(E)}{1+|E|}\right)$-approximation for the matroid MLOP, corresponding to the case when $r$ is the rank function of a given matroid. Finally, we show that MLVC can be $4/3$-approximated, matching the integrality gap of its vanilla LP relaxation.
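As a hedged illustration of the MLOP objective, the sketch below evaluates $\sum_{i} f(E_{i,\sigma})$ for the graphic matroid rank function and finds an optimal ordering by exhaustive search on a tiny instance. The helper names and the example graph are assumptions made for illustration; the brute-force search is not the approximation algorithm from the abstract.

```python
from itertools import permutations


def mlop_cost(order, f):
    """Sum of f over the prefixes E_{i,sigma} for i = 1..n."""
    return sum(f(frozenset(order[:i])) for i in range(1, len(order) + 1))


def graphic_rank(edges):
    """Rank of an edge set in the graphic matroid (size of a spanning forest)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rank = 0
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so rank grows
            parent[root_u] = root_v
            rank += 1
    return rank


def brute_force_mlop(items, f):
    """Optimal ordering by trying all n! permutations; tiny inputs only."""
    return min(permutations(items), key=lambda order: mlop_cost(order, f))


def stated_guarantee(items, f):
    """The factor 2 - (1 + ell_f) / (1 + |E|) quoted in the abstract."""
    ell_f = f(frozenset(items)) / max(f(frozenset([x])) for x in items)
    return 2 - (1 + ell_f) / (1 + len(items))


# Hypothetical instance: a triangle on {0, 1, 2} plus a disjoint edge (3, 4).
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
best = brute_force_mlop(edges, graphic_rank)
print(best, mlop_cost(best, graphic_rank), stated_guarantee(edges, graphic_rank))
```

On this instance, placing the triangle first keeps the prefix ranks low, because its third edge closes a cycle and adds no rank.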
We show that the smoothed complexity of the FLIP algorithm for local Max-Cut is at most $\smash{\phi n^{O(\sqrt{\log n})}}$, where $n$ is the number of nodes in the graph and $\phi$ is a parameter that measures the magnitude of perturbations applied to its edge weights. This improves the previously best upper bound of $\phi n^{O(\log n)}$ by Etscheid and R\"{o}glin. Our result is based on an analysis of long sequences of flips, which shows that it is very unlikely for every flip in a long sequence to incur a positive but small improvement in the cut weight. We also extend the same upper bound on the smoothed complexity of FLIP to all binary Maximum Constraint Satisfaction Problems.
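For readers unfamiliar with FLIP, the following is a minimal sketch of the local search it refers to: repeatedly move a single node across the cut while doing so strictly increases the cut weight. The function names and the pivot rule (take the first improving node) are assumptions made here; the analysis in the abstract concerns the number of flips when the edge weights are randomly perturbed, which this toy example does not model.

```python
def cut_weight(weights, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])


def flip_gain(weights, side, node):
    """Change in cut weight if `node` is moved to the other side."""
    gain = 0
    for (u, v), w in weights.items():
        if node in (u, v):
            other = v if u == node else u
            # An uncut edge becomes cut (+w); a cut edge becomes uncut (-w).
            gain += w if side[other] == side[node] else -w
    return gain


def flip_local_max_cut(n, weights, side=None):
    """Run FLIP until no single-node move improves the cut."""
    side = list(side) if side is not None else [0] * n
    flips = 0
    while True:
        improving = [v for v in range(n) if flip_gain(weights, side, v) > 0]
        if not improving:
            return side, flips
        side[improving[0]] ^= 1  # assumed pivot rule: first improving node
        flips += 1


# Hypothetical weighted triangle, with edges keyed by (u, v).
triangle = {(0, 1): 1, (1, 2): 2, (0, 2): 3}
final_side, num_flips = flip_local_max_cut(3, triangle)
print(final_side, num_flips, cut_weight(triangle, final_side))
```

Each flip strictly increases the cut weight and there are finitely many partitions, so the loop terminates. The question addressed in the abstract is how many flips can occur before that happens.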
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn much attention, including for developing fast algorithms that solve the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires only $O(\frac{1}{\lambda\epsilon})$ random samples, and which immediately yields a streaming algorithm that uses $O(\frac{d}{\lambda\epsilon})$ space. For related problems, better streaming algorithms are only known for smooth functions, unlike the SVM objective that we focus on in this work. We initiate an investigation of the space complexity for both finding an approximate optimum of this objective, and for the related ``point estimation'' problem of sketching the data set to evaluate the function value $F_\lambda$ on any query $(\theta, b)$. We show that, for both problems, for dimensions $d=1,2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$. We also prove polynomial lower bounds for both point estimation and optimization. In particular, for point estimation we obtain a tight bound of $\Theta(1/\sqrt{\epsilon})$ for $d=1$ and a nearly tight lower bound of $\widetilde{\Omega}(d/{\epsilon}^2)$ for $d = \Omega( \log(1/\epsilon))$. Finally, for optimization, we prove an $\Omega(1/\sqrt{\epsilon})$ lower bound for $d = \Omega( \log(1/\epsilon))$, and show similar bounds when $d$ is constant.
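The SGD baseline mentioned above can be sketched as follows. Several things here are assumptions rather than details taken from the abstract: the exact form of the objective, $F_\lambda(\theta,b)=\frac{\lambda}{2}(\|\theta\|^2+b^2)+\frac{1}{n}\sum_i \max(0,\,1-y_i(\theta^\top x_i+b))$, the step size $1/(\lambda t)$, and the use of reservoir sampling as one plausible way to obtain stored samples from a stream. All function names are hypothetical.

```python
import random


def svm_objective(data, theta, b, lam):
    """Assumed objective: (lam/2)(|theta|^2 + b^2) + average hinge loss."""
    reg = 0.5 * lam * (sum(t * t for t in theta) + b * b)
    hinge = 0.0
    for x, y in data:
        score = sum(t * xi for t, xi in zip(theta, x)) + b
        hinge += max(0.0, 1.0 - y * score)
    return reg + hinge / len(data)


def reservoir_sample(stream, k, rng):
    """Keep k uniformly chosen items from a stream using O(k) stored items."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample


def sgd_svm(samples, d, lam):
    """Subgradient descent over the given samples with step size 1/(lam * t)."""
    theta, b = [0.0] * d, 0.0
    for t, (x, y) in enumerate(samples, start=1):
        eta = 1.0 / (lam * t)
        score = sum(th * xi for th, xi in zip(theta, x)) + b
        # Subgradient of the regularizer, plus the hinge term if the margin is violated.
        g_theta = [lam * th for th in theta]
        g_b = lam * b
        if y * score < 1.0:
            g_theta = [g - y * xi for g, xi in zip(g_theta, x)]
            g_b -= y
        theta = [th - eta * g for th, g in zip(theta, g_theta)]
        b -= eta * g_b
    return theta, b


# Hypothetical one-dimensional data set, processed as a stream.
rng = random.Random(0)
points = [([2.0], 1), ([-2.0], -1)] * 50
kept = reservoir_sample(iter(points), 30, rng)
theta, b = sgd_svm(kept, d=1, lam=0.1)
print(svm_objective(points, theta, b, lam=0.1))
```

Storing $k$ samples of dimension $d$ costs $O(dk)$ numbers, which matches the shape of the $O(\frac{d}{\lambda\epsilon})$ bound quoted above when $k$ is of order $\frac{1}{\lambda\epsilon}$. The constants and the sampling scheme in this sketch are not taken from the source.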
We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell-probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model. We first consider online convolution, where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent values from a stream. One symbol of the stream arrives at a time, and each output must be computed before the next symbol arrives. Next we show bounds for online multiplication, where the stream consists of pairs of digits, one from each of two $n$-digit numbers that are to be multiplied. One pair arrives at a time, and the task is to output a single new digit from the product before the next pair of digits arrives. Finally we look at the online Hamming distance problem, where the Hamming distance is output instead of the inner product. For each of these three problems, we give a lower bound of $\Omega(\frac{\delta}{w}\log n)$ time on average per output, where $\delta$ is the number of bits needed to represent an input symbol and $w$ is the cell or word size. We argue that these bounds are in fact tight within the cell-probe model.
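To make the online problems concrete, here is a naive sketch of online convolution and online Hamming distance over a sliding window. It recomputes each output from scratch in time linear in $n$, so it is a specification of the outputs rather than an algorithm meeting the bounds above. The class name, the alignment of the fixed vector with the window (oldest value first), and the padding used before $n$ symbols have arrived are assumptions.

```python
from collections import deque


class OnlineWindow:
    """Keeps the n most recent stream values next to a fixed n-dimensional vector."""

    def __init__(self, fixed, pad):
        self.fixed = list(fixed)
        # Assumed convention: the window starts filled with a padding value.
        self.window = deque([pad] * len(self.fixed), maxlen=len(self.fixed))

    def push_inner_product(self, symbol):
        """Consume one symbol, then output the inner product with the window."""
        self.window.append(symbol)  # the oldest value drops out automatically
        return sum(f * s for f, s in zip(self.fixed, self.window))

    def push_hamming(self, symbol):
        """Consume one symbol, then output the number of mismatched positions."""
        self.window.append(symbol)
        return sum(f != s for f, s in zip(self.fixed, self.window))


conv = OnlineWindow([1, 2, 3], pad=0)
print([conv.push_inner_product(s) for s in [1, 1, 1, 0]])

ham = OnlineWindow("abc", pad=None)
print([ham.push_hamming(s) for s in "abcc"])
```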
We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems such as estimating the size of maximum matchings and maximum cuts, weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all the above, obtaining a $(1+\epsilon)$-approximation requires either $n^{\Omega(1)}$ space or $\Omega(1/\epsilon)$ passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For several of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal. Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bounds, proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020], only rule out sublinear-space algorithms with an exponentially smaller number of passes, $o(\log{(1/\epsilon)})$, for these problems. One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a $p$-pass $s$-space streaming algorithm can only solve a decision problem with advantage $\delta > 0$ over random guessing, then it cannot solve XOR of $\ell$ independent copies of the problem with advantage much better than $\delta^{\ell}$. This result can be of independent interest and useful for other streaming lower bounds as well.
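The $\delta^{\ell}$ benchmark in the XOR Lemma can be checked by exact arithmetic for the simplest strategy, which answers each copy independently and XORs the answers. The sketch assumes the normalization in which advantage $\delta$ means success probability $(1+\delta)/2$; the abstract does not fix this convention, and under other normalizations the expression differs by constant factors. The lemma itself is the much stronger statement that no algorithm with the given resources does substantially better.

```python
from fractions import Fraction
from itertools import product


def xor_success_probability(p_correct, copies):
    """P[XOR of the guesses is right] when each guess is independently right w.p. p_correct."""
    total = Fraction(0)
    for wrong in product([0, 1], repeat=copies):
        prob = Fraction(1)
        for is_wrong in wrong:
            prob *= (1 - p_correct) if is_wrong else p_correct
        # The XOR is correct exactly when an even number of guesses are wrong.
        if sum(wrong) % 2 == 0:
            total += prob
    return total


def advantage(p_correct):
    """Assumed normalization: advantage = 2 * P[correct] - 1."""
    return 2 * p_correct - 1


delta = Fraction(1, 5)
p = (1 + delta) / 2
for ell in range(1, 6):
    print(ell, advantage(xor_success_probability(p, ell)), delta ** ell)
```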