No Arabic abstract
The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyds heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate: $bullet$ Continuous k-median to a factor of $2-o(1)$; this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms 99). $bullet$ Continuous k-means to a factor of $4- o(1)$; this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms 99). $bullet$ k-minsum to a factor of $1.415$; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA 03). Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).
In the $d$-Scattered Set problem we are asked to select at least $k$ vertices of a given graph, so that the distance between any pair is at least $d$. We study the problems (in-)approximability and offer improvements and extensions of known results for Independent Set, of which the problem is a generalization. Specifically, we show: - A lower bound of $Delta^{lfloor d/2rfloor-epsilon}$ on the approximation ratio of any polynomial-time algorithm for graphs of maximum degree $Delta$ and an improved upper bound of $O(Delta^{lfloor d/2rfloor})$ on the approximation ratio of any greedy scheme for this problem. - A polynomial-time $2sqrt{n}$-approximation for bipartite graphs and even values of $d$, that matches the known lower bound by considering the only remaining case. - A lower bound on the complexity of any $rho$-approximation algorithm of (roughly) $2^{frac{n^{1-epsilon}}{rho d}}$ for even $d$ and $2^{frac{n^{1-epsilon}}{rho(d+rho)}}$ for odd $d$ (under the randomized ETH), complemented by $rho$-approximation algorithms of running times that (almost) match these bounds.
A Boolean constraint satisfaction problem (CSP), Max-CSP$(f)$, is a maximization problem specified by a constraint $f:{-1,1}^kto{0,1}$. An instance of the problem consists of $m$ constraint applications on $n$ Boolean variables, where each constraint application applies the constraint to $k$ literals chosen from the $n$ variables and their negations. The goal is to compute the maximum number of constraints that can be satisfied by a Boolean assignment to the $n$~variables. In the $(gamma,beta)$-approximation version of the problem for parameters $gamma geq beta in [0,1]$, the goal is to distinguish instances where at least $gamma$ fraction of the constraints can be satisfied from instances where at most $beta$ fraction of the constraints can be satisfied. In this work we consider the approximability of Max-CSP$(f)$ in the (dynamic) streaming setting, where constraints are inserted (and may also be deleted in the dynamic setting) one at a time. We completely characterize the approximability of all Boolean CSPs in the dynamic streaming setting. Specifically, given $f$, $gamma$ and $beta$ we show that either (1) the $(gamma,beta)$-approximation version of Max-CSP$(f)$ has a probabilistic dynamic streaming algorithm using $O(log n)$ space, or (2) for every $varepsilon > 0$ the $(gamma-varepsilon,beta+varepsilon)$-approximation version of Max-CSP$(f)$ requires $Omega(sqrt{n})$ space for probabilistic dynamic streaming algorithms. We also extend previously known results in the insertion-only setting to a wide variety of cases, and in particular the case of $k=2$ where we get a dichotomy and the case when the satisfying assignments of $f$ support a distribution on ${-1,1}^k$ with uniform marginals.
We show that for any odd $k$ and any instance of the Max-kXOR constraint satisfaction problem, there is an efficient algorithm that finds an assignment satisfying at least a $frac{1}{2} + Omega(1/sqrt{D})$ fraction of constraints, where $D$ is a bound on the number of constraints that each variable occurs in. This improves both qualitatively and quantitatively on the recent work of Farhi, Goldstone, and Gutmann (2014), which gave a emph{quantum} algorithm to find an assignment satisfying a $frac{1}{2} + Omega(D^{-3/4})$ fraction of the equations. For arbitrary constraint satisfaction problems, we give a similar result for triangle-free instances; i.e., an efficient algorithm that finds an assignment satisfying at least a $mu + Omega(1/sqrt{D})$ fraction of constraints, where $mu$ is the fraction that would be satisfied by a uniformly random assignment.
We study the approximability of the NP-complete textsc{Maximum Minimal Feedback Vertex Set} problem. Informally, this natural problem seems to lie in an intermediate space between two more well-studied problems of this type: textsc{Maximum Minimal Vertex Cover}, for which the best achievable approximation ratio is $sqrt{n}$, and textsc{Upper Dominating Set}, which does not admit any $n^{1-epsilon}$ approximation. We confirm and quantify this intuition by showing the first non-trivial polynomial time approximation for textsc{Max Min FVS} with a ratio of $O(n^{2/3})$, as well as a matching hardness of approximation bound of $n^{2/3-epsilon}$, improving the previous known hardness of $n^{1/2-epsilon}$. The approximation algorithm also gives a cubic kernel when parameterized by the solution size. Along the way, we also obtain an $O(Delta)$-approximation and show that this is asymptotically best possible, and we improve the bound for which the problem is NP-hard from $Deltage 9$ to $Deltage 6$. Having settled the problems approximability in polynomial time, we move to the context of super-polynomial time. We devise a generalization of our approximation algorithm which, for any desired approximation ratio $r$, produces an $r$-approximate solution in time $n^{O(n/r^{3/2})}$. This time-approximation trade-off is essentially tight: we show that under the ETH, for any ratio $r$ and $epsilon>0$, no algorithm can $r$-approximate this problem in time $n^{O((n/r^{3/2})^{1-epsilon})}$, hence we precisely characterize the approximability of the problem for the whole spectrum between polynomial and sub-exponential time, up to an arbitrarily small constant in the second exponent.
There has been significant recent progress on algorithms for approximating graph spanners, i.e., algorithms which approximate the best spanner for a given input graph. Essentially all of these algorithms use the same basic LP relaxation, so a variety of papers have studied the limitations of this approach and proved integrality gaps for this LP in a variety of settings. We extend these results by showing that even the strongest lift-and-project methods cannot help significantly, by proving polynomial integrality gaps even for $n^{Omega(epsilon)}$ levels of the Lasserre hierarchy, for both the directed and undirected spanner problems. We also extend these integrality gaps to related problems, notably Directed Steiner Network and Shallow-Light Steiner Network.