In the $d$-Scattered Set problem we are asked to select at least $k$ vertices of a given graph so that the distance between any pair is at least $d$. We study the problem's (in)approximability and offer improvements and extensions of known results for Independent Set, of which the problem is a generalization. Specifically, we show:
- A lower bound of $\Delta^{\lfloor d/2\rfloor-\epsilon}$ on the approximation ratio of any polynomial-time algorithm for graphs of maximum degree $\Delta$, and an improved upper bound of $O(\Delta^{\lfloor d/2\rfloor})$ on the approximation ratio of any greedy scheme for this problem (sketched below).
- A polynomial-time $2\sqrt{n}$-approximation for bipartite graphs and even values of $d$, which matches the known lower bound by settling the only remaining case.
- A lower bound on the complexity of any $\rho$-approximation algorithm of (roughly) $2^{\frac{n^{1-\epsilon}}{\rho d}}$ for even $d$ and $2^{\frac{n^{1-\epsilon}}{\rho(d+\rho)}}$ for odd $d$ (under the randomized ETH), complemented by $\rho$-approximation algorithms with running times that (almost) match these bounds.
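Not part of the abstract: a minimal sketch of the kind of greedy scheme the upper bound refers to, assuming an unweighted graph given as an adjacency-list dictionary. It repeatedly picks an available vertex and discards everything within distance $d-1$ of it, so the selected vertices stay at pairwise distance at least $d$; the arbitrary selection rule is an illustrative assumption.

```python
from collections import deque

def greedy_scattered_set(adj, d):
    """Generic greedy sketch: pick an available vertex, then remove every
    vertex within distance d-1 of it, so chosen vertices are pairwise
    at distance >= d. adj maps vertex -> iterable of neighbours."""
    available = set(adj)
    solution = []
    while available:
        v = next(iter(available))      # any selection rule fits a "greedy scheme"
        solution.append(v)
        # BFS up to depth d-1 from v, removing reached vertices from the pool
        seen, frontier = {v}, deque([(v, 0)])
        while frontier:
            u, dist = frontier.popleft()
            available.discard(u)
            if dist < d - 1:
                for w in adj[u]:
                    if w not in seen:
                        seen.add(w)
                        frontier.append((w, dist + 1))
    return solution
```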
In $d$-Scattered Set we are given an (edge-weighted) graph and are asked to select at least $k$ vertices so that the distance between any pair is at least $d$, thus generalizing Independent Set. We provide upper and lower bounds on the complexity of this problem with respect to various standard graph parameters. In particular, we show the following:
- For any $d\ge 2$, an $O^*(d^{\textrm{tw}})$-time algorithm, where $\textrm{tw}$ is the treewidth of the input graph.
- A tight SETH-based lower bound matching this algorithm's performance. These generalize known results for Independent Set.
- $d$-Scattered Set is W[1]-hard parameterized by vertex cover (for edge-weighted graphs) or feedback vertex set (for unweighted graphs), even if $k$ is an additional parameter.
- A single-exponential algorithm parameterized by vertex cover for unweighted graphs, complementing the above-mentioned hardness.
- A $2^{O(\textrm{td}^2)}$-time algorithm parameterized by tree-depth ($\textrm{td}$), as well as a matching ETH-based lower bound, both for unweighted graphs.
We complement these mostly negative results by providing an FPT approximation scheme parameterized by treewidth. In particular, we give an algorithm which, for any error parameter $\epsilon > 0$, runs in time $O^*((\textrm{tw}/\epsilon)^{O(\textrm{tw})})$ and returns a $d/(1+\epsilon)$-scattered set of size $k$, if a $d$-scattered set of the same size exists.
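Not part of the abstract: a small sketch of the defining condition for the unweighted case, checking by BFS that a candidate set is $d$-scattered; for $d=2$ this is exactly the independent-set condition. The adjacency-list representation is an illustrative assumption.

```python
from collections import deque
from itertools import combinations

def is_d_scattered(adj, S, d):
    """Return True iff every pair of vertices in S is at distance >= d
    in the unweighted graph adj (vertex -> iterable of neighbours)."""
    def dist(s, t):
        seen, frontier = {s}, deque([(s, 0)])
        while frontier:
            u, du = frontier.popleft()
            if u == t:
                return du
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    frontier.append((w, du + 1))
        return float("inf")   # t unreachable from s
    return all(dist(s, t) >= d for s, t in combinations(S, 2))
```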
A Boolean constraint satisfaction problem (CSP), Max-CSP$(f)$, is a maximization problem specified by a constraint $f:\{-1,1\}^k\to\{0,1\}$. An instance of the problem consists of $m$ constraint applications on $n$ Boolean variables, where each constraint application applies the constraint to $k$ literals chosen from the $n$ variables and their negations. The goal is to compute the maximum number of constraints that can be satisfied by a Boolean assignment to the $n$ variables. In the $(\gamma,\beta)$-approximation version of the problem, for parameters $\gamma \geq \beta \in [0,1]$, the goal is to distinguish instances where at least a $\gamma$ fraction of the constraints can be satisfied from instances where at most a $\beta$ fraction of the constraints can be satisfied. In this work we consider the approximability of Max-CSP$(f)$ in the (dynamic) streaming setting, where constraints are inserted (and may also be deleted, in the dynamic setting) one at a time. We completely characterize the approximability of all Boolean CSPs in the dynamic streaming setting. Specifically, given $f$, $\gamma$ and $\beta$, we show that either (1) the $(\gamma,\beta)$-approximation version of Max-CSP$(f)$ has a probabilistic dynamic streaming algorithm using $O(\log n)$ space, or (2) for every $\varepsilon > 0$ the $(\gamma-\varepsilon,\beta+\varepsilon)$-approximation version of Max-CSP$(f)$ requires $\Omega(\sqrt{n})$ space for probabilistic dynamic streaming algorithms. We also extend previously known results in the insertion-only setting to a wide variety of cases, in particular the case of $k=2$, where we obtain a dichotomy, and the case when the satisfying assignments of $f$ support a distribution on $\{-1,1\}^k$ with uniform marginals.
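Not part of the abstract: a minimal sketch of the Max-CSP$(f)$ objective itself, i.e. the fraction of constraints a given $\{-1,1\}$ assignment satisfies. The literal encoding (a pair of variable index and sign) and the Max-2AND example are illustrative assumptions.

```python
def satisfied_fraction(f, constraints, assignment):
    """Fraction of constraints satisfied by an assignment in {-1,1}^n.
    Each constraint is a tuple of k literals; a literal (i, s) denotes
    variable i, negated iff s == -1, so f receives s * assignment[i]."""
    satisfied = sum(
        f(tuple(s * assignment[i] for (i, s) in literals))
        for literals in constraints
    )
    return satisfied / len(constraints)

# Example: Max-2AND, where f is 1 iff both literals evaluate to +1.
f_and = lambda xs: int(all(x == 1 for x in xs))
constraints = [((0, 1), (1, -1)), ((1, 1), (2, 1))]   # x0 AND NOT x1,  x1 AND x2
print(satisfied_fraction(f_and, constraints, {0: 1, 1: -1, 2: 1}))  # 0.5
```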
The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts to understand the approximability of k-means, and of other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate:
$\bullet$ Continuous k-median to a factor of $2-o(1)$; this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99).
$\bullet$ Continuous k-means to a factor of $4-o(1)$; this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99).
$\bullet$ k-minsum to a factor of $1.415$; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03).
Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).
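Not part of the abstract: a small sketch contrasting the continuous k-means cost (each cluster's optimal center is its mean, as in Lloyd's heuristic) with the discrete variant, where centers must come from a given candidate set. The Euclidean toy instance is an illustrative assumption.

```python
import numpy as np

def kmeans_cost_continuous(clusters):
    """Continuous k-means cost of a fixed clustering: each cluster's center
    is its mean, cost is the sum of squared distances to that mean."""
    return sum(((pts - pts.mean(axis=0)) ** 2).sum() for pts in clusters)

def kmeans_cost_discrete(clusters, candidates):
    """Discrete variant: each cluster's center must be a candidate point."""
    return sum(
        min(((pts - c) ** 2).sum() for c in candidates) for pts in clusters
    )

# Toy example in the plane: a single cluster of three points.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
print(kmeans_cost_continuous([pts]))            # mean is (1, 1) -> cost 8.0
print(kmeans_cost_discrete([pts], list(pts)))   # best input point as center -> 14.0
```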
We show that there is no deterministic local algorithm (constant-time distributed graph algorithm) that finds a $(7-\epsilon)$-approximation of a minimum dominating set on planar graphs, for any positive constant $\epsilon$. In prior work, the best lower bound on the approximation ratio has been $5-\epsilon$; there is also an upper bound of $52$.
Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We investigate a fundamental question about these techniques: how large should the training set be to ensure that a parameter's average empirical performance over the training set is close to its expected, future performance? We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a simple function. We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds. On the flip side, if the approximation holds only under the L-p norm for p smaller than infinity, it is not possible to provide meaningful sample complexity bounds in the worst case. We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science. Via experiments, we obtain sample complexity bounds that are up to 700 times smaller than the previously best-known bounds.
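Not part of the abstract: a toy sketch of the quantity the sample-complexity question concerns, namely a parameter's average empirical performance over a sampled training set compared against a much larger independent sample standing in for its expected performance. The performance metric, instance distribution, and parameter values are all hypothetical stand-ins.

```python
import random

def empirical_performance(run_algorithm, parameter, instances):
    """Average empirical performance of one parameter setting over a training set."""
    return sum(run_algorithm(parameter, inst) for inst in instances) / len(instances)

# Hypothetical setup: run_algorithm measures e.g. runtime on a sampled instance;
# a large independent sample approximates the (unknown) expected performance.
run_algorithm = lambda p, inst: abs(inst - p)        # toy stand-in for a runtime metric
training = [random.gauss(5, 1) for _ in range(100)]
holdout  = [random.gauss(5, 1) for _ in range(10000)]
for p in (4.0, 5.0, 6.0):
    gap = abs(empirical_performance(run_algorithm, p, training)
              - empirical_performance(run_algorithm, p, holdout))
    print(p, round(gap, 3))   # training-vs-expected gap per parameter setting
```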