No Arabic abstract
We study problems with stochastic uncertainty information on intervals for which the precise value can be queried by paying a cost. The goal is to devise an adaptive decision tree to find a correct solution to the problem in consideration while minimizing the expected total query cost. We show that, for the sorting problem, such a decision tree can be found in polynomial time. For the problem of finding the data item with minimum value, we have some evidence for hardness. This contradicts intuition, since the minimum problem is easier both in the online setting with adversarial inputs and in the offline verification setting. However, the stochastic assumption can be leveraged to beat both deterministic and randomized approximation lower bounds for the online setting.
We study the problem of sorting under incomplete information, when queries are used to resolve uncertainties. Each of $n$ data items has an unknown value, which is known to lie in a given interval. We can pay a query cost to learn the actual value, and we may allow an error threshold in the sorting. The goal is to find a nearly-sorted permutation by performing a minimum-cost set of queries. We show that an offline optimum query set can be found in poly time, and that both oblivious and adaptive problems have simple competitive algorithms. The competitive ratio for the oblivious problem is $n$ for uniform query costs, and unbounded for arbitrary costs; for the adaptive problem, the ratio is 2. We present a unified adaptive strategy for uniform costs that yields the following improved results: (1) a 3/2-competitive randomized algorithm; (2) a 5/3-competitive deterministic algorithm if the dependency graph has no 2-components after some preprocessing, which has competitive ratio $3/2+mathrm{O}(1/k)$ if the components obtained have size at least $k$; and (3) an exact algorithm for laminar families of intervals. The first two results have matching lower bounds, and we have a lower bound of 7/5 for large components. We also give a randomized adaptive algorithm with competitive ratio $1+frac{4}{3sqrt{3}}approx 1.7698$ for arbitrary query costs, and we show that the 2-competitive deterministic adaptive algorithm can be generalized for queries returning intervals and for a more general vertex cover problem, by using the local ratio technique. Moreover, we prove that the advice complexity of the adaptive problem is $lfloor n/2rfloor$ if no error threshold is allowed, and $lceil n/3cdotlg 3rceil$ for the general case. Finally, we present some graph-theoretical results on co-threshold tolerance graphs, and we discuss uncertainty variants of some classical interval problems.
We consider submodular function minimization in the oracle model: given black-box access to a submodular set function $f:2^{[n]}rightarrow mathbb{R}$, find an element of $argmin_S {f(S)}$ using as few queries to $f(cdot)$ as possible. State-of-the-art algorithms succeed with $tilde{O}(n^2)$ queries [LeeSW15], yet the best-known lower bound has never been improved beyond $n$ [Harvey08]. We provide a query lower bound of $2n$ for submodular function minimization, a $3n/2-2$ query lower bound for the non-trivial minimizer of a symmetric submodular function, and a $binom{n}{2}$ query lower bound for the non-trivial minimizer of an asymmetric submodular function. Our $3n/2-2$ lower bound results from a connection between SFM lower bounds and a novel concept we term the cut dimension of a graph. Interestingly, this yields a $3n/2-2$ cut-query lower bound for finding the global mincut in an undirected, weighted graph, but we also prove it cannot yield a lower bound better than $n+1$ for $s$-$t$ mincut, even in a directed, weighted graph.
In the stochastic online vector balancing problem, vectors $v_1,v_2,ldots,v_T$ chosen independently from an arbitrary distribution in $mathbb{R}^n$ arrive one-by-one and must be immediately given a $pm$ sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to $mathsf{polylog}(nT)$ factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC 20). In particular, for the Koml{o}s problem where $|v_t|_2leq 1$ for each $t$, our algorithm achieves $tilde{O}(1)$ discrepancy with high probability, improving upon the previous $tilde{O}(n^{3/2})$ bound. For Tusn{a}dys problem of minimizing the discrepancy of axis-aligned boxes, we obtain an $O(log^{d+4} T)$ bound for arbitrary distribution over points. Previous techniques only worked for product distributions and gave a weaker $O(log^{2d+1} T)$ bound. We also consider the Banaszczyk setting, where given a symmetric convex body $K$ with Gaussian measure at least $1/2$, our algorithm achieves $tilde{O}(1)$ discrepancy with respect to the norm given by $K$ for input distributions with sub-exponential tails. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties. For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multi-color discrepancy.
We study approximation algorithms for the problem of minimizing the makespan on a set of machines with uncertainty on the processing times of jobs. In the model we consider, which goes back to~cite{BertsimasS03}, once the schedule is defined an adversary can pick a scenario where deviation is added to some of the jobs processing times. Given only the maximal cardinality of these jobs, and the magnitude of potential deviation for each job, the goal is to optimize the worst-case scenario. We consider both the cases of identical and unrelated machines. Our main result is an EPTAS for the case of identical machines. We also provide a $3$-approximation algorithm and an inapproximability ratio of $2-epsilon$ for the case of unrelated machines
Approximate Membership Query structures (AMQs) rely on randomisation for time- and space-efficiency, while introducing a possibility of false positive and false negative answers. Correctness proofs of such structures involve subtle reasoning about bounds on probabilities of getting certain outcomes. Because of these subtleties, a number of unsound arguments in such proofs have been made over the years. In this work, we address the challenge of building rigorous and reusable computer-assisted proofs about probabilistic specifications of AMQs. We describe the framework for systematic decomposition of AMQs and their properties into a series of interfaces and reusable components. We implement our framework as a library in the Coq proof assistant and showcase it by encoding in it a number of non-trivial AMQs, such as Bloom filters, counting filters, quotient filters and blocked constructions, and mechanising the proofs of their probabilistic specifications. We demonstrate how AMQs encoded in our framework guarantee the absence of false negatives by construction. We also show how the proofs about probabilities of false positives for complex AMQs can be obtained by means of verified reduction to the implementations of their simpler counterparts. Finally, we provide a library of domain-specific theorems and tactics that allow a high degree of automation in probabilistic proofs.