We develop a framework for the rigorous analysis of focused stochastic local search algorithms. These are algorithms that search a state space by repeatedly selecting some constraint that is violated in the current state and moving to a random nearby state that addresses the violation, while hopefully not introducing many new ones. An important class of focused local search algorithms with provable performance guarantees has recently arisen from algorithmizations of the Lovász Local Lemma (LLL), a non-constructive tool for proving the existence of satisfying states by introducing a background measure on the state space. While these algorithms are powerful, their state transitions must be, in a precise sense, perfectly compatible with the background measure. In many applications this is a very restrictive requirement, and one needs to step outside the class. Here we introduce the notion of measure distortion and develop a framework for analyzing arbitrary focused stochastic local search algorithms, recovering LLL algorithmizations as the special case of no distortion. Our framework takes as input an arbitrary such algorithm and an arbitrary probability measure, and shows how to use the measure as a yardstick of algorithmic progress, even for algorithms designed independently of the measure.
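The resampling step that characterizes LLL algorithmizations can be illustrated by the Moser-Tardos algorithm. Below is a minimal Python sketch for CNF formulas. It assumes the uniform product measure on $\{0,1\}^n$ as the background measure. The function name, the DIMACS-style literal encoding (a positive integer $v$ for $x_v$, a negative one for its negation) and the `max_steps` cutoff are illustrative assumptions, not part of the abstract. Each move redraws the violated clause's variables from the background measure itself, which is the kind of transition the abstract describes as perfectly compatible with that measure.

```python
import random


def moser_tardos(n, clauses, rng=random, max_steps=10**6):
    """Moser-Tardos resampling for a CNF formula (illustrative sketch).

    Variables are 1..n; a clause is a list of nonzero ints (assumed
    DIMACS-style literals: +v means x_v is True, -v means x_v is False).
    Background measure (assumption): uniform product measure on {True, False}^n.
    Returns a satisfying assignment as a list of n bools, or None on timeout.
    """
    # Start from a sample of the product measure (index 0 unused).
    x = [None] + [rng.random() < 0.5 for _ in range(n)]

    def violated(clause):
        # A clause is violated iff every one of its literals is False.
        return all(x[abs(l)] != (l > 0) for l in clause)

    for _ in range(max_steps):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return x[1:]
        # Focus: select some violated constraint...
        c = rng.choice(bad)
        # ...and resample exactly its variables from the product measure.
        for l in c:
            x[abs(l)] = rng.random() < 0.5
    return None
```

The `rng` parameter accepts either the `random` module or a seeded `random.Random` instance, which makes runs reproducible.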
We develop tools for analyzing focused stochastic local search algorithms. These are algorithms that search a state space probabilistically by repeatedly selecting a constraint that is violated in the current state and moving to a random nearby state that, hopefully, addresses the violation without introducing many new ones. A large class of such algorithms arises from the algorithmization of the Lovász Local Lemma, a non-constructive tool for proving the existence of satisfying states. Here we give tools that provide a unified analysis of such algorithms and of many more, expressing them as instances of a general framework.
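As a concrete focused stochastic local search algorithm in which the move is a single flip rather than a resampling, here is a minimal sketch in the spirit of Papadimitriou's random walk for 2-SAT. It picks a violated clause and flips one of its variables chosen uniformly at random, so each move goes to a state at Hamming distance 1. The function name and the signed-integer clause encoding are assumptions for illustration only. The sketch is not meant to reproduce the analysis described above.

```python
import random


def focused_walk(n, clauses, rng=random, max_steps=10**6):
    """Focused random walk for CNF (illustrative sketch).

    Variables are 1..n; clauses are lists of signed ints (assumed encoding:
    +v means x_v is True, -v means x_v is False).
    Returns a satisfying assignment as a list of n bools, or None on timeout.
    """
    # Random initial state (index 0 unused).
    x = [None] + [rng.random() < 0.5 for _ in range(n)]

    def violated(clause):
        # A clause is violated iff all of its literals are False.
        return all(x[abs(l)] != (l > 0) for l in clause)

    for _ in range(max_steps):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return x[1:]
        # Focus on one violated constraint chosen uniformly at random.
        c = rng.choice(bad)
        v = abs(rng.choice(c))
        # Flipping any variable of c satisfies c, but may violate other clauses.
        x[v] = not x[v]
    return None
```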
Let $\Phi = (V, \mathcal{C})$ be a constraint satisfaction problem on variables $v_1, \dots, v_n$ such that each constraint depends on at most $k$ variables and each variable takes values in an alphabet of size at most $q$. Suppose that each constraint shares variables with at most $\Delta$ constraints and that each constraint is violated with probability at most $p$ (under the product measure on its variables). We show that for $k, q = O(1)$, there is a deterministic polynomial-time algorithm to approximately count the number of satisfying assignments and a randomized polynomial-time algorithm to sample from approximately the uniform distribution on satisfying assignments, provided that \[C \cdot q^{3} \cdot k \cdot p \cdot \Delta^{7} < 1, \quad \text{where } C \text{ is an absolute constant.}\] Previously, a result of this form was known essentially only in the special case when each constraint is violated by exactly one assignment to its variables. For the special case of $k$-CNF formulas, the term $\Delta^{7}$ improves on the previously best known $\Delta^{60}$ for deterministic algorithms [Moitra, J. ACM, 2019] and $\Delta^{13}$ for randomized algorithms [Feng et al., arXiv, 2020]. For the special case of properly $q$-coloring $k$-uniform hypergraphs, the term $\Delta^{7}$ improves on the previously best known $\Delta^{14}$ for deterministic algorithms [Guo et al., SICOMP, 2019] and $\Delta^{9}$ for randomized algorithms [Feng et al., arXiv, 2020].
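The parameters $k$, $p$ and $\Delta$ in the condition above can be computed directly for a given formula. The sketch below does this for $k$-CNF formulas ($q = 2$). It assumes the uniform product measure and that no clause contains both a variable and its negation. Because the absolute constant $C$ is left unspecified, the hypothetical helper `condition_without_constant` returns only $q^{3} \cdot k \cdot p \cdot \Delta^{7}$. The condition then asks that this value be below $1/C$.

```python
from itertools import combinations


def cnf_lll_parameters(clauses):
    """Return (k, p, Delta) for a CNF formula (illustrative sketch).

    Assumes the uniform product measure and no clause containing both
    a variable and its negation. Clauses are lists of signed ints.
    """
    var_sets = [frozenset(abs(l) for l in c) for c in clauses]
    # k: largest number of distinct variables in one clause.
    k = max(len(s) for s in var_sets)
    # A clause on m distinct variables is violated by exactly one of its
    # 2^m assignments, so its violation probability is 2^-m; p is the max.
    p = max(2.0 ** -len(s) for s in var_sets)
    # Delta: largest number of *other* clauses sharing a variable with a clause.
    deg = [0] * len(var_sets)
    for i, j in combinations(range(len(var_sets)), 2):
        if var_sets[i] & var_sets[j]:
            deg[i] += 1
            deg[j] += 1
    return k, p, max(deg)


def condition_without_constant(q, k, p, delta):
    """q^3 * k * p * Delta^7; the stated condition requires C times this < 1."""
    return q ** 3 * k * p * delta ** 7
```

For the three clauses $(x_1 \lor x_2 \lor x_3)$, $(\lnot x_1 \lor x_4 \lor x_5)$ and $(x_2 \lor \lnot x_4 \lor x_6)$, every pair of clauses shares a variable. This gives $k = 3$, $p = 1/8$ and $\Delta = 2$, so $q^{3} k p \Delta^{7} = 8 \cdot 3 \cdot \tfrac{1}{8} \cdot 128 = 384$.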
We consider an agent trying to bring a system to an acceptable state by repeated probabilistic action. Several recent works on algorithmizations of the Lovász Local Lemma (LLL) can be seen as establishing sufficient conditions for the agent to succeed. Here we study whether such stochastic control is also possible in a noisy environment, where both the process of state-observation and the process of state-evolution are subject to adversarial perturbation (noise). The introduction of noise causes the tools developed for LLL algorithmization to break down since the key LLL ingredient, the sparsity of the causality (dependence) relationship, no longer holds. To overcome this challenge we develop a new analysis where entropy plays a central role, both to measure the rate at which progress towards an acceptable state is made and the rate at which noise undoes this progress. The end result is a sufficient condition that allows a smooth tradeoff between the intensity of the noise and the amenability of the system, recovering an asymmetric LLL condition in the noiseless case.
We study the problem of sampling an approximately uniformly random satisfying assignment for atomic constraint satisfaction problems, i.e., those in which each constraint is violated by only one assignment to its variables. Let $p$ denote the maximum probability of violation of any constraint and let $\Delta$ denote the maximum degree of the line graph of the constraints. Our main result is an algorithm for this problem running in time nearly linear in the number of variables, which is valid in a Lovász local lemma type regime that is considerably less restrictive than in previous works. In particular, we provide sampling algorithms for the uniform distribution on: (1) $q$-colorings of $k$-uniform hypergraphs with $\Delta \lesssim q^{(k-4)/3 + o_{q}(1)}$. The exponent $1/3$ improves on the previously best-known $1/7$ in the case $q, \Delta = O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $1/9$ in the general case [Feng, He, Yin; STOC 2021]. (2) Satisfying assignments of Boolean $k$-CNF formulas with $\Delta \lesssim 2^{k/5.741}$. The constant $5.741$ in the exponent improves on the previously best-known $7$ in the case $k = O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $13$ in the general case [Feng, He, Yin; STOC 2021]. (3) Satisfying assignments of general atomic constraint satisfaction problems with $p \cdot \Delta^{7.043} \lesssim 1$. The constant $7.043$ improves upon the previously best-known constant of $350$ [Feng, He, Yin; STOC 2021]. At the heart of our analysis is a novel information-percolation type argument for showing the rapid mixing of the Glauber dynamics for a carefully constructed projection of the uniform distribution on satisfying assignments. Notably, there is no natural partial order on the space, and we believe that the techniques developed for the analysis may be of independent interest.
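For orientation, here is a sketch of plain single-site Glauber dynamics on the satisfying assignments of a CNF formula. It is not the chain analyzed above, which runs on a carefully constructed projection of the uniform distribution. The function name and the signed-integer clause encoding are illustrative assumptions. Each step picks a uniformly random variable and redraws it uniformly among the values that keep every clause satisfied.

```python
import random


def glauber_cnf(n, clauses, x, steps, rng=random):
    """Plain (unprojected) Glauber dynamics on CNF solutions (sketch).

    x: a satisfying assignment, list of n bools (x[v-1] is variable v).
    Clauses are lists of signed ints (+v: x_v True, -v: x_v False).
    """
    x = list(x)  # work on a copy; the caller's list is not modified
    # Only clauses containing v can change when v is updated.
    occurs = {v: [] for v in range(1, n + 1)}
    for c in clauses:
        for l in c:
            occurs[abs(l)].append(c)

    def satisfied(c):
        return any(x[abs(l) - 1] == (l > 0) for l in c)

    for _ in range(steps):
        v = rng.randint(1, n)  # uniformly random variable
        allowed = []
        for val in (False, True):
            x[v - 1] = val
            if all(satisfied(c) for c in occurs[v]):
                allowed.append(val)
        # x was satisfying, so its old value of v is allowed: never empty.
        x[v - 1] = rng.choice(allowed)
    return x
```

For $(x_1 \lor x_2) \land (\lnot x_1 \lor \lnot x_2)$ started at `[True, False]`, neither single-variable change keeps both clauses satisfied. The chain therefore never reaches the other solution `[False, True]`. Small examples like this show that single-site dynamics on satisfying assignments can fail to be irreducible, which suggests why one might work with a projection instead.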
Genome assembly is a fundamental problem in Bioinformatics, which requires reconstructing a source genome from an assembly graph built from a set of reads (short strings sequenced from the genome). One notion of genome assembly solution is that of an arc-covering walk of the graph. Since assembly graphs admit many solutions, the goal is to find what is definitely present in all solutions, or what is safe. Most practical assemblers are based on heuristics that have at their core unitigs, namely paths whose internal nodes have unit in-degree and out-degree, and which are clearly safe. The long-standing open problem of finding all the safe parts of the solutions was recently solved by a major theoretical result [RECOMB16]. This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results to different notions of assembly solution. But it remained open whether one can also be complete for models of genome assembly of practical applicability. In this paper we present a universal framework for obtaining safe and complete algorithms that unify the previous results, while also allowing for easy generalisations to assembly problems including many practical aspects. This is based on a novel graph structure, called the hydrostructure of a walk, which highlights the reachability properties of the graph from the perspective of the walk. The hydrostructure allows for simple characterisations of the existing safe walks, and of their new practic