We prove essentially tight lower bounds, conditionally on the Exponential Time Hypothesis (ETH), for two fundamental but seemingly very different cutting problems on surface-embedded graphs: the Shortest Cut Graph problem and the Multiway Cut problem. A cut graph of a graph $G$ embedded on a surface $S$ is a subgraph of $G$ whose removal from $S$ leaves a disk. We consider the problem of deciding whether an unweighted graph embedded on a surface of genus $g$ has a cut graph of length at most a given value. We prove a time lower bound of $n^{\Omega(g/\log g)}$ for this problem, conditionally on ETH. In other words, the first $n^{O(g)}$-time algorithm by Erickson and Har-Peled [SoCG 2002, Discr. Comput. Geom. 2004] is essentially optimal. We also prove that the problem is W[1]-hard when parameterized by the genus, answering a 17-year-old question of these authors. A multiway cut of an undirected graph $G$ with $t$ distinguished vertices, called terminals, is a set of edges whose removal disconnects all pairs of terminals. We consider the problem of deciding whether an unweighted graph $G$ has a multiway cut of weight at most a given value. We prove a time lower bound of $n^{\Omega(\sqrt{gt + g^2 + t}/\log(g+t))}$ for this problem, conditionally on ETH, for any choice of the genus $g\ge 0$ of the graph and the number of terminals $t\ge 4$. In other words, the algorithm by the second author [Algorithmica 2017] (for the more general multicut problem) is essentially optimal; this extends the lower bound by the third author [ICALP 2012] (for the planar case). Reductions to planar problems usually involve a grid-like structure. The main novel idea for our results is to understand which structures, instead of grids, are needed if we want to optimally exploit a given value $g$ of the genus.
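For concreteness, here is a minimal sketch (not from the paper; the helper name is_multiway_cut is illustrative) of the multiway cut predicate defined above: it checks whether deleting a given edge set disconnects every pair of terminals. Verifying a candidate cut is straightforward; the hard problem addressed by the lower bound is finding a minimum-weight one.

```python
from collections import defaultdict

def is_multiway_cut(edges, terminals, cut):
    """Check whether removing the edge set `cut` disconnects every pair of terminals."""
    cut = {frozenset(e) for e in cut}
    adj = defaultdict(list)
    for u, v in edges:
        if frozenset((u, v)) not in cut:
            adj[u].append(v)
            adj[v].append(u)
    # Traverse from each terminal; no other terminal may remain reachable.
    for s in terminals:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if any(t != s and t in seen for t in terminals):
            return False
    return True

# 4-cycle a-b-c-d-a with terminals a and c: cutting {a,b} and {c,d} separates them.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_multiway_cut(edges, terminals=["a", "c"], cut=[("a", "b"), ("c", "d")]))  # True
```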
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches either study specific algorithms individually, disallow programmatic motifs such as recomputation, or produce asymptotic bounds that exclude important constants. We propose a novel approach for obtaining precise I/O lower bounds on a general class of programs, which we call Simple Overlap Access Programs (SOAP). SOAP analysis covers a wide variety of algorithms, from ubiquitous computational kernels to full scientific computing applications. Using the red-blue pebble game and combinatorial methods, we are able to bound the I/O of the SOAP-induced Computational Directed Acyclic Graph (CDAG), taking into account multiple statements, input/output reuse, and optimal tiling. To deal with programs that are outside of our representation (e.g., non-injective access functions), we describe methods to approximate them with SOAP. To demonstrate our method, we analyze 38 different applications, including kernels from the Polybench benchmark suite, deep learning operators, and -- for the first time -- applications in unstructured physics simulations, numerical weather prediction stencil compositions, and full deep neural networks. We derive tight I/O bounds for several linear algebra kernels, such as Cholesky decomposition, improving the existing reported bounds by a factor of two. For stencil applications, we improve the existing bounds by a factor of up to 14. We implement our method as an open-source tool, which can derive lower bounds directly from provided C code.
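To make the red-blue pebble game concrete, the following minimal sketch (purely illustrative, not the paper's method or tool; the name min_io is an assumption) computes the exact minimum I/O of a tiny CDAG by exhaustive search over pebbling states. Such brute force only scales to toy graphs, whereas SOAP analysis derives bounds symbolically.

```python
from heapq import heappush, heappop
from itertools import count

def min_io(preds, inputs, outputs, num_red):
    """Exact minimum I/O cost in the red-blue pebble game on a tiny CDAG,
    found by Dijkstra over (red-pebbled, blue-pebbled) states.
    preds maps each vertex to its predecessors; inputs start with blue pebbles,
    and the goal is a blue pebble on every output.  Loads and stores cost 1 each."""
    tie = count()
    start = (frozenset(), frozenset(inputs))
    best, pq = {start: 0}, [(0, next(tie), start)]
    while pq:
        cost, _, (red, blue) = heappop(pq)
        if cost > best[(red, blue)]:
            continue
        if set(outputs) <= blue:
            return cost
        moves = []
        for v in preds:
            if preds[v] and all(p in red for p in preds[v]):
                moves.append((0, red | {v}, blue))    # compute v in fast memory
            if v in blue:
                moves.append((1, red | {v}, blue))    # load (1 I/O)
            if v in red:
                moves.append((1, red, blue | {v}))    # store (1 I/O)
        moves += [(0, red - {v}, blue) for v in red]  # discard a red pebble
        for step, r, b in moves:
            if len(r) > num_red:
                continue
            state = (frozenset(r), frozenset(b))
            if cost + step < best.get(state, float("inf")):
                best[state] = cost + step
                heappush(pq, (cost + step, next(tie), state))
    return None  # infeasible if num_red is too small

# Toy CDAG for c = a + b: two loads plus one store, so the optimum is 3.
preds = {"a": (), "b": (), "c": ("a", "b")}
print(min_io(preds, inputs={"a", "b"}, outputs={"c"}, num_red=3))
```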
The random k-SAT model is the most important and well-studied distribution over k-SAT instances. It is closely connected to statistical physics, it is used as a testbed for satisfiability algorithms, and average-case hardness over this distribution has also been linked to hardness of approximation via Feige's hypothesis. We prove that any Cutting Planes refutation for random k-SAT requires exponential size, for k that is logarithmic in the number of variables, in the (interesting) regime where the number of clauses guarantees that the formula is unsatisfiable with high probability.
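As an illustration, here is a minimal sketch of sampling from the random k-SAT model described above (the helper random_ksat is hypothetical, and conventions differ slightly on whether the k variables in a clause must be distinct; here they are drawn without replacement).

```python
import random

def random_ksat(n, m, k, seed=None):
    """Sample a formula from the random k-SAT model: m clauses over n variables,
    each clause consisting of k literals on distinct, uniformly chosen variables
    with independent random signs."""
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula

# Example: n = 20 variables, k = 3, clause density m/n = 10.
print(random_ksat(n=20, m=200, k=3, seed=0)[:3])
```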
Given a large data matrix $A\in\mathbb{R}^{n\times n}$, we consider the problem of determining whether its entries are i.i.d. with some known marginal distribution $A_{ij}\sim P_0$, or instead $A$ contains a principal submatrix $A_{\mathsf{Q},\mathsf{Q}}$ whose entries have marginal distribution $A_{ij}\sim P_1\neq P_0$. As a special case, the hidden (or planted) clique problem requires finding a planted clique in an otherwise uniformly random graph. Assuming unbounded computational resources, this hypothesis testing problem is statistically solvable provided $|\mathsf{Q}|\ge C\log n$ for a suitable constant $C$. However, despite substantial effort, no polynomial-time algorithm is known that succeeds with high probability when $|\mathsf{Q}| = o(\sqrt{n})$. Recently, Meka and Wigderson \cite{meka2013association} proposed a method to establish lower bounds within the Sum of Squares (SOS) semidefinite hierarchy. Here we consider the degree-$4$ SOS relaxation, and study the construction of \cite{meka2013association} to prove that SOS fails unless $k\ge C\, n^{1/3}/\log n$. An argument presented by Barak implies that this lower bound cannot be substantially improved unless the witness construction is changed in the proof. Our proof uses the moments method to bound the spectrum of a certain random association scheme, i.e., a symmetric random matrix whose rows and columns are indexed by the edges of an Erdős–Rényi random graph.
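For concreteness, a minimal sketch (illustrative names, not from the paper) of the planted clique special case mentioned above: the null model is an Erdős–Rényi graph $G(n,1/2)$, and the alternative plants a clique on a uniformly random vertex set $\mathsf{Q}$ of size $k$, corresponding to $P_0 = \mathrm{Bernoulli}(1/2)$ and $P_1$ concentrated on $1$.

```python
import numpy as np

def planted_clique(n, k, seed=None):
    """Adjacency matrix of an Erdős–Rényi G(n, 1/2) graph with a clique planted
    on k uniformly chosen vertices (the planted clique special case of the
    hidden-submatrix model)."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1)
    A = A + A.T                      # symmetric 0/1 matrix with zero diagonal
    Q = rng.choice(n, size=k, replace=False)
    A[np.ix_(Q, Q)] = 1              # plant the clique on Q
    A[Q, Q] = 0                      # keep the diagonal zero
    return A, Q

A, Q = planted_clique(n=200, k=20, seed=0)
```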
We demonstrate a lower bound technique for linear decision lists, which are decision lists where the queries are arbitrary linear threshold functions. We use this technique to prove an explicit lower bound by showing that any linear decision list computing the function $MAJ \circ XOR$ requires size $2^{0.18 n}$. This completely answers an open question of Turán and Vatan [FoCM'97]. We also show that the spectral classes $PL_1$, $PL_\infty$, and the polynomial threshold function classes $\widehat{PT}_1$, $PT_1$, are incomparable to linear decision lists.
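For reference, a minimal sketch (hypothetical names, and an illustrative tie-breaking convention for the majority) of the two objects involved: the composed function $MAJ \circ XOR$ on $2n$ input bits, and the evaluation of a linear decision list, where the first satisfied linear threshold query determines the output.

```python
def maj_xor(x, y):
    """MAJ o XOR on 2n bits: the majority of the coordinate-wise XORs of x and y
    (ties resolved to 1 here; conventions differ)."""
    n = len(x)
    return int(sum(xi ^ yi for xi, yi in zip(x, y)) * 2 >= n)

def eval_linear_decision_list(ldl, z):
    """Evaluate a linear decision list: a sequence of (weights, threshold, output)
    rules; the first satisfied linear threshold query decides, with a default at the end."""
    for weights, threshold, output in ldl:
        if sum(w * zi for w, zi in zip(weights, z)) >= threshold:
            return output
    return 0  # default output if no query fires

# A one-rule list computing OR on 3 bits: output 1 iff z1 + z2 + z3 >= 1.
print(eval_linear_decision_list([((1, 1, 1), 1, 1)], (0, 1, 0)))  # 1
print(maj_xor((1, 0, 1), (0, 0, 1)))  # XORs are 1,0,0 -> majority is 0
```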
We consider the file maintenance problem (also called the online labeling problem) in which n integer items from the set {1,...,r} are to be stored in an array of size m >= n. The items are presented sequentially in an arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive locations in the array). Each new item must be stored in the array before the next item is received. If r <= m, then we can simply store item j in location j; but if r > m, then we may have to shift the locations of stored items to make space for a newly arrived item. The algorithm is charged each time an item is stored in the array or moved to a new location. The goal is to minimize the total number of such moves made by the algorithm. This problem is non-trivial when n <= m < r. In the case that m = Cn for some C > 1, algorithms for this problem with cost O(log(n)^2) per item have been given [IKR81, Wil92, BCD+02]. When m = n, algorithms with cost O(log(n)^3) per item were given [Zha93, BS07]. In this paper we prove lower bounds showing that these algorithms are optimal, up to constant factors. Previously, the only lower bound known for this range of parameters was a lower bound of Omega(log(n)^2) for the restricted class of smooth algorithms [DSZ05a, Zha93]. We also provide an algorithm for the sparse case: if the number of items is polylogarithmic in the array size, then the problem can be solved in amortized constant time per item.
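To illustrate the cost model, here is a minimal sketch (hypothetical helper insert_all, not one of the cited algorithms) of the trivial strategy that keeps the stored items sorted and contiguous at the front of the array; it pays Theta(n^2) moves in the worst case, whereas the algorithms above achieve polylogarithmic amortized cost per item.

```python
def insert_all(items, m):
    """Naive file-maintenance strategy for an array of size m >= len(items):
    keep the stored items sorted and contiguous at the front, shifting elements
    right to open a slot.  Returns the total number of moves charged
    (initial placements plus relocations)."""
    array, moves, size = [None] * m, 0, 0
    for x in items:
        # find the insertion point among the `size` items already stored
        pos = 0
        while pos < size and array[pos] < x:
            pos += 1
        for i in range(size, pos, -1):      # shift right; each shift is one move
            array[i] = array[i - 1]
            moves += 1
        array[pos] = x                      # storing the new item is also one move
        moves += 1
        size += 1
    return moves

# Worst case for this naive strategy: items arriving in decreasing order.
print(insert_all(list(range(100, 0, -1)), m=200))  # 5050, i.e. Theta(n^2) moves
```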