Recently, Ermon et al. (2013) pioneered a practical way to approximate large-scale counting or discrete integration problems using random hashes. The hashes reduce the counting problem to many separate discrete optimization problems, which can then be solved by an NP-oracle such as a commercial SAT solver or an integer linear programming (ILP) solver. In particular, Ermon et al. showed that if the domain of integration is $\{0,1\}^n$, then this technique yields a solution within a factor of $16$ of the optimum (a $16$-approximation). In many crucial counting tasks, such as the computation of the partition function of the ferromagnetic Potts model, the domain of integration is naturally the hypergrid $\{0,1,\dots,q-1\}^n$ with $q>2$. The straightforward extension of Ermon et al.'s method gives only a $q^2$-approximation for this problem, which is undesirable for large values of $q$. In this paper, we present an improved technique that achieves an approximation factor of $4+O(1/q^2)$. We accomplish this by optimizing over multiple bins of the hash functions, which can be implemented easily via inequality constraints, or even in an unconstrained way. Moreover, our method does not increase the burden on the NP-oracle (an ILP solver can still be used). We provide experimental simulation results to support the theoretical guarantees of our algorithms.
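To make the hashing-and-optimization recipe concrete, the following brute-force sketch mimics a WISH-style estimator over $\{0,1\}^n$: random parity constraints play the role of the hash, and exhaustive search stands in for the SAT/ILP oracle. The toy weight function and all helper names are illustrative assumptions, and the sketch does not implement the multi-bin hypergrid technique proposed here.

```python
# Minimal brute-force sketch of hash-based discrete integration over {0,1}^n.
# Exhaustive search stands in for the NP-oracle; `weight` is a toy nonnegative
# weight function chosen purely for illustration.
import itertools
import random
import statistics

def random_parity_hash(n):
    """Sample a random XOR constraint <a, x> = b (mod 2)."""
    return [random.randint(0, 1) for _ in range(n)], random.randint(0, 1)

def satisfies(x, constraints):
    return all(sum(ai * xi for ai, xi in zip(a, x)) % 2 == b for a, b in constraints)

def max_oracle(weight, n, constraints):
    """Brute-force stand-in for the NP-oracle: max weight inside the hashed bucket."""
    return max((weight(x) for x in itertools.product([0, 1], repeat=n)
                if satisfies(x, constraints)), default=0.0)

def wish_estimate(weight, n, repetitions=5):
    """Estimate sum_x weight(x) from medians of hashed maxima."""
    total = 0.0
    for i in range(n + 1):
        maxima = [max_oracle(weight, n, [random_parity_hash(n) for _ in range(i)])
                  for _ in range(repetitions)]
        total += (1 if i == 0 else 2 ** (i - 1)) * statistics.median(maxima)
    return total

# Tiny sanity check against the exact sum (feasible only because n is small).
weight = lambda x: 1.0 + sum(x)
n = 4
print(sum(weight(x) for x in itertools.product([0, 1], repeat=n)), wish_estimate(weight, n))
```

The sketch only shows where the hash constraints and the optimization oracle enter; constant-factor guarantees require a suitable number of repetitions and, for the hypergrid domain, the modified binning analyzed in the paper.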
Cut problems form one of the most fundamental classes of problems in algorithmic graph theory. For instance, the minimum cut, the minimum $s$-$t$ cut, the minimum multiway cut, and the minimum $k$-way cut are some of the commonly encountered cut problems. Many of these problems have been extensively studied over several decades. In this paper, we initiate the algorithmic study of some cut problems in high dimensions. The first problem we study, namely, Topological Hitting Set (THS), is defined as follows: Given a nontrivial $r$-cycle $\zeta$ in a simplicial complex $\mathsf{K}$, find a set $\mathcal{S}$ of $r$-dimensional simplices of minimum cardinality so that $\mathcal{S}$ meets every cycle homologous to $\zeta$. Our main result is that this problem admits a polynomial-time solution on triangulations of closed surfaces. Interestingly, the optimal solution is given in terms of the cocycles of the surface. For general complexes, we show that THS is W[1]-hard with respect to the solution size $k$. On the positive side, we show that THS admits an FPT algorithm with respect to $k+d$, where $d$ is the maximum degree of the Hasse graph of the complex $\mathsf{K}$. We also define a problem called Boundary Nontrivialization (BNT): Given a bounding $r$-cycle $\zeta$ in a simplicial complex $\mathsf{K}$, find a set $\mathcal{S}$ of $(r+1)$-dimensional simplices of minimum cardinality so that the removal of $\mathcal{S}$ from $\mathsf{K}$ makes $\zeta$ non-bounding. We show that BNT is W[1]-hard with respect to the solution size as the parameter, and has an $O(\log n)$-approximation FPT algorithm for $(r+1)$-dimensional complexes with the $(r+1)$-th Betti number $\beta_{r+1}$ as the parameter. Finally, we provide randomized (approximation) FPT algorithms for the global variants of THS and BNT.
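To make the THS definition concrete, the following brute-force sketch over $\mathbb{Z}/2$ coefficients enumerates the homology class of $\zeta$ and then solves the hitting-set instance exactly. It is feasible only for tiny complexes, takes a hypothetical boundary matrix and indicator vector as input, and is unrelated to the polynomial-time surface algorithm or the FPT algorithms described above.

```python
# Brute-force illustration of Topological Hitting Set over Z/2 for tiny complexes.
# `boundary` is the (r+1)-boundary matrix (rows: r-simplices, columns: (r+1)-simplices)
# and `zeta` is the 0/1 indicator vector of the input r-cycle; both are hypothetical inputs.
import itertools
import numpy as np

def homologous_cycles(zeta, boundary):
    """All cycles z with z ~ zeta over Z/2, i.e. z = zeta + boundary @ c (mod 2)."""
    m = boundary.shape[1]  # number of (r+1)-simplices
    cycles = set()
    for bits in itertools.product([0, 1], repeat=m):
        z = (zeta + boundary @ np.array(bits)) % 2
        cycles.add(tuple(int(v) for v in z))
    return cycles

def minimum_topological_hitting_set(zeta, boundary):
    """Smallest set of r-simplices meeting every cycle homologous to zeta."""
    cycles = homologous_cycles(zeta, boundary)
    n = len(zeta)  # number of r-simplices
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            if all(any(z[i] for i in subset) for z in cycles):
                return set(subset)
    return None  # zeta bounds, so no set can meet the zero cycle in its class
```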
Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.
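As a concrete taste of the algorithmic ideas surveyed, the sketch below implements a simplified spectral-filtering heuristic for robust mean estimation: while the empirical covariance has a direction of suspiciously large variance, discard the points that project farthest along it. The stopping threshold and removal rule are placeholder choices, not the precise procedures analyzed in the literature.

```python
# Simplified spectral-filtering heuristic for robust mean estimation.
import numpy as np

def filtered_mean(points, eps, spectral_threshold=1.5, max_rounds=50):
    """points: (N, d) array; eps: assumed fraction of adversarial outliers."""
    pts = np.array(points, dtype=float)
    for _ in range(max_rounds):
        mu = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= spectral_threshold:    # covariance looks inlier-like: stop
            return mu
        top_dir = eigvecs[:, -1]
        scores = np.abs((pts - mu) @ top_dir)    # projection onto the suspicious direction
        keep = scores <= np.quantile(scores, 1 - eps)
        if keep.all():
            return mu
        pts = pts[keep]
    return pts.mean(axis=0)

# Example: 10% of the points are planted far from the true mean (zero).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(900, 20)), np.full((100, 20), 8.0)])
print(np.linalg.norm(data.mean(axis=0)),             # naive mean is badly shifted
      np.linalg.norm(filtered_mean(data, eps=0.1)))  # filtered mean stays near zero
```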
We consider the problem of approximately solving constraint satisfaction problems with arity $k > 2$ ($k$-CSPs) on instances satisfying certain expansion properties, when viewed as hypergraphs. Random instances of $k$-CSPs, which are also highly expanding, are well-known to be hard to approximate using known algorithmic techniques (and are widely believed to be hard to approximate in polynomial time). However, we show that this is not necessarily the case for instances where the hypergraph is a high-dimensional expander. We consider the spectral definition of high-dimensional expansion used by Dinur and Kaufman [FOCS 2017] to construct certain primitives related to PCPs. They measure the expansion in terms of a parameter $\gamma$ which is the analogue of the second singular value for expanding graphs. Extending the results by Barak, Raghavendra and Steurer [FOCS 2011] for 2-CSPs, we show that if an instance of MAX $k$-CSP over alphabet $[q]$ is a high-dimensional expander with parameter $\gamma$, then it is possible to approximate the maximum fraction of satisfiable constraints up to an additive error $\epsilon$ using $q^{O(k)} \cdot (k/\epsilon)^{O(1)}$ levels of the sum-of-squares SDP hierarchy, provided $\gamma \leq \epsilon^{O(1)} \cdot (1/(kq))^{O(k)}$. Based on our analysis, we also suggest a notion of threshold-rank for hypergraphs, which can be used to extend the results for approximating 2-CSPs on low threshold-rank graphs. We show that if an instance of MAX $k$-CSP has threshold rank $r$ for a threshold $\tau = (\epsilon/k)^{O(1)} \cdot (1/q)^{O(k)}$, then it is possible to approximately solve the instance up to additive error $\epsilon$, using $r \cdot q^{O(k)} \cdot (k/\epsilon)^{O(1)}$ levels of the sum-of-squares hierarchy. As in the case of graphs, high-dimensional expanders (with sufficiently small $\gamma$) have threshold rank 1 according to our definition.
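For readers unfamiliar with threshold rank, the snippet below computes the standard graph-level quantity that the paper's hypergraph notion extends: the number of eigenvalues of the normalized adjacency matrix whose magnitude exceeds a threshold $\tau$. The hypergraph definition itself is not reproduced; the function name and the complete-graph example are chosen for illustration only.

```python
# Threshold rank of a graph: count eigenvalues of D^{-1/2} A D^{-1/2} with |lambda| > tau.
import numpy as np

def threshold_rank(adjacency, tau):
    A = np.array(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
    normalized = d_inv_sqrt @ A @ d_inv_sqrt
    eigvals = np.linalg.eigvalsh(normalized)
    return int(np.sum(np.abs(eigvals) > tau))

# A good expander has threshold rank 1: the complete graph K6 has normalized
# eigenvalues {1, -1/5, ..., -1/5}, so only one eigenvalue exceeds tau = 0.5.
K6 = np.ones((6, 6)) - np.eye(6)
print(threshold_rank(K6, tau=0.5))  # -> 1
```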
Estimation is the computational task of recovering a hidden parameter $x$ associated with a distribution $D_x$, given a measurement $y$ sampled from the distribution. High dimensional estimation problems arise naturally in statistics, machine learning, and complexity theory. Many high dimensional estimation problems can be formulated as systems of polynomial equations and inequalities, and thus give rise to natural probability distributions over polynomial systems. Sum-of-squares proofs provide a powerful framework to reason about polynomial systems, and further there exist efficient algorithms to search for low-degree sum-of-squares proofs. Understanding and characterizing the power of sum-of-squares proofs for estimation problems has been a subject of intense study in recent years. On one hand, there is a growing body of work utilizing sum-of-squares proofs for recovering solutions to polynomial systems when the system is feasible. On the other hand, a general technique referred to as pseudocalibration has been developed towards showing lower bounds on the degree of sum-of-squares proofs. Finally, the existence of sum-of-squares refutations of a polynomial system has been shown to be intimately connected to the existence of spectral algorithms. In this article we survey these developments.
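As a toy illustration of what a sum-of-squares proof is, the snippet below certifies nonnegativity of a hand-picked polynomial via a positive semidefinite Gram matrix; searching for such a matrix in general is a semidefinite program. The polynomial and the matrix are illustrative choices, not taken from any of the works surveyed.

```python
# p(x) = x^4 + 2x^2 + 1 is certified nonnegative by a PSD Gram matrix Q with
# p(x) = z(x)^T Q z(x), where z(x) = (1, x, x^2). Here Q encodes (x^2 + 1)^2.
import numpy as np

Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# Q is positive semidefinite, so z^T Q z is a sum of squares.
assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)

# Check p(x) = z(x)^T Q z(x) at a few sample points.
for x in np.linspace(-3, 3, 7):
    z = np.array([1.0, x, x * x])
    assert abs(z @ Q @ z - (x**4 + 2 * x**2 + 1)) < 1e-9
```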
There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well-known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long-term goal is to replace these conditional bounds with unconditional ones. In this paper we make progress in this direction by studying the cell probe complexity of two problems of particular importance that are conjectured to be hard: matrix-vector multiplication and a version of dynamic set disjointness known as Pătraşcu's Multiphase Problem. We give improved unconditional lower bounds for these problems and introduce new proof techniques of independent interest. These include a technique capable of proving strong threshold lower bounds of the following form: if we insist on having a very fast query time, then the update time has to be slow enough to compute a lookup table with the answer to every possible query. This is the first time a lower bound of this type has been proven.
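To illustrate the trade-off that the threshold lower bound speaks to, the toy classes below realize the two extremes for dynamic matrix-vector multiplication over GF(2): either store the matrix and pay for a full multiplication per query, or spend update time tabulating the answer to every possible query vector. Both structures are hypothetical strawmen, not the data structures analyzed in the paper.

```python
# Two extremes of the update/query trade-off for matrix-vector multiplication over GF(2).
import itertools
import numpy as np

class FastUpdateSlowQuery:
    def update(self, matrix):            # O(n^2) update: just store the matrix
        self.M = np.array(matrix) % 2

    def query(self, v):                  # O(n^2) query: multiply on demand
        return tuple(self.M @ np.array(v) % 2)

class SlowUpdateFastQuery:
    def update(self, matrix):            # O(2^n * n^2) update: tabulate every answer
        M = np.array(matrix) % 2
        n = M.shape[1]
        self.table = {v: tuple(M @ np.array(v) % 2)
                      for v in itertools.product([0, 1], repeat=n)}

    def query(self, v):                  # O(1) query: table lookup
        return self.table[tuple(v)]
```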