No Arabic abstract
Say that we are given samples from a distribution $psi$ over an $n$-dimensional space. We expect or desire $psi$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $psi$ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in $psi$. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to $psi$ where finding skewed subcubes might be easier.
Understanding spatial correlation is vital in many fields including epidemiology and social science. Lee, Meeks and Pettersson (Stat. Comput. 2021) recently demonstrated that improved inference for areal unit count data can be achieved by carrying out modifications to a graph representing spatial correlations; specifically, they delete edges of the planar graph derived from border-sharing between geographic regions in order to maximise a specific objective function. In this paper we address the computational complexity of the associated graph optimisation problem. We demonstrate that this problem cannot be solved in polynomial time unless P = NP; we further show intractability for two simpler variants of the problem. We follow these results with two parameterised algorithms that exactly solve the problem in polynomial time in restricted settings. The first of these utilises dynamic programming on a tree decomposition, and runs in polynomial time if both the treewidth and maximum degree are bounded. The second algorithm is restricted to problem instances with maximum degree three, as may arise from triangulations of planar surfaces, but is an FPT algorithm when the maximum number of edges that can be removed is taken as the parameter.
For two positive integers $k$ and $ell$, a $(k times ell)$-spindle is the union of $k$ pairwise internally vertex-disjoint directed paths with $ell$ arcs between two vertices $u$ and $v$. We are interested in the (parameterized) complexity of several problems consisting in deciding whether a given digraph contains a subdivision of a spindle, which generalize both the Maximum Flow and Longest Path problems. We obtain the following complexity dichotomy: for a fixed $ell geq 1$, finding the largest $k$ such that an input digraph $G$ contains a subdivision of a $(k times ell)$-spindle is polynomial-time solvable if $ell leq 3$, and NP-hard otherwise. We place special emphasis on finding spindles with exactly two paths and present FPT algorithms that are asymptotically optimal under the ETH. These algorithms are based on the technique of representative families in matroids, and use also color-coding as a subroutine. Finally, we study the case where the input graph is acyclic, and present several algorithmic and hardness results.
We investigate the parameterized complexity of finding subgraphs with hereditary properties on graphs belonging to a hereditary graph class. Given a graph $G$, a non-trivial hereditary property $Pi$ and an integer parameter $k$, the general problem $P(G,Pi,k)$ asks whether there exists $k$ vertices of $G$ that induce a subgraph satisfying property $Pi$. This problem, $P(G,Pi,k)$ has been proved to be NP-complete by Lewis and Yannakakis. The parameterized complexity of this problem is shown to be W[1]-complete by Khot and Raman, if $Pi$ includes all trivial graphs but not all complete graphs and vice versa; and is fixed-parameter tractable (FPT), otherwise. As the problem is W[1]-complete on general graphs when $Pi$ includes all trivial graphs but not all complete graphs and vice versa, it is natural to further investigate the problem on restricted graph classes. Motivated by this line of research, we study the problem on graphs which also belong to a hereditary graph class and establish a framework which settles the parameterized complexity of the problem for various hereditary graph classes. In particular, we show that: $P(G,Pi,k)$ is solvable in polynomial time when the graph $G$ is co-bipartite and $Pi$ is the property of being planar, bipartite or triangle-free (or vice-versa). $P(G,Pi,k)$ is FPT when the graph $G$ is planar, bipartite or triangle-free and $Pi$ is the property of being planar, bipartite or triangle-free, or graph $G$ is co-bipartite and $Pi$ is the property of being co-bipartite. $P(G,Pi,k)$ is W[1]-complete when the graph $G$ is $C_4$-free, $K_{1,4}$-free or a unit disk graph and $Pi$ is the property of being either planar or bipartite.
We prove new complexity results for Feedback Vertex Set and Even Cycle Transversal on $H$-free graphs, that is, graphs that do not contain some fixed graph $H$ as an induced subgraph. In particular, we prove that both problems are polynomial-time solvable for $sP_3$-free graphs for every integer $sgeq 1$. Our results show that both problems exhibit the same behaviour on $H$-free graphs (subject to some open cases). This is in part explained by a new general algorithm we design for finding in a graph $G$ a largest induced subgraph whose blocks belong to some finite class ${cal C}$ of graphs. We also compare our results with the state-of-the-art results for the Odd Cycle Transversal problem, which is known to behave differently on $H$-free graphs.
begin{abstract} The frequencies of the elements in a data stream are an important statistical measure and the task of estimating them arises in many applications within data analysis and machine learning. Two of the most popular algorithms for this problem, Count-Min and Count-Sketch, are widely used in practice. In a recent work [Hsu et al., ICLR19], it was shown empirically that augmenting Count-Min and Count-Sketch with a machine learning algorithm leads to a significant reduction of the estimation error. The experiments were complemented with an analysis of the expected error incurred by Count-Min (both the standard and the augmented version) when the input frequencies follow a Zipfian distribution. Although the authors established that the learned version of Count-Min has lower estimation error than its standard counterpart, their analysis of the standard Count-Min algorithm was not tight. Moreover, they provided no similar analysis for Count-Sketch. In this paper we resolve these problems. First, we provide a simple tight analysis of the expected error incurred by Count-Min. Second, we provide the first error bounds for both the standard and the augmented version of Count-Sketch. These bounds are nearly tight and again demonstrate an improved performance of the learned version of Count-Sketch. In addition to demonstrating tight gaps between the aforementioned algorithms, we believe that our bounds for the standa