We study the problem of finding a spanning forest in an undirected, $n$-vertex multi-graph under two basic query models. One is the Linear query model, in which queries are linear measurements on the incidence vector induced by the edges; the other is the weaker OR query model, which only reveals whether a given subset of plausible edges is empty or not. At the heart of our study lies a fundamental problem which we call the \emph{single element recovery} problem: given a non-negative real vector $x$ in $N$ dimensions, return a single element $x_j > 0$ from the support. Queries can be made in rounds, and our goal is to understand the trade-offs between the query complexity and the rounds of adaptivity needed to solve these problems, for both deterministic and randomized algorithms. These questions have connections and ramifications to multiple areas such as sketching, streaming, graph reconstruction, and compressed sensing. Our main results are: * For the single element recovery problem, it is easy to obtain a deterministic, $r$-round algorithm which makes $(N^{1/r}-1)$ queries per round. We prove that this is tight: any $r$-round deterministic algorithm must make $\geq (N^{1/r} - 1)$ linear queries in some round. In contrast, a $1$-round $O(\log^2 N)$-query randomized algorithm which succeeds 99% of the time is known to exist. * We design a deterministic $O(r)$-round, $\tilde{O}(n^{1+1/r})$-OR query algorithm for graph connectivity. We complement this with an $\tilde{\Omega}(n^{1 + 1/r})$ lower bound for any $r$-round deterministic algorithm in the OR-model. * We design a randomized, $2$-round algorithm for the graph connectivity problem which makes $\tilde{O}(n)$ OR queries. In contrast, we prove that any $1$-round algorithm (possibly randomized) requires $\tilde{\Omega}(n^2)$ OR queries.
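The "easy" deterministic $r$-round upper bound for single element recovery can be sketched as a block search. This is only one plausible reading of that remark, not the paper's stated construction. It assumes $x$ is non-negative and not identically zero, and rounds $N^{1/r}$ up to an integer `k`. The name `recover_element` and the block-sum form of the linear queries are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def recover_element(x: Sequence[float], r: int) -> Tuple[int, List[int]]:
    """Return (index j with x[j] > 0, number of queries made in each round).

    Assumption: x is non-negative and has at least one positive entry.
    The vector is only accessed through linear queries (block sums).
    """
    n = len(x)
    # Smallest integer k with k**r >= n; integer arithmetic avoids float error.
    k = 1
    while k ** r < n:
        k += 1

    def query(lo: int, hi: int) -> float:
        # Linear query against the 0/1 indicator of coordinates lo..hi-1.
        return sum(x[lo:hi])

    lo, hi = 0, n
    per_round: List[int] = []
    for _ in range(r):
        size = hi - lo
        if size == 1:
            break
        block = math.ceil(size / k)
        starts = list(range(lo, hi, block))
        # Query every block except the last; all queries of a round are
        # fixed before any answer of that round is seen.
        answers = [query(s, min(s + block, hi)) for s in starts[:-1]]
        per_round.append(len(answers))
        # If no queried block is positive, the support must meet the last block.
        chosen = starts[-1]
        for s, a in zip(starts, answers):
            if a > 0:
                chosen = s
                break
        lo, hi = chosen, min(chosen + block, hi)
    return lo, per_round
```

Non-negativity is what makes a block sum informative: a sum is positive exactly when the block meets the support, so skipping the last block saves one query per round.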
We study the design of local algorithms for massive graphs. A local algorithm is one that finds a solution containing or near a given vertex without looking at the whole graph. We present a local clustering algorithm. Our algorithm finds a good cluster--a subset of vertices whose internal connections are significantly richer than its external connections--near a given vertex. The running time of our algorithm, when it finds a non-empty local cluster, is nearly linear in the size of the cluster it outputs. Our clustering algorithm could be a useful primitive for handling massive graphs, such as social networks and web-graphs. As an application of this clustering algorithm, we present a partitioning algorithm that finds an approximate sparsest cut with nearly optimal balance. Our algorithm takes time nearly linear in the number of edges of the graph. Using the partitioning algorithm of this paper, we have designed a nearly-linear time algorithm for constructing spectral sparsifiers of graphs, which we in turn use in a nearly-linear time algorithm for solving linear systems in symmetric, diagonally-dominant matrices. The linear system solver also leads to a nearly linear-time algorithm for approximating the second-smallest eigenvalue and corresponding eigenvector of the Laplacian matrix of a graph. These other results are presented in two companion papers.
A skew-symmetric graph $(D=(V,A),\sigma)$ is a directed graph $D$ with an involution $\sigma$ on the set of vertices and arcs. In this paper, we introduce a separation problem, $d$-Skew-Symmetric Multicut, where we are given a skew-symmetric graph $D$, a family $\mathcal{T}$ of $d$-sized subsets of vertices and an integer $k$. The objective is to decide if there is a set $X\subseteq A$ of $k$ arcs such that every set $J$ in the family has a vertex $v$ such that $v$ and $\sigma(v)$ are in different connected components of $D'=(V,A\setminus (X\cup \sigma(X)))$. In this paper, we give an algorithm for this problem which runs in time $O((4d)^{k}(m+n+\ell))$, where $m$ is the number of arcs in the graph, $n$ the number of vertices and $\ell$ the length of the family given in the input. Using our algorithm, we show that Almost 2-SAT has an algorithm with running time $O(4^k k^4 \ell)$ and we obtain algorithms for Odd Cycle Transversal and Edge Bipartization which run in time $O(4^k k^4(m+n))$ and $O(4^k k^5(m+n))$ respectively. This resolves an open problem posed by Reed, Smith and Vetta [Operations Research Letters, 2003] and improves upon the earlier almost linear time algorithm of Kawarabayashi and Reed [SODA, 2010]. We also show that Deletion q-Horn Backdoor Set Detection is a special case of 3-Skew-Symmetric Multicut, giving us an algorithm for Deletion q-Horn Backdoor Set Detection which runs in time $O(12^k k^5 \ell)$. This gives the first fixed-parameter tractable algorithm for this problem, answering a question posed in a paper by a superset of the authors [STACS, 2013]. Using this result, we get an algorithm for Satisfiability which runs in time $O(12^k k^5 \ell)$, where $k$ is the size of the smallest q-Horn deletion backdoor set and $\ell$ is the length of the input formula.
In a (parameterized) graph edge modification problem, we are given a graph $G$, an integer $k$ and a (usually well-structured) class of graphs $\mathcal{G}$, and ask whether it is possible to transform $G$ into a graph $G' \in \mathcal{G}$ by adding and/or removing at most $k$ edges. Parameterized graph edge modification problems have received considerable attention in recent decades. In this paper, we focus on finding small kernels for edge modification problems. One of the most studied problems is the Cluster Editing problem, in which the goal is to partition the vertex set into a disjoint union of cliques. Even if this problem admits a $2k$ kernel [Cao, 2012], this kernel does not reduce the size of most instances. Therefore, we explore the question of whether linear kernels are a theoretical limit in edge modification problems, in particular when the target graphs are very structured (such as a partition into cliques for instance). We prove, as far as we know, the first sublinear kernel for an edge modification problem. Namely, we show that Clique + Independent Set Deletion, which is a restriction of Cluster Deletion, admits a kernel of size $O(k/\log k)$. We also obtain small kernels for several other edge modification problems. We prove that Split Addition (and the equivalent Split Deletion) admits a linear kernel, improving the existing quadratic kernel of Ghosh et al. [Ghosh et al., 2015]. We complement this result by proving that Trivially Perfect Addition admits a quadratic kernel (improving the cubic kernel of Guo [Guo, 2007]), and finally prove that its triangle-free version (Starforest Deletion) admits a linear kernel, which is optimal under ETH.
Spectral algorithms, such as principal component analysis and spectral clustering, typically require careful data transformations to be effective: upon observing a matrix $A$, one may look at the spectrum of $\psi(A)$ for a properly chosen $\psi$. The issue is that the spectrum of $A$ might be contaminated by non-informational top eigenvalues, e.g., due to scale variations in the data, and the application of $\psi$ aims to remove these. Designing a good functional $\psi$ (and establishing what good means) is often challenging and model dependent. This paper proposes a simple and generic construction for sparse graphs, $$\psi(A) = \mathbb{1}((I+A)^r \ge 1),$$ where $A$ denotes the adjacency matrix and $r$ is an integer (less than the graph diameter). This produces a graph connecting vertices from the original graph that are within distance $r$, and is referred to as graph powering. It is shown that graph powering regularizes the graph and decontaminates its spectrum in the following sense: (i) If the graph is drawn from the sparse Erd\H{o}s-R\'enyi ensemble, which has no spectral gap, it is shown that graph powering produces a `maximal' spectral gap, with the latter justified by establishing an Alon-Boppana result for powered graphs; (ii) If the graph is drawn from the sparse SBM, graph powering is shown to achieve the fundamental limit for weak recovery (the KS threshold) similarly to \cite{massoulie-STOC}, settling an open problem therein. Further, graph powering is shown to be significantly more robust to tangles and cliques than previous spectral algorithms based on self-avoiding or nonbacktracking walk counts \cite{massoulie-STOC,Mossel_SBM2,bordenave,colin3}. This is illustrated on a geometric block model that is dense in cliques.
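The graph powering map $\psi(A) = \mathbb{1}((I+A)^r \ge 1)$ can be illustrated with a small dense sketch. It assumes a symmetric 0/1 adjacency matrix given as nested lists, and the name `graph_power` is illustrative. It replaces integer powers with Boolean matrix products, which gives the same indicator because all entries are non-negative. A sparse graph at scale would need a sparse representation instead.

```python
from typing import List

Matrix = List[List[int]]


def graph_power(adj: Matrix, r: int) -> Matrix:
    """Return psi(A) = 1((I + A)^r >= 1) as a 0/1 matrix."""
    n = len(adj)
    # base = I + A, reduced to 0/1 entries.
    base = [[1 if (i == j or adj[i][j]) else 0 for j in range(n)] for i in range(n)]
    # Start from the identity, i.e. (I + A)^0.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(r):
        # Boolean product: entry (i, j) is 1 iff some k has power[i][k] and base[k][j].
        power = [
            [1 if any(power[i][k] and base[k][j] for k in range(n)) else 0
             for j in range(n)]
            for i in range(n)
        ]
    return power


# Path 0-1-2-3: with r = 2, vertex 0 reaches vertex 2 but not vertex 3.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
powered = graph_power(path, 2)
```

Entry $(i,j)$ of the result is 1 exactly when $i$ and $j$ are within distance $r$. The diagonal is therefore always 1, as the formula itself implies.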
We study the query complexity of determining if a graph is connected with global queries. The first model we look at is matrix-vector multiplication queries to the adjacency matrix. Here, for an $n$-vertex graph with adjacency matrix $A$, one can query a vector $x \in \{0,1\}^n$ and receive the answer $Ax$. We give a randomized algorithm that can output a spanning forest of a weighted graph with constant probability after $O(\log^4(n))$ matrix-vector multiplication queries to the adjacency matrix. This complements a result of Sun et al. (ICALP 2019) that gives a randomized algorithm that can output a spanning forest of a graph after $O(\log^4(n))$ matrix-vector multiplication queries to the signed vertex-edge incidence matrix of the graph. As an application, we show that a quantum algorithm can output a spanning forest of an unweighted graph after $O(\log^5(n))$ cut queries, improving and simplifying a result of Lee, Santha, and Zhang (SODA 2021), which gave the bound $O(\log^8(n))$. In the second part of the paper, we turn to showing lower bounds on the linear query complexity of determining if a graph is connected. If $w$ is the weight vector of a graph (viewed as a $\binom{n}{2}$-dimensional vector), in a linear query one can query any vector $z \in \mathbb{R}^{\binom{n}{2}}$ and receive the answer $\langle z, w\rangle$. We show that a zero-error randomized algorithm must make $\Omega(n)$ linear queries in expectation to solve connectivity. As far as we are aware, this is the first lower bound of any kind on the unrestricted linear query complexity of connectivity. We show this lower bound by looking at the linear query \emph{certificate complexity} of connectivity, and characterize this certificate complexity in a linear algebraic fashion.
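The two global query types named above are related in a simple, standard way: one matrix-vector query with a 0/1 vector already determines a cut value. The sketch below shows this as an illustration only. It is not the paper's algorithm and does not reflect the quantum setting. It assumes a symmetric adjacency matrix as nested lists, and `matvec_query` and `cut_value` are hypothetical helper names.

```python
from typing import List, Set

Matrix = List[List[float]]


def matvec_query(adj: Matrix, x: List[int]) -> List[float]:
    # One matrix-vector multiplication query: returns A x for x in {0,1}^n.
    return [sum(a * xi for a, xi in zip(row, x)) for row in adj]


def cut_value(adj: Matrix, side: Set[int]) -> float:
    """Total weight of edges between `side` and its complement."""
    n = len(adj)
    # Query the indicator vector of the complement of `side`.
    x = [0 if v in side else 1 for v in range(n)]
    y = matvec_query(adj, x)
    # y[v] is the weight from v to the complement; sum it over v in `side`.
    return sum(y[v] for v in side)


# 4-cycle 0-1-2-3-0: the cut ({0, 1}, {2, 3}) crosses edges 1-2 and 3-0.
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
value = cut_value(cycle, {0, 1})
```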