No Arabic abstract
In the subgraph counting problem, we are given a input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs $H$. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph $H$ in a graph $G$ with $m$ edges is $O(m^{rho(H)})$, where $rho(H)$ is the fractional edge-cover of $H$, and enumeration algorithms with matching runtime are known for any $H$. We bridge this gap between subgraph counting and subgraph enumeration by designing a sublinear-time algorithm that can estimate the number of any arbitrary subgraph $H$ in $G$, denoted by $#H$, to within a $(1pm epsilon)$-approximation w.h.p. in $O(frac{m^{rho(H)}}{#H}) cdot poly(log{n},1/epsilon)$ time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches those of Eden et.al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extend them to all choices of subgraph $H$ under the additional assumption of edge-sample queries. We further show that our algorithm works for the more general database join size estimation problem and prove a matching lower bound for this problem.
We consider the problem of sampling and approximately counting an arbitrary given motif $H$ in a graph $G$, where access to $G$ is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of $H$ into a collection of odd cycles and stars, denoted $mathcal{D}^*(H)={O_{k_1}, ldots, O_{k_q}, S_{p_1}, ldots, S_{p_ell}}$. These algorithms were shown to be optimal for the case where $H$ is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling and approximately counting arbitrary motifs which, up to $textrm{poly}(log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the $p$-th moment of the degree distribution. Finally, we prove that this algorithm is emph{decomposition-optimal} for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs $H$ with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczur and Karger (1996) showed that given any $n$-vertex undirected weighted graph $G$ and a parameter $varepsilon in (0,1)$, there is a near-linear time algorithm that outputs a weighted subgraph $G$ of $G$ of size $tilde{O}(n/varepsilon^2)$ such that the weight of every cut in $G$ is preserved to within a $(1 pm varepsilon)$-factor in $G$. The graph $G$ is referred to as a {em $(1 pm varepsilon)$-approximate cut sparsifier} of $G$. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require $Omega(n + m)$ time where $n$ denotes the number of vertices and $m$ denotes the number of hyperedges in the hypergraph. Since $m$ can be exponentially large in $n$, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in $n$, {em independent of the number of edges}. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph.
We study the problem of approximating the eigenspectrum of a symmetric matrix $A in mathbb{R}^{n times n}$ with bounded entries (i.e., $|A|_{infty} leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $A$ up to additive error $pm epsilon n$ using those of a randomly sampled $tilde{O}(frac{1}{epsilon^4}) times tilde O(frac{1}{epsilon^4})$ principal submatrix. Our result can be viewed as a concentration bound on the full eigenspectrum of a random principal submatrix. It significantly extends existing work which shows concentration of just the spectral norm [Tro08]. It also extends work on sublinear time algorithms for testing the presence of large negative eigenvalues in the spectrum [BCJ20]. To complement our theoretical results, we provide numerical simulations, which demonstrate the effectiveness of our algorithm in approximating the eigenvalues of a wide range of matrices.
We study the problem of estimating the edit distance between two $n$-character strings. While exact computation in the worst case is believed to require near-quadratic time, previous work showed that in certain regimes it is possible to solve the following {em gap edit distance} problem in sub-linear time: distinguish between inputs of distance $le k$ and $>k^2$. Our main result is a very simple algorithm for this benchmark that runs in time $tilde O(n/sqrt{k})$, and in particular settles the open problem of obtaining a truly sublinear time for the entire range of relevant $k$. Building on the same framework, we also obtain a $k$-vs-$k^2$ algorithm for the one-sided preprocessing model with $tilde O(n)$ preprocessing time and $tilde O(n/k)$ query time (improving over a recent $tilde O(n/k+k^2)$-query time algorithm for the same problem [GRS20].
In this work, we consider the problem of sampling a $k$-clique in a graph from an almost uniform distribution in sublinear time in the general graph query model. Specifically the algorithm should output each $k$-clique with probability $(1pm epsilon)/n_k$, where $n_k$ denotes the number of $k$-cliques in the graph and $epsilon$ is a given approximation parameter. We prove that the query complexity of this problem is [ Theta^*left(maxleft{ left(frac{(nalpha)^{k/2}}{ n_k}right)^{frac{1}{k-1}} ,; minleft{nalpha,frac{nalpha^{k-1}}{n_k} right}right}right). ] where $n$ is the number of vertices in the graph, $alpha$ is its arboricity, and $Theta^*$ suppresses the dependence on $(log n/epsilon)^{O(k)}$. Interestingly, this establishes a separation between approximate counting and approximate uniform sampling in the sublinear regime. For example, if $k=3$, $alpha = O(1)$, and $n_3$ (the number of triangles) is $Theta(n)$, then we get a lower bound of $Omega(n^{1/4})$ (for constant $epsilon$), while under these conditions, a $(1pm epsilon)$-approximation of $n_3$ can be obtained by performing $textrm{poly}(log(n/epsilon))$ queries (Eden, Ron and Seshadhri, SODA20). Our lower bound follows from a construction of a family of graphs with arboricity $alpha$ such that in each graph there are $n_k$ cliques (of size $k$), where one of these cliques is hidden and hence hard to sample. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling $k$-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in $H_k$, where the challenge is simulate queries to $H_k$ while being given access only to $G$.