
Almost Optimal Bounds for Sublinear-Time Sampling of $k$-Cliques: Sampling Cliques is Harder Than Counting

Added by Talya Eden
Publication date: 2020
Research language: English





In this work, we consider the problem of sampling a $k$-clique in a graph from an almost uniform distribution in sublinear time in the general graph query model. Specifically, the algorithm should output each $k$-clique with probability $(1\pm \epsilon)/n_k$, where $n_k$ denotes the number of $k$-cliques in the graph and $\epsilon$ is a given approximation parameter. We prove that the query complexity of this problem is \[ \Theta^*\left(\max\left\{ \left(\frac{(n\alpha)^{k/2}}{n_k}\right)^{\frac{1}{k-1}}, \; \min\left\{n\alpha, \frac{n\alpha^{k-1}}{n_k} \right\}\right\}\right), \] where $n$ is the number of vertices in the graph, $\alpha$ is its arboricity, and $\Theta^*$ suppresses the dependence on $(\log n/\epsilon)^{O(k)}$. Interestingly, this establishes a separation between approximate counting and approximate uniform sampling in the sublinear regime. For example, if $k=3$, $\alpha = O(1)$, and $n_3$ (the number of triangles) is $\Theta(n)$, then we get a lower bound of $\Omega(n^{1/4})$ (for constant $\epsilon$), while under these conditions, a $(1\pm \epsilon)$-approximation of $n_3$ can be obtained by performing $\textrm{poly}(\log(n/\epsilon))$ queries (Eden, Ron and Seshadhri, SODA20). Our lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that each graph contains $n_k$ cliques (of size $k$), one of which is hidden and hence hard to sample. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling $k$-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in $H_k$, where the challenge is to simulate queries to $H_k$ while being given access only to $G$.
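To make the bound concrete, the following short Python snippet (our own illustration; the function name and the choice of $n$ are assumptions, and the $(\log n/\epsilon)^{O(k)}$ factors hidden by $\Theta^*$ are ignored) evaluates the two terms of the complexity expression and reproduces the $n^{1/4}$ behavior of the triangle example above.

def clique_sampling_complexity(n, alpha, n_k, k):
    # Evaluate the query-complexity expression from the abstract,
    # ignoring the (log n / eps)^{O(k)} factors suppressed by Theta^*.
    term1 = ((n * alpha) ** (k / 2) / n_k) ** (1 / (k - 1))
    term2 = min(n * alpha, n * alpha ** (k - 1) / n_k)
    return max(term1, term2)

# The abstract's example: k = 3, constant arboricity (alpha = 1),
# and n_3 = Theta(n) triangles; the bound evaluates to n^{1/4}.
n = 10 ** 8
print(clique_sampling_complexity(n, alpha=1, n_k=n, k=3))  # 100.0 == n ** 0.25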




Read More

We consider the problem of sampling and approximately counting an arbitrary given motif $H$ in a graph $G$, where access to $G$ is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of $H$ into a collection of odd cycles and stars, denoted $\mathcal{D}^*(H)=\{O_{k_1}, \ldots, O_{k_q}, S_{p_1}, \ldots, S_{p_\ell}\}$. These algorithms were shown to be optimal for the case where $H$ is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling and approximately counting arbitrary motifs which, up to $\textrm{poly}(\log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows sampling vertices according to the $p$-th moment of the degree distribution. Finally, we prove that this algorithm is \emph{decomposition-optimal} for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs $H$ with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs $H$. This is in sharp contrast to the closely related subgraph enumeration problem, which has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph $H$ in a graph $G$ with $m$ edges is $O(m^{\rho(H)})$, where $\rho(H)$ is the fractional edge-cover number of $H$, and enumeration algorithms with matching runtime are known for any $H$. We bridge this gap between subgraph counting and subgraph enumeration by designing a sublinear-time algorithm that can estimate the number of occurrences of any arbitrary subgraph $H$ in $G$, denoted by $\#H$, to within a $(1\pm \epsilon)$-approximation w.h.p. in $O\big(\frac{m^{\rho(H)}}{\#H}\big) \cdot \textrm{poly}(\log n, 1/\epsilon)$ time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph $H$ under the additional assumption of edge-sample queries. We further show that our algorithm works for the more general database join size estimation problem and prove a matching lower bound for this problem.
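As a side note on the AGM bound mentioned above, the exponent $\rho(H)$ is the optimum of a small covering linear program. The sketch below is our own illustration (it assumes NumPy and SciPy are available and is not part of the paper); it computes $\rho(H)$ for a given motif, e.g. $\rho = 3/2$ for the triangle, so a graph with $m$ edges contains $O(m^{3/2})$ triangles.

import numpy as np
from scipy.optimize import linprog

def fractional_edge_cover(num_vertices, edges):
    # Minimize sum_e x_e subject to: each vertex is covered by total weight >= 1,
    # with x_e >= 0. The optimal value is the fractional edge-cover number rho(H).
    A = np.zeros((num_vertices, len(edges)))
    for j, (u, v) in enumerate(edges):
        A[u, j] = 1.0
        A[v, j] = 1.0
    res = linprog(c=np.ones(len(edges)), A_ub=-A, b_ub=-np.ones(num_vertices),
                  bounds=[(0, None)] * len(edges))
    return res.fun

# Triangle: rho = 1.5, hence the AGM bound O(m^{1.5}) on its number of occurrences.
print(fractional_edge_cover(3, [(0, 1), (1, 2), (0, 2)]))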
We study the problem of approximating the eigenspectrum of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|A\|_{\infty} \leq 1$). We present a simple sublinear-time algorithm that approximates all eigenvalues of $A$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde{O}(\frac{1}{\epsilon^4}) \times \tilde{O}(\frac{1}{\epsilon^4})$ principal submatrix. Our result can be viewed as a concentration bound on the full eigenspectrum of a random principal submatrix. It significantly extends existing work which shows concentration of just the spectral norm [Tro08]. It also extends work on sublinear-time algorithms for testing the presence of large negative eigenvalues in the spectrum [BCJ20]. To complement our theoretical results, we provide numerical simulations, which demonstrate the effectiveness of our algorithm in approximating the eigenvalues of a wide range of matrices.
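A minimal sketch of the random-principal-submatrix idea described in the preceding abstract (our own illustration; the function name, the exact submatrix size, and the rescaling step are assumptions, and the paper's estimator includes further refinements and the hidden log factors): sample a uniformly random $s \times s$ principal submatrix with $s \approx 1/\epsilon^4$ and rescale its eigenvalues by $n/s$.

import numpy as np

def approx_eigenvalues(A, eps, rng=None):
    # Sketch: estimate the spectrum of a symmetric matrix A with entries in [-1, 1]
    # from a uniformly random s x s principal submatrix (s ~ 1/eps^4, log factors
    # omitted), rescaling its eigenvalues by n / s. Eigenvalues of A not captured
    # here are implicitly estimated as 0, consistent with +/- eps*n additive error.
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    s = min(n, max(1, int(np.ceil(1.0 / eps ** 4))))
    idx = rng.choice(n, size=s, replace=False)
    sub = A[np.ix_(idx, idx)]
    return (n / s) * np.linalg.eigvalsh(sub)

# Example: a random symmetric +/-1 matrix with n = 2000.
M = np.sign(np.random.default_rng(0).standard_normal((2000, 2000)))
M = np.triu(M) + np.triu(M, 1).T
print(approx_eigenvalues(M, eps=0.3)[:5])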
We present a randomized algorithm that takes as input an undirected $n$-vertex graph $G$ with maximum degree $\Delta$ and an integer $k > 3\Delta$, and returns a random proper $k$-coloring of $G$. The distribution of the coloring is \emph{perfectly} uniform over the set of all proper $k$-colorings; the expected running time of the algorithm is $\mathrm{poly}(k,n)=\widetilde{O}(n\Delta^2\cdot \log(k))$. This improves upon a result of Huber (STOC 1998), who obtained a polynomial-time perfect sampling algorithm for $k>\Delta^2+2\Delta$. Prior to our work, no algorithm with expected running time $\mathrm{poly}(k,n)$ was known to guarantee perfect sampling with a sub-quadratic number of colors in general. Our algorithm (like several other perfect sampling algorithms, including Huber's) is based on the Coupling from the Past method. Inspired by the \emph{bounding chain} approach, pioneered independently by Huber (STOC 1998) and Haggstrom & Nelander (Scand. J. Statist., 1999), we employ a novel bounding chain to derive our result for the graph coloring problem.
The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczur and Karger (1996) showed that given any $n$-vertex undirected weighted graph $G$ and a parameter $\varepsilon \in (0,1)$, there is a near-linear time algorithm that outputs a weighted subgraph $G'$ of $G$ of size $\tilde{O}(n/\varepsilon^2)$ such that the weight of every cut in $G$ is preserved to within a $(1 \pm \varepsilon)$-factor in $G'$. The graph $G'$ is referred to as a {\em $(1 \pm \varepsilon)$-approximate cut sparsifier} of $G$. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require $\Omega(n + m)$ time, where $n$ denotes the number of vertices and $m$ denotes the number of hyperedges in the hypergraph. Since $m$ can be exponentially large in $n$, a natural question is whether it is possible to create a hypergraph cut sparsifier in time polynomial in $n$, {\em independent of the number of edges}. We resolve this question in the affirmative, giving the first sublinear-time algorithm for this problem, given appropriate query access to the hypergraph.
