
Sublinear Time Eigenvalue Approximation via Random Sampling

Published by Archan Ray
Publication date 2021
Research field Informatics Engineering
Research language English





We study the problem of approximating the eigenspectrum of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $A$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde{O}(\frac{1}{\epsilon^4}) \times \tilde{O}(\frac{1}{\epsilon^4})$ principal submatrix. Our result can be viewed as a concentration bound on the full eigenspectrum of a random principal submatrix. It significantly extends existing work which shows concentration of just the spectral norm [Tro08]. It also extends work on sublinear time algorithms for testing the presence of large negative eigenvalues in the spectrum [BCJ20]. To complement our theoretical results, we provide numerical simulations, which demonstrate the effectiveness of our algorithm in approximating the eigenvalues of a wide range of matrices.
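The sampling idea in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation: the abstract does not spell out the sampling scheme or how submatrix eigenvalues are matched to those of $A$, so uniform sampling without replacement, the $n/s$ rescaling, and the function name `sampled_eigenvalue_estimates` are all assumptions made here for illustration.

```python
import numpy as np


def sampled_eigenvalue_estimates(A, s, rng):
    """Estimate eigenvalues of symmetric A from a random s x s principal submatrix.

    Hypothetical helper: the sampling scheme and the n / s rescaling are
    assumptions, not details taken from the abstract.
    """
    n = A.shape[0]
    # Pick s distinct indices uniformly at random.
    idx = rng.choice(n, size=s, replace=False)
    # Principal submatrix: the same index set selects rows and columns.
    A_S = A[np.ix_(idx, idx)]
    # Eigenvalues of the symmetric submatrix, in ascending order.
    sub_eigs = np.linalg.eigvalsh(A_S)
    # Rescale by n / s so the estimates are on the scale of A's spectrum.
    return (n / s) * sub_eigs


rng = np.random.default_rng(0)
A = np.ones((200, 200))  # bounded entries; the only nonzero eigenvalue is 200
estimates = sampled_eigenvalue_estimates(A, s=20, rng=rng)
top_estimate = estimates[-1]
```

The all-ones matrix is only a sanity check: every principal submatrix is again all ones, so the rescaled top eigenvalue recovers $n$ regardless of which indices are drawn. For general matrices the estimates are random and the $\pm \epsilon n$ guarantee depends on the analysis in the paper.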


Read also

We analyze the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of an $n \times n$ Hermitian matrix $A$. We prove that a simple and practical variant of the KPM algorithm can approximate the spectral density to $\epsilon$ accuracy in the Wasserstein-1 distance with roughly $O(1/\epsilon)$ matrix-vector multiplications with $A$. This yields a provable linear time result for the problem with better $\epsilon$ dependence than prior work. The KPM variant we study is based on damped Chebyshev polynomial expansions. We show that it is stable, meaning that it can be combined with any approximate matrix-vector multiplication algorithm for $A$. As an application, we develop an $O(n \cdot \text{poly}(1/\epsilon))$ time algorithm for computing the spectral density of any $n \times n$ normalized graph adjacency or Laplacian matrix. This runtime is sublinear in the size of the matrix, and assumes sample access to the graph. Our approach leverages several tools from approximation theory, including Jackson's seminal work on approximation with positive kernels [Jackson, 1912], and stability properties of three-term recurrence relations for orthogonal polynomials.
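The core computation behind a Chebyshev-based KPM, estimating the normalized moments $\mathrm{tr}(T_k(A))/n$ from matrix-vector products, can be sketched as follows. This is a generic illustration, not the variant analyzed in the abstract: it assumes a real symmetric $A$ with eigenvalues in $[-1, 1]$, uses random sign probe vectors, and omits both the damping step and the reconstruction of a density from the moments. The name `chebyshev_moments` is hypothetical.

```python
import numpy as np


def chebyshev_moments(A, num_moments, num_probes, rng):
    """Estimate tr(T_k(A)) / n for k = 0..num_moments-1 with random sign vectors.

    Assumes A is real symmetric with eigenvalues in [-1, 1]; damping and
    density reconstruction are intentionally left out of this sketch.
    """
    n = A.shape[0]
    moments = np.zeros(num_moments)
    for _ in range(num_probes):
        g = rng.choice([-1.0, 1.0], size=n)  # random sign (Rademacher) probe
        t_prev, t_curr = g, A @ g            # T_0(A) g and T_1(A) g
        moments[0] += g @ t_prev
        if num_moments > 1:
            moments[1] += g @ t_curr
        for k in range(2, num_moments):
            # Three-term recurrence: T_k(A) g = 2 A T_{k-1}(A) g - T_{k-2}(A) g
            t_prev, t_curr = t_curr, 2.0 * (A @ t_curr) - t_prev
            moments[k] += g @ t_curr
    # Average over probes and normalize by the dimension.
    return moments / (num_probes * n)


rng = np.random.default_rng(0)
A = 0.5 * np.eye(50)  # every eigenvalue equals 0.5
moments = chebyshev_moments(A, num_moments=5, num_probes=4, rng=rng)
```

With a multiple of the identity the estimate has no sampling noise, so the moments equal $T_k(0.5)$ exactly; each probe costs `num_moments - 1` matrix-vector products, which is where the matrix-vector multiplication count in the abstract comes from.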
The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczur and Karger (1996) showed that given any $n$-vertex undirected weighted graph $G$ and a parameter $\varepsilon \in (0,1)$, there is a near-linear time algorithm that outputs a weighted subgraph $G'$ of $G$ of size $\tilde{O}(n/\varepsilon^2)$ such that the weight of every cut in $G$ is preserved to within a $(1 \pm \varepsilon)$-factor in $G'$. The graph $G'$ is referred to as a \emph{$(1 \pm \varepsilon)$-approximate cut sparsifier} of $G$. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require $\Omega(n + m)$ time where $n$ denotes the number of vertices and $m$ denotes the number of hyperedges in the hypergraph. Since $m$ can be exponentially large in $n$, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in $n$, \emph{independent of the number of edges}. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph.
In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs $H$. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph $H$ in a graph $G$ with $m$ edges is $O(m^{\rho(H)})$, where $\rho(H)$ is the fractional edge-cover of $H$, and enumeration algorithms with matching runtime are known for any $H$. We bridge this gap between subgraph counting and subgraph enumeration by designing a sublinear-time algorithm that can estimate the number of any arbitrary subgraph $H$ in $G$, denoted by $\#H$, to within a $(1 \pm \epsilon)$-approximation w.h.p. in $O(\frac{m^{\rho(H)}}{\#H}) \cdot \text{poly}(\log n, 1/\epsilon)$ time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph $H$ under the additional assumption of edge-sample queries. We further show that our algorithm works for the more general database join size estimation problem and prove a matching lower bound for this problem.
In this work, we consider the problem of sampling a $k$-clique in a graph from an almost uniform distribution in sublinear time in the general graph query model. Specifically the algorithm should output each $k$-clique with probability $(1 \pm \epsilon)/n_k$, where $n_k$ denotes the number of $k$-cliques in the graph and $\epsilon$ is a given approximation parameter. We prove that the query complexity of this problem is \[ \Theta^*\left(\max\left\{ \left(\frac{(n\alpha)^{k/2}}{n_k}\right)^{\frac{1}{k-1}},\; \min\left\{n\alpha, \frac{n\alpha^{k-1}}{n_k}\right\}\right\}\right), \] where $n$ is the number of vertices in the graph, $\alpha$ is its arboricity, and $\Theta^*$ suppresses the dependence on $(\log n/\epsilon)^{O(k)}$. Interestingly, this establishes a separation between approximate counting and approximate uniform sampling in the sublinear regime. For example, if $k=3$, $\alpha = O(1)$, and $n_3$ (the number of triangles) is $\Theta(n)$, then we get a lower bound of $\Omega(n^{1/4})$ (for constant $\epsilon$), while under these conditions, a $(1 \pm \epsilon)$-approximation of $n_3$ can be obtained by performing $\text{poly}(\log(n/\epsilon))$ queries (Eden, Ron and Seshadhri, SODA'20). Our lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that in each graph there are $n_k$ cliques (of size $k$), where one of these cliques is hidden and hence hard to sample. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling $k$-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP'19) to sample edges in $H_k$, where the challenge is to simulate queries to $H_k$ while being given access only to $G$.
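The $\Omega(n^{1/4})$ figure quoted in the example can be checked by substituting $k=3$, $\alpha = \Theta(1)$ and $n_3 = \Theta(n)$ into the two terms of the bound. The following is a worked substitution only, ignoring the factors hidden by $\Theta^*$ and treating $\alpha$ as a constant:

```latex
\left(\frac{(n\alpha)^{3/2}}{n_3}\right)^{\frac{1}{2}}
  = \Theta\!\left(\left(\frac{n^{3/2}}{n}\right)^{1/2}\right)
  = \Theta\!\left(n^{1/4}\right),
\qquad
\min\left\{ n\alpha,\; \frac{n\alpha^{2}}{n_3} \right\}
  = \min\left\{ \Theta(n),\; \Theta(1) \right\}
  = \Theta(1).
```

The maximum of the two terms is therefore $\Theta(n^{1/4})$, which is polynomially larger than the $\text{poly}(\log(n/\epsilon))$ queries that suffice for approximate counting in the same regime.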
We consider the problem of sampling and approximately counting an arbitrary given motif $H$ in a graph $G$, where access to $G$ is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of $H$ into a collection of odd cycles and stars, denoted $\mathcal{D}^*(H)=\{O_{k_1}, \ldots, O_{k_q}, S_{p_1}, \ldots, S_{p_\ell}\}$. These algorithms were shown to be optimal for the case where $H$ is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling and approximately counting arbitrary motifs which, up to $\text{poly}(\log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows sampling vertices according to the $p$-th moment of the degree distribution. Finally, we prove that this algorithm is \emph{decomposition-optimal} for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs $H$ with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.