PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

322 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhewei Wei

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhewei Wei - Xiaodong He - Xiaokui Xiao

بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

{it SimRank} is a classic measure of the similarities of nodes in a graph. Given a node $u$ in graph $G =(V, E)$, a {em single-source SimRank query} returns the SimRank similarities $s(u, v)$ between node $u$ and each node $v in V$. This type of queries has numerous applications in web search and social networks analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear to the number of nodes $n$, which renders them inapplicable for real-time and interactive analysis. { This paper proposes prsim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. prsim uses an index of size $O(m)$, where $m$ is the number of edges in the graph, and guarantees a query time that depends on the {em reverse PageRank} distribution of the input graph. In particular, we prove that prsim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph.} Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that prsim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

قيم البحث

68 - Liren Yu , Jiaming Xu , Xiaojun Lin 2021

This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $1$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined $D$-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their $D$-hop neighborhoods. This significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of the graphs. Under the Chung-Lu random graph model with $n$ vertices, max degree $Theta(sqrt{n})$, and the power-law exponent $2<beta<3$, we show that as soon as $D> frac{4-beta}{3-beta}$, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only $Omega((log n)^{4-beta})$ initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires $n^{1/2+epsilon}$ seeds (for any small constant $epsilon>0$). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.

بنى وهياكل البيانات والخوارزميات التعلم الآلي

Distributed Approximation on Power Graphs

117 - Reuven Bar-Yehuda , Keren Censor-Hillel , Yannic Maus 2020

We investigate graph problems in the following setting: we are given a graph $G$ and we are required to solve a problem on $G^2$. While we focus mostly on exploring this theme in the distributed CONGEST model, we show new results and surprising conne ctions to the centralized model of computation. In the CONGEST model, it is natural to expect that problems on $G^2$ would be quite difficult to solve efficiently on $G$, due to congestion. However, we show that the picture is both more complicated and more interesting. Specifically, we encounter two phenomena acting in opposing directions: (i) slowdown due to congestion and (ii) speedup due to structural properties of $G^2$. We demonstrate these two phenomena via two fundamental graph problems, namely, Minimum Vertex Cover (MVC) and Minimum Dominating Set (MDS). Among our many contributions, the highlights are the following. - In the CONGEST model, we show an $O(n/epsilon)$-round $(1+epsilon)$-approximation algorithm for MVC on $G^2$, while no $o(n^2)$-round algorithm is known for any better-than-2 approximation for MVC on $G$. - We show a centralized polynomial time $5/3$-approximation algorithm for MVC on $G^2$, whereas a better-than-2 approximation is UGC-hard for $G$. - In contrast, for MDS, in the CONGEST model, we show an $tilde{Omega}(n^2)$ lower bound for a constant approximation factor for MDS on $G^2$, whereas an $Omega(n^2)$ lower bound for MDS on $G$ is known only for exact computation. In addition to these highlighted results, we prove a number of other results in the distributed CONGEST model including an $tilde{Omega}(n^2)$ lower bound for computing an exact solution to MVC on $G^2$, a conditional hardness result for obtaining a $(1+epsilon)$-approximation to MVC on $G^2$, and an $O(log Delta)$-approximation to the MDS problem on $G^2$ in $mbox{poly}log n$ rounds.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Scalable Katz Ranking Computation in Large Static and Dynamic Graphs

59 - Alexander van der Grinten , Elisabetta Bergamini , Oded Green andn David A. Bader 2018

Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality i s one of the established centrality measures. In this paper, we consider the problem of computing rankings for Katz centrality. In particular, we propose upper and lower bounds on the Katz score of a given node. While previous approaches relied on numerical approximation or heuristics to compute Katz centrality rankings, we construct an algorithm that iteratively improves those upper and lower bounds until a correct Katz ranking is obtained. We extend our algorithm to dynamic graphs while maintaining its correctness guarantees. Experiments demonstrate that our static graph algorithm outperforms both numerical approaches and heuristics with speedups between 1.5x and 3.5x, depending on the desired quality guarantees. Our dynamic graph algorithm improves upon the static algorithm for update batches of less than 10000 edges. We provide efficient parallel CPU and GPU implementations of our algorithms that enable near real-time Katz centrality computation for graphs with hundreds of millions of nodes in fractions of seconds.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Efficient On-line Computation of Visibility Graphs

145 - Delia Fano Yela , Florian Thalmann , Vincenzo Nicosia 2019

A visibility algorithm maps time series into complex networks following a simple criterion. The resulting visibility graph has recently proven to be a powerful tool for time series analysis. However its straightforward computation is time-consuming a nd rigid, motivating the development of more efficient algorithms. Here we present a highly efficient method to compute visibility graphs with the further benefit of flexibility: on-line computation. We propose an encoder/decoder approach, with an on-line adjustable binary search tree codec for time series as well as its corresponding decoder for visibility graphs. The empirical evidence suggests the proposed method for computation of visibility graphs offers an on-line computation solution at no additional computation time cost. The source code is available online.

بنى وهياكل البيانات والخوارزميات

Sublinear-Time Algorithms for Compressive Phase Retrieval

180 - Yi Li , Vasileios Nakos 2017

In the compressive phase retrieval problem, or phaseless compressed sensing, or compressed sensing from intensity only measurements, the goal is to reconstruct a sparse or approximately $k$-sparse vector $x in mathbb{R}^n$ given access to $y= |Phi x| $, where $|v|$ denotes the vector obtained from taking the absolute value of $vinmathbb{R}^n$ coordinate-wise. In this paper we present sublinear-time algorithms for different variants of the compressive phase retrieval problem which are akin to the variants considered for the classical compressive sensing problem in theoretical computer science. Our algorithms use pure combinatorial techniques and near-optimal number of measurements.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات نظرية المعلومات

سجل دخول لتتمكن من نشر تعليقات