
The Power of $D$-hops in Matching Power-Law Graphs

Posted by Liren Yu
Publication date: 2021
Research field: Informatics Engineering
Language: English





This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $1$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined $D$-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their $D$-hop neighborhoods. This significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of the graphs. Under the Chung-Lu random graph model with $n$ vertices, max degree $\Theta(\sqrt{n})$, and the power-law exponent $2<\beta<3$, we show that as soon as $D > \frac{4-\beta}{3-\beta}$, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only $\Omega((\log n)^{4-\beta})$ initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires $n^{1/2+\epsilon}$ seeds (for any small constant $\epsilon>0$). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.
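To make the first-slice step concrete, here is a minimal sketch, assuming adjacency-dict graphs and an illustrative overlap threshold; it is not the paper's exact algorithm and omits the cascading phase. It records which seeds fall within $D$ hops of each candidate vertex and greedily pairs candidates across the two graphs whose seed witnesses overlap the most.

from collections import deque

def d_hop_neighborhood(adj, v, D):
    # Vertices reachable from v within D hops (excluding v itself), via BFS.
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == D:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    del dist[v]
    return set(dist)

def match_first_slice(adj1, adj2, seeds, D, slice1, slice2, min_overlap=3):
    # seeds: dict mapping seed vertices of graph 1 to their known partners in graph 2;
    # the paper counts only low-degree seeds, which a caller could enforce by filtering.
    # slice1/slice2: candidate vertices of the chosen degree slice in each graph.
    # min_overlap is a hypothetical threshold, not the paper's optimized choice.
    witnesses1 = {v: d_hop_neighborhood(adj1, v, D) & seeds.keys() for v in slice1}
    witnesses2 = {u: d_hop_neighborhood(adj2, u, D) for u in slice2}
    matches, used = {}, set()
    for v, w1 in witnesses1.items():
        mapped = {seeds[s] for s in w1}          # translate seeds into graph-2 labels
        best, best_score = None, min_overlap - 1
        for u, w2 in witnesses2.items():
            if u in used:
                continue
            score = len(mapped & w2)
            if score > best_score:
                best, best_score = u, score
        if best is not None:
            matches[v] = best
            used.add(best)
    return matches
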




Read also

\emph{SimRank} is a classic measure of the similarities of nodes in a graph. Given a node $u$ in graph $G=(V, E)$, a \emph{single-source SimRank query} returns the SimRank similarities $s(u, v)$ between node $u$ and each node $v \in V$. This type of query has numerous applications in web search and social network analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear in the number of nodes $n$, which renders them inapplicable for real-time and interactive analysis. This paper proposes PRSim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. PRSim uses an index of size $O(m)$, where $m$ is the number of edges in the graph, and guarantees a query time that depends on the \emph{reverse PageRank} distribution of the input graph. In particular, we prove that PRSim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that PRSim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.
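For readers unfamiliar with the measure, the sketch below computes single-source SimRank by naive fixed-point iteration over its defining recurrence, with a commonly used decay factor as an assumption; it only illustrates the definition and the quadratic cost that indexing methods such as PRSim are designed to avoid, not PRSim itself.

def simrank_single_source(in_nbrs, u, c=0.6, iters=10):
    # Naive SimRank: s(a, a) = 1 and, for a != b,
    # s(a, b) = c / (|I(a)||I(b)|) * sum of s(x, y) over in-neighbour pairs (x, y).
    # in_nbrs: dict mapping each node to a list of its in-neighbours.
    nodes = list(in_nbrs)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif in_nbrs[a] and in_nbrs[b]:
                    total = sum(s[(x, y)] for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
                else:
                    new[(a, b)] = 0.0
        s = new
    return {v: s[(u, v)] for v in nodes}   # single-source answer for node u
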
A bootstrap percolation process on a graph $G$ is an infection process which evolves in rounds. Initially, there is a subset of infected nodes, and in each subsequent round each uninfected node which has at least $r$ infected neighbours becomes infected and remains so forever. The parameter $r\geq 2$ is fixed. Such processes have been used as models for the spread of ideas or trends within a network of individuals. We analyse the bootstrap percolation process in the case where the underlying graph is an inhomogeneous random graph which exhibits a power-law degree distribution, and initially there are $a(n)$ randomly infected nodes. The main focus of this paper is the number of vertices that will have been infected by the end of the process. The main result of this work is that if the degree sequence of the random graph follows a power law with exponent $\beta$, where $2 < \beta < 3$, then a sublinear number of initially infected vertices is enough to spread the infection over a linear fraction of the nodes of the random graph, with high probability. More specifically, we determine explicitly a critical function $a_c(n)$ such that $a_c(n)=o(n)$ with the following property. Assuming that $n$ is the number of vertices of the underlying random graph, if $a(n) \ll a_c(n)$, then the process does not evolve at all, with high probability as $n$ grows, whereas if $a(n) \gg a_c(n)$, then there is a constant $\varepsilon>0$ such that, with high probability, the final set of infected vertices has size at least $\varepsilon n$. It turns out that when the maximum degree is $o(n^{1/(\beta-1)})$, then $a_c(n)$ depends also on $r$. But when the maximum degree is $\Theta(n^{1/(\beta-1)})$, then $a_c(n)=n^{\frac{\beta-2}{\beta-1}}$.
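The process itself is straightforward to simulate; a minimal sketch (adjacency-dict input, final infected set as output) follows for concreteness. The paper's results concern the size of this final set on power-law random graphs, which the simulation does not reproduce.

from collections import deque

def bootstrap_percolation(adj, initially_infected, r):
    # r-neighbour bootstrap percolation: an uninfected node becomes infected
    # once at least r of its neighbours are infected; infection is permanent.
    infected = set(initially_infected)
    infected_nbr_count = {v: 0 for v in adj}
    queue = deque(infected)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in infected:
                continue
            infected_nbr_count[w] += 1
            if infected_nbr_count[w] >= r:
                infected.add(w)
                queue.append(w)
    return infected
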
We investigate graph problems in the following setting: we are given a graph $G$ and we are required to solve a problem on $G^2$. While we focus mostly on exploring this theme in the distributed CONGEST model, we show new results and surprising connections to the centralized model of computation. In the CONGEST model, it is natural to expect that problems on $G^2$ would be quite difficult to solve efficiently on $G$, due to congestion. However, we show that the picture is both more complicated and more interesting. Specifically, we encounter two phenomena acting in opposing directions: (i) slowdown due to congestion and (ii) speedup due to structural properties of $G^2$. We demonstrate these two phenomena via two fundamental graph problems, namely, Minimum Vertex Cover (MVC) and Minimum Dominating Set (MDS). Among our many contributions, the highlights are the following. - In the CONGEST model, we show an $O(n/\epsilon)$-round $(1+\epsilon)$-approximation algorithm for MVC on $G^2$, while no $o(n^2)$-round algorithm is known for any better-than-2 approximation for MVC on $G$. - We show a centralized polynomial-time $5/3$-approximation algorithm for MVC on $G^2$, whereas a better-than-2 approximation is UGC-hard for $G$. - In contrast, for MDS, in the CONGEST model, we show an $\tilde{\Omega}(n^2)$ lower bound for a constant approximation factor for MDS on $G^2$, whereas an $\Omega(n^2)$ lower bound for MDS on $G$ is known only for exact computation. In addition to these highlighted results, we prove a number of other results in the distributed CONGEST model, including an $\tilde{\Omega}(n^2)$ lower bound for computing an exact solution to MVC on $G^2$, a conditional hardness result for obtaining a $(1+\epsilon)$-approximation to MVC on $G^2$, and an $O(\log \Delta)$-approximation to the MDS problem on $G^2$ in $\mathrm{poly}\log n$ rounds.
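For reference, $G^2$ here is the square of $G$: two distinct vertices are adjacent in $G^2$ whenever their distance in $G$ is at most two. A small helper that materializes $G^2$ from an adjacency dict is sketched below; the distributed CONGEST algorithms themselves are not reproduced.

def graph_square(adj):
    # adj: dict mapping each vertex to an iterable of its neighbours in G.
    adj2 = {v: set(adj[v]) for v in adj}
    for v in adj:
        for u in adj[v]:
            # neighbours of a neighbour are within distance two of v
            adj2[v].update(w for w in adj[u] if w != v)
    return adj2
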
In the classical Online Metric Matching problem, we are given a metric space with $k$ servers. A collection of clients arrive in an online fashion, and upon arrival, a client should irrevocably be matched to an as-yet-unmatched server. The goal is to find an online matching which minimizes the total cost, i.e., the sum of distances between each client and the server it is matched to. We know deterministic algorithms \cite{KP93,khuller1994line} that achieve a competitive ratio of $2k-1$, and this bound is tight for deterministic algorithms. The problem has also long been considered in specialized metrics such as the line metric or metrics of bounded doubling dimension, with the current best result on a line metric being a deterministic $O(\log k)$-competitive algorithm \cite{raghvendra2018optimal}. Obtaining (or refuting) $O(\log k)$-competitive algorithms in general metrics and constant-competitive algorithms on the line metric have been long-standing open questions in this area. In this paper, we investigate the robustness of these lower bounds by considering the Online Metric Matching with Recourse problem, where we are allowed to change a small number of previous assignments upon arrival of a new client. Indeed, we show that a small logarithmic amount of recourse can significantly improve the quality of matchings we can maintain. For general metrics, we show a simple \emph{deterministic} $O(\log k)$-competitive algorithm with $O(\log k)$-amortized recourse, an exponential improvement over the $2k-1$ lower bound when no recourse is allowed. We next consider the line metric, and present a deterministic algorithm which is $3$-competitive and has $O(\log k)$-recourse, again a substantial improvement over the best known $O(\log k)$-competitive algorithm when no recourse is allowed.
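As a point of reference for the problem setting (not the recourse-based algorithms of the paper), the greedy baseline below irrevocably matches each arriving client to the nearest still-free server; on a line metric one would call it with dist = lambda a, b: abs(a - b).

def greedy_online_matching(servers, clients, dist):
    # Greedy baseline for online metric matching; assumes len(clients) <= len(servers).
    # Each arriving client is matched immediately and permanently to the
    # closest unmatched server under the metric dist.
    free = set(range(len(servers)))
    matching, total_cost = {}, 0.0
    for j, c in enumerate(clients):        # clients arrive one by one
        i = min(free, key=lambda i: dist(c, servers[i]))
        free.remove(i)
        matching[j] = i
        total_cost += dist(c, servers[i])
    return matching, total_cost
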
We have recently considered cosmologies in which the Universal scale factor varies as a power of the age of the Universe and concluded that they cannot satisfy the observational constraints on the present age, the magnitude-redshift relation for SN Ia, and the primordial element (D, $^3$He, $^4$He, and $^7$Li) abundances. This claim has been challenged in a proposal that suggested a high baryon density model ($\Omega_B h^2 = 0.3$) with an expansion factor varying linearly with time could be consistent with the observed abundance of primordial helium-4, while satisfying the age and magnitude-redshift constraints. In this paper we further explore primordial nucleosynthesis in generic power-law cosmologies, including the linear case, concluding that models selected to satisfy the other observational constraints are incapable of accounting for all the light element abundances.
