ﻻ يوجد ملخص باللغة العربية
Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S geq max{(sqrt m, n^2/m)}$ space per machine, assuming $T=Omega(sqrt{m/n})$. Our second main result is an $tilde{O}_{delta}(log log n)$-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity $alpha$ of the input graph. The space per machine is $O(n^{delta})$ for any constant $delta$, and the total space is $O(malpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(malpha^{k-2})$. Alternatively, allowing $O(alpha^2)$ space per machine, the total space requirement reduces to $O(nalpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$, can be implemented in the MPC model in $tilde{O}_{delta}(sqrt{log n})$ rounds, $O(n^{delta})$ space per machine and $O(malpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
For over a decade now we have been witnessing the success of {em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately
The problem of finding dense components of a graph is a widely explored area in data analysis, with diverse applications in fields and branches of study including community mining, spam detection, computer security and bioinformatics. This research p
In this paper, we study new batch-dynamic algorithms for the $k$-clique counting problem, which are dynamic algorithms where the updates are batches of edge insertions and deletions. We study this problem in the parallel setting, where the goal is to
Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms -- usually sublogarithmic-time and often $poly(loglog n)$-time, or even faster -- for
In this paper, we study the single-source shortest-path (SSSP) problem with positive edge weights, which is a notoriously hard problem in the parallel context. In practice, the $Delta$-stepping algorithm proposed by Meyer and Sanders has been widely