
Near-Optimal Massively Parallel Graph Connectivity

Added by Soheil Behnezhad
Publication date: 2019
Research language: English





Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. In particular, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems, where the space per machine is $n^\delta$ for some desirably small constant $\delta \in (0, 1)$. We present an algorithm that, for graphs with diameter $D$ in the wide range $[\log^{\epsilon} n, n]$, takes $O(\log D)$ rounds to identify the connected components, and takes $O(\log\log n)$ rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of $D$, and uses an optimal total space of $O(m)$. We complement this with a conditional lower bound, based on the widely believed TwoCycle conjecture, showing that $\Omega(\log D)$ rounds are indeed necessary in this setting. The study of parallel connectivity algorithms saw a resurgence of interest after the pioneering work of Andoni et al. [FOCS 2018], who presented an algorithm with $O(\log D \cdot \log\log n)$ round complexity. Our algorithm improves this result for the whole range of values of $D$ and almost settles the problem due to the conditional lower bound. Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of the (CRCW) PRAM in asymptotically the same number of rounds.
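For intuition about what one "round" of contraction buys, the Python sketch below sequentially simulates the classic randomized leader-contraction approach, which finds components in $O(\log n)$ rounds with high probability. It is not the paper's $O(\log D)$ algorithm, whose additional techniques are not described in this abstract; the function name and the coin-flip probability 1/2 are illustrative assumptions.

```python
import random

def connected_components(n, edges, seed=0):
    """Label vertices 0..n-1 so that two vertices share a label iff they are connected.

    Illustrative sketch (not the paper's algorithm): each loop iteration models one
    parallel round. Every super-vertex flips a fair coin to become a leader; each
    non-leader adjacent to a leader hooks onto one such leader; edges are contracted.
    """
    rng = random.Random(seed)
    label = list(range(n))  # label[v] = super-vertex currently containing v
    cur = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    while cur:
        nodes = {x for e in cur for x in e}
        leader = {x: rng.random() < 0.5 for x in nodes}
        parent = {}
        for u, v in cur:
            # A non-leader hooks onto the first leader neighbor it sees.
            if not leader[u] and leader[v]:
                parent.setdefault(u, v)
            elif not leader[v] and leader[u]:
                parent.setdefault(v, u)

        def find(x):
            # Leaders never get a parent, so one lookup suffices.
            return parent.get(x, x)

        label = [find(x) for x in label]
        cur = {(min(find(u), find(v)), max(find(u), find(v))) for u, v in cur}
        cur = {e for e in cur if e[0] != e[1]}  # drop self-loops from contraction
    return label
```

Each non-isolated super-vertex merges with probability at least 1/4 per round, which is where the logarithmic round bound of this baseline comes from; the abstract's result replaces that dependence on $n$ with one on $D$.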




We study the graph connectivity problem in the MPC model. On an undirected graph with $n$ nodes and $m$ edges, $O(\log n)$-round connectivity algorithms have been known for over 35 years. However, no algorithms with better complexity bounds were known. In this work, we give fully scalable, faster algorithms for the connectivity problem by parameterizing the time complexity as a function of the diameter of the graph. Our main result is an $O(\log D \log\log_{m/n} n)$-time connectivity algorithm for diameter-$D$ graphs, using $\Theta(m)$ total memory. If our algorithm can use more memory, it can terminate in fewer rounds, and there is no lower bound on the memory per processor. We extend our results to related graph problems such as spanning forest, finding a DFS sequence, exact/approximate minimum spanning forest, and bottleneck spanning forest. We also show that achieving similar bounds for reachability in directed graphs would imply faster Boolean matrix multiplication algorithms. We introduce several new algorithmic ideas. We describe a general technique called double exponential speed problem size reduction, which roughly means that if we can use total memory $N$ to reduce a problem from size $n$ to $n/k$, for $k = (N/n)^{\Theta(1)}$, in one phase, then we can solve the problem in $O(\log\log_{N/n} n)$ phases. To achieve this fast reduction for graph connectivity, we use a multistep algorithm. One key step is a carefully constructed truncated broadcasting scheme, where each node broadcasts neighbor sets to its neighbors in a way that limits the size of the resulting neighbor sets. Another key step is random leader contraction, where we choose a smaller set of leaders than many previous works do.
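The arithmetic behind the "double exponential speed" claim can be checked with a short simulation of the recurrence alone, not of the connectivity algorithm. After a phase shrinks size $s$ to $s/k$, the memory available per remaining item grows from $N/s$ to $(N/s)\cdot k$, so $\log(N/s)$ is multiplied by $1 + c$ each phase. The constant `c` (standing in for the $\Theta(1)$ exponent) and the stopping rule "size at most 1" are assumptions of this sketch.

```python
import math

def phases_needed(n, N, c=0.5):
    """Count phases of the hypothetical recurrence s -> s / k with k = (N / s) ** c.

    Works in log space to avoid huge numbers. Requires N > n so that
    progress is made in the first phase.
    """
    if N <= n or c <= 0:
        raise ValueError("need N > n and c > 0")
    log_size = math.log(n)
    log_budget = math.log(N / n)  # log of memory per remaining item, log(N / s)
    phases = 0
    while log_size > 0:           # stop once the problem size is at most 1
        log_k = c * log_budget    # log k = c * log(N / s)
        log_size -= log_k         # s -> s / k
        log_budget += log_k       # N / (s / k) = (N / s) * k
        phases += 1
    return phases
```

Under these assumptions, $n = 2^{20}$ with $N = 2n$ needs 8 phases, and squaring $n$ to $2^{40}$ (with $N = 2n$) raises this only to 10, consistent with the doubly logarithmic $O(\log\log_{N/n} n)$ bound rather than the 20 or 40 phases that halving would take.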
Data structures that allow efficient distance estimation (distance oracles, distance sketches, etc.) have been extensively studied, and are particularly well studied in centralized models and classical distributed models such as CONGEST. We initiate their study in newer (and arguably more realistic) models of distributed computation: the Congested Clique model and the Massively Parallel Computation (MPC) model. We provide efficient constructions in both of these models, but our core results are for MPC. In MPC we give two main results: an algorithm that constructs stretch/space optimal distance sketches but takes a (small) polynomial number of rounds, and an algorithm that constructs distance sketches with worse stretch but takes only polylogarithmic rounds. Along the way, we show that other useful combinatorial structures can also be computed in MPC. In particular, one key component we use to construct distance sketches is an MPC construction of the hopsets of Elkin and Neiman (2016). This result has additional applications, such as the first polylogarithmic-time algorithm for constant-approximate single-source shortest paths for weighted graphs in the low-memory MPC setting.
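To make "distance sketch" concrete: the simplest form stores, at every vertex, its distance to a few sampled landmark vertices, and answers a query for $d(u, v)$ with the triangle inequality through a shared landmark. This landmark scheme is a generic illustration on an unweighted graph with no stretch guarantee; it is not the stretch/space optimal construction or the hopset-based one described above, and all function names are hypothetical.

```python
import math
import random
from collections import deque

def bfs(adj, src):
    """Exact hop distances from src in an unweighted graph given as an adjacency dict."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def build_sketches(adj, num_landmarks, seed=0):
    """Each vertex's sketch maps sampled landmarks to its exact distance from them."""
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(adj), num_landmarks)
    sketches = {v: {} for v in adj}
    for l in landmarks:
        for v, d in bfs(adj, l).items():
            sketches[v][l] = d
    return sketches

def estimate(sk_u, sk_v):
    """Upper bound on d(u, v): min over shared landmarks l of d(u, l) + d(l, v)."""
    common = sk_u.keys() & sk_v.keys()
    return min((sk_u[l] + sk_v[l] for l in common), default=math.inf)
```

A query touches only the two sketches, which is the property that makes sketches attractive in distributed settings; the hard part, which this toy omits, is choosing what to store so that the estimate is provably within a small stretch of the true distance.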
The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching, which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time, which is considered inefficient, or require strictly super-linear space of $n^{1+\Omega(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm, a variant of which was conjectured to work by Czumaj et al. [STOC18]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among some other results, leads to an $O(\log\log\Delta)$-round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$-approximate minimum vertex cover in essentially the same rounds and space. This is the best possible approximation factor under standard assumptions, culminating a long line of research. It also leads to an improved $O(\log\log\Delta)$-round algorithm for $(1 + \varepsilon)$-approximate matching. All these results can also be implemented in the Congested Clique model within the same number of rounds.
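The three steps named above (edge sampling, random vertex partitioning, and random greedy maximal matching inside each part) can be sketched sequentially in Python. The sampling probability `p` and the number of parts `num_parts` are hypothetical free parameters here; in the analyzed algorithm they are tied to degree and space bounds, and the degree-reduction argument itself is not shown.

```python
import random

def random_greedy_matching(edges, rng):
    """Scan edges in uniformly random order; keep an edge if both endpoints are free.

    The result is a maximal matching of the given edge set.
    """
    order = list(edges)
    rng.shuffle(order)
    matched, matching = set(), []
    for u, v in order:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def one_round(vertices, edges, p, num_parts, seed=0):
    """Sketch of one round: sample edges, partition vertices, match within each part."""
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < p]
    part = {v: rng.randrange(num_parts) for v in vertices}
    by_part = {}
    for u, v in sampled:
        # Only edges with both endpoints in one part are local to a single machine.
        if part[u] == part[v]:
            by_part.setdefault(part[u], []).append((u, v))
    matching = []
    for part_edges in by_part.values():
        # Parts have disjoint vertex sets, so the union is still a matching.
        matching.extend(random_greedy_matching(part_edges, rng))
    return matching
```

Because each part fits on one machine, the greedy scan runs locally with no communication; the matched vertices are then removed, which is what drives the degree reduction the abstract describes.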
Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms (usually sublogarithmic-time and often $\mathrm{poly}(\log\log n)$-time, or even faster) for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely adopted theoretical abstraction of MapReduce-style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present $\mathrm{poly}(\log k) \in \mathrm{poly}(\log\log n)$-round MPC algorithms for computing $O(k^{1+o(1)})$-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications:
- For the MPC setting, we get an $O(\log^2\log n)$-round algorithm for $O(\log^{1+o(1)} n)$ approximation of all-pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations.
- Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sublogarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model.
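For readers unfamiliar with spanners, the classic sequential greedy construction (Althöfer et al.) shows what the object is: scan the edges and keep one only if the spanner built so far has no path of length at most $2k - 1$ between its endpoints, so every distance is stretched by at most $2k - 1$. This is a swapped-in textbook algorithm for unweighted graphs, not the MPC algorithm of the abstract, and it is inherently sequential, which is exactly why fast parallel constructions are nontrivial. Function names are hypothetical.

```python
from collections import deque

def within_distance(adj, s, t, limit):
    """BFS from s truncated at depth `limit`; True if t is reached within that many hops."""
    if s == t:
        return True
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == limit:
            continue  # do not expand beyond the depth limit
        for w in adj[u]:
            if w == t:
                return True
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return False

def greedy_spanner(n, edges, k):
    """Keep edge (u, v) only if the current spanner has no u-v path of length <= 2k - 1."""
    adj = {v: set() for v in range(n)}
    kept = []
    for u, v in edges:
        if not within_distance(adj, u, v, 2 * k - 1):
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept
```

On the complete graph $K_5$ with $k = 2$ (stretch 3), scanning edges in lexicographic order keeps only the star around vertex 0, since every other edge is then bridged by a 2-hop path.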
We present a massively parallel algorithm, with near-linear memory per machine, that computes a $(2+\varepsilon)$-approximation of minimum-weight vertex cover in $O(\log\log d)$ rounds, where $d$ is the average degree of the input graph. Our result fills the key remaining gap in the state-of-the-art MPC algorithms for vertex cover and matching, two classic optimization problems that are duals of each other. Concretely, a recent line of work (by Czumaj et al. [STOC18], Ghaffari et al. [PODC18], Assadi et al. [SODA19], and Gamlath et al. [PODC19]) provides $O(\log\log n)$-time algorithms for $(1+\varepsilon)$-approximate maximum weight matching as well as for $(2+\varepsilon)$-approximate minimum cardinality vertex cover. However, the latter algorithm does not work for the general weighted case of vertex cover, for which the best known algorithm remained at $O(\log n)$ time complexity.
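As a sequential point of reference for the weighted problem, the classic local-ratio algorithm of Bar-Yehuda and Even gives a 2-approximate minimum-weight vertex cover in one pass over the edges. It is not the MPC algorithm of the abstract, whose $O(\log\log d)$-round guarantee needs different machinery; it only illustrates what the target guarantee looks like. Integer weights are assumed so that "tight" (residual weight 0) can be tested exactly.

```python
def local_ratio_vertex_cover(weights, edges):
    """2-approximate minimum-weight vertex cover via local ratio (Bar-Yehuda and Even).

    For each edge, subtract the smaller residual weight of its endpoints from both.
    Afterwards at least one endpoint of every edge has residual weight 0, and the
    set of such tight vertices is the cover.
    """
    residual = dict(weights)
    for u, v in edges:
        delta = min(residual[u], residual[v])
        # Pay delta on both endpoints; at least one becomes tight and stays tight.
        residual[u] -= delta
        residual[v] -= delta
    return {v for v, r in residual.items() if r == 0}
```

Every unit of weight "paid" is charged to an edge and counted on both of its endpoints, while any optimal cover must pay it on at least one, which is where the factor 2 comes from. For a star with center weight 10 and three leaves of weight 1, it returns the three leaves.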