Round Compression for Parallel Matching Algorithms

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Slobodan Mitrovi\\'c

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Artur Czumaj - Jakub {L}k{a}cki - Aleksander Mk{a}dry

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

For over a decade now we have been witnessing the success of {em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the {em maximum matching} problem---one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly $O(log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(log n)$ round complexity bound even in the case of {em slightly sublinear} memory per machine. In fact, our improvement here is {em almost exponential}: we are able to deliver a $(2+epsilon)$-approximation to maximum matching, for any fixed constant $epsilon>0$, in $O((log log n)^2)$ rounds.

قيم البحث

139 - Thomas Erlebach , Michael Hoffmann , Murilo S. de Lima (School ofn Informatics 2021

The area of computing with uncertainty considers problems where some information about the input elements is uncertain, but can be obtained using queries. For example, instead of the weight of an element, we may be given an interval that is guarantee d to contain the weight, and a query can be performed to reveal the weight. While previous work has considered models where queries are asked either sequentially (adaptive model) or all at once (non-adaptive model), and the goal is to minimize the number of queries that are needed to solve the given problem, we propose and study a new model where $k$ queries can be made in parallel in each round, and the goal is to minimize the number of query rounds. We use competitive analysis and present upper and lower bounds on the number of query rounds required by any algorithm in comparison with the optimal number of query rounds. Given a set of uncertain elements and a family of $m$ subsets of that set, we present an algorithm for determining the value of the minimum of each of the subsets that requires at most $(2+varepsilon) cdot mathrm{opt}_k+mathrm{O}left(frac{1}{varepsilon} cdot lg mright)$ rounds for every $0<varepsilon<1$, where $mathrm{opt}_k$ is the optimal number of rounds, as well as nearly matching lower bounds. For the problem of determining the $i$-th smallest value and identifying all elements with that value in a set of uncertain elements, we give a $2$-round-competitive algorithm. We also show that the problem of sorting a family of sets of uncertain elements admits a $2$-round-competitive algorithm and this is the best possible.

بنى وهياكل البيانات والخوارزميات

Parallel Algorithms for Small Subgraph Counting

371 - Amartya Shankha Biswas , Talya Eden , Quanquan C. Liu 2020

Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S geq max{(sqrt m, n^2/m)}$ space per machine, assuming $T=Omega(sqrt{m/n})$. Our second main result is an $tilde{O}_{delta}(log log n)$-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity $alpha$ of the input graph. The space per machine is $O(n^{delta})$ for any constant $delta$, and the total space is $O(malpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(malpha^{k-2})$. Alternatively, allowing $O(alpha^2)$ space per machine, the total space requirement reduces to $O(nalpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$, can be implemented in the MPC model in $tilde{O}_{delta}(sqrt{log n})$ rounds, $O(n^{delta})$ space per machine and $O(malpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Massively Parallel Algorithms for Distance Approximation and Spanners

136 - Amartya Shankha Biswas , Michal Dory , Mohsen Ghaffari 2020

Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms -- usually sublogarithmic-time and often $poly(loglog n)$-time, or even faster -- for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely-adopted theoretical abstraction of MapReduce style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present $poly(log k) in poly(loglog n)$ round MPC algorithms for computing $O(k^{1+{o(1)}})$-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications, as follows: -For the MPC setting, we get an $O(log^2log n)$-round algorithm for $O(log^{1+o(1)} n)$ approximation of all pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations. -Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sub-logarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

A Parallel Approximation Algorithm for Maximizing Submodular $b$-Matching

160 - S M Ferdous , Alex Pothen , Arif Khan 2021

We design new serial and parallel approximation algorithms for computing a maximum weight $b$-matching in an edge-weighted graph with a submodular objective function. This problem is NP-hard; the new algorithms have approximation ratio $1/3$, and are relaxations of the Greedy algorithm that rely only on local information in the graph, making them parallelizable. We have designed and implemented Local Lazy Greedy algorithms for both serial and parallel computers. We have applied the approximate submodular $b$-matching algorithm to assign tasks to processors in the computation of Fock matrices in quantum chemistry on parallel computers. The assignment seeks to reduce the run time by balancing the computational load on the processors and bounding the number of messages that each processor sends. We show that the new assignment of tasks to processors provides a four fold speedup over the currently used assignment in the NWChemEx software on $8000$ processors on the Summit supercomputer at Oak Ridge National Lab.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Exponentially Faster Massively Parallel Maximal Matching

110 - Soheil Behnezhad , MohammadTaghi Hajiaghayi , David G. Harris 2019

The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching which is one of the ce ntral problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time which is considered inefficient, or require a strictly super-linear space of $n^{1+Omega(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm a variant of which was conjectured to work by Czumaj et al. [STOC18]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among some other results, leads to an $O(log log Delta)$ round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$ approximate minimum vertex cover in essentially the same rounds and space. This is the best possible approximation factor under standard assumptions, culminating a long line of research. It also leads to an improved $O(loglog Delta)$ round algorithm for $1 + varepsilon$ approximate matching. All these results can also be implemented in the congested clique model within the same number of rounds.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

سجل دخول لتتمكن من نشر تعليقات