Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice

171 0 0.0 ( 0 )

Download Cite

Added by Laxman Dhulipala

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Soheil Behnezhad - Laxman Dhulipala - Hossein Esfandiari

Distributed Parallel and Cluster Computing Data Structures and Algorithms

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, which is a theoretical model that captures MapReduce-like computation augmented with a distributed hash table. We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only $O(n^epsilon)$ space per machine, where $0 < epsilon < 1$. Our results improve both upon the previous results in the AMPC model, as well as the best-known results in the MPC model, which is the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel and Giraph. Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distriubted computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round-complexity over optimized MPC baselines.

rate research

Correlation Clustering in Constant Many Parallel Rounds

90 - Vincent Cohen-Addad , Silvio Lattanzi , Slobodan Mitrovic 2021

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.

Data Structures and Algorithms Distributed Parallel and Cluster Computing Machine Learning

Parallel Graph Connectivity in Log Diameter Rounds

202 - Alexandr Andoni , Clifford Stein , Zhao Song 2018

We study graph connectivity problem in MPC model. On an undirected graph with $n$ nodes and $m$ edges, $O(log n)$ round connectivity algorithms have been known for over 35 years. However, no algorithms with better complexity bounds were known. In this work, we give fully scalable, faster algorithms for the connectivity problem, by parameterizing the time complexity as a function of the diameter of the graph. Our main result is a $O(log D loglog_{m/n} n)$ time connectivity algorithm for diameter-$D$ graphs, using $Theta(m)$ total memory. If our algorithm can use more memory, it can terminate in fewer rounds, and there is no lower bound on the memory per processor. We extend our results to related graph problems such as spanning forest, finding a DFS sequence, exact/approximate minimum spanning forest, and bottleneck spanning forest. We also show that achieving similar bounds for reachability in directed graphs would imply faster boolean matrix multiplication algorithms. We introduce several new algorithmic ideas. We describe a general technique called double exponential speed problem size reduction which roughly means that if we can use total memory $N$ to reduce a problem from size $n$ to $n/k$, for $k=(N/n)^{Theta(1)}$ in one phase, then we can solve the problem in $O(loglog_{N/n} n)$ phases. In order to achieve this fast reduction for graph connectivity, we use a multistep algorithm. One key step is a carefully constructed truncated broadcasting scheme where each node broadcasts neighbor sets to its neighbors in a way that limits the size of the resulting neighbor sets. Another key step is random leader contraction, where we choose a smaller set of leaders than many previous works do.

Data Structures and Algorithms Distributed Parallel and Cluster Computing

Algebrisation in Distributed Graph Algorithms: Fast Matrix Multiplication in the Congested Clique

459 - Petteri Kaski , Janne H. Korhonen , Christoph Lenzen 2014

While algebrisation constitutes a powerful technique in the design and analysis of centralised algorithms, to date there have been hardly any applications of algebraic techniques in the context of distributed graph algorithms. This work is a case study that demonstrates the potential of algebrisation in the distributed context. We will focus on distributed graph algorithms in the congested clique model; the graph problems that we will consider include, e.g., the triangle detection problem and the all-pairs shortest path problem (APSP). There is plenty of prior work on combinatorial algorithms in the congested clique model: for example, Dolev et al. (DISC 2012) gave an algorithm for triangle detection with a running time of $tilde O(n^{1/3})$, and Nanongkai (STOC 2014) gave an approximation algorithm for APSP with a running time of $tilde O(n^{1/2})$. In this work, we will use algebraic techniques -- in particular, algorithms based on fast matrix multiplication -- to solve both triangle detection and the unweighted APSP in time $O(n^{0.15715})$; for weighted APSP, we give a $(1+o(1))$-approximation with this running time, as well as an exact $tilde O(n^{1/3})$ solution.

Distributed Parallel and Cluster Computing Data Structures and Algorithms

Exact Algorithms With Worst-case Guarantee For Scheduling: From Theory to Practice

233 - Lei Shang 2017

This PhD thesis summarizes research works on the design of exact algorithms that provide a worst-case (time or space) guarantee for NP-hard scheduling problems. Both theoretical and practical aspects are considered with three main results reported. The first one is about a Dynamic Programming algorithm which solves the F3Cmax problem in O*(3^n) time and space. The algorithm is easily generalized to other flowshop problems and single machine scheduling problems. The second contribution is about a search tree method called Branch & Merge which solves the 1||SumTi problem with the time complexity converging to O*(2^n) and in polynomial space. Our third contribution aims to improve the practical efficiency of exact search tree algorithms solving scheduling problems. First we realized that a better way to implement the idea of Branch & Merge is to use a technique called Memorization. By the finding of a new algorithmic paradox and the implementation of a memory cleaning strategy, the method succeeded to solve instances with 300 more jobs with respect to the state-of-the-art algorithm for the 1||SumTi problem. Then the treatment is extended to another three problems 1|ri|SumCi, 1|dtilde|SumwiCi and F2||SumCi. The results of the four problems all together show the power of the Memorization paradigm when applied on sequencing problems. We name it Branch & Memorize to promote a systematic consideration of Memorization as an essential building block in branching algorithms like Branch and Bound. The method can surely also be used to solve other problems, which are not necessarily scheduling problems.

Computational Complexity Data Structures and Algorithms

Massively Parallel Correlation Clustering in Bounded Arboricity Graphs

86 - Melanie Cambus , Davin Choo , Havu Miikonen 2021

Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges. Consider an input graph $G$ on $n$ vertices such that the positive edges induce a $lambda$-arboric graph. Our main result is a 3-approximation ($textit{in expectation}$) algorithm to correlation clustering that runs in $mathcal{O}(log lambda cdot textrm{poly}(log log n))$ MPC rounds in the $textit{strongly sublinear memory regime}$. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA 18) on randomized greedy MIS and the $texttt{PIVOT}$ algorithm of Ailon, Charikar, and Newman (STOC 05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with $textit{worst case}$ $(1+epsilon)$-approximation guarantees in the special case of forests, where $lambda=1$.

Distributed Parallel and Cluster Computing Data Structures and Algorithms

comments

Fetching comments

Higher Institute of Business Administration

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice

Ask ChatGPT about the research

No Arabic abstract

Read More