Equivalence Classes and Conditional Hardness in Massively Parallel Computations

142 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Michele Scquizzato

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Danupon Nanongkai - Michele Scquizzato

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasingly more attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle vs. two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., $textsf{P} eq textsf{NP}$), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a whole bunch of problems. In this paper we present connections between problems and classes of problems that allow the latter type of arguments. These connections concern the class of problems solvable in a sublogarithmic amount of rounds in the MPC model, denoted by $textsf{MPC}(o(log N))$, and some standard classes concerning space complexity, namely $textsf{L}$ and $textsf{NL}$, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model.

قيم البحث

86 - Melanie Cambus , Davin Choo , Havu Miikonen 2021

Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In th is paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges. Consider an input graph $G$ on $n$ vertices such that the positive edges induce a $lambda$-arboric graph. Our main result is a 3-approximation ($textit{in expectation}$) algorithm to correlation clustering that runs in $mathcal{O}(log lambda cdot textrm{poly}(log log n))$ MPC rounds in the $textit{strongly sublinear memory regime}$. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA 18) on randomized greedy MIS and the $texttt{PIVOT}$ algorithm of Ailon, Charikar, and Newman (STOC 05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with $textit{worst case}$ $(1+epsilon)$-approximation guarantees in the special case of forests, where $lambda=1$.

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

Massively Parallel Computation via Remote Memory Access

209 - Soheil Behnezhad , Laxman Dhulipala , Hossein Esfandiari 2019

We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in $O(1)$ rounds and connectivity/minimum spanning tree in $O(loglog_{m/n} n)$ rounds both using $O(n^delta)$ space per machine for constant $delta < 1$. In the same memory regime for MPC, the best known algorithms for these problems require polylog $n$ rounds. Our results imply that the 2-Cycle conjecture, which is widely believed to hold in the MPC model, does not hold in the AMPC model.

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

Massively parallel computations in algebraic geometry - not a contradiction

216 - Janko Boehm , Anne Fruhbis-Kruger , Mirko Rahn 2018

The design and implementation of parallel algorithms is a fundamental task in computer algebra. Combining the computer algebra system Singular and the workflow management system GPI-Space, we have developed an infrastructure for massively parallel co mputations in commutative algebra and algebraic geometry. In this note, we give an overview on the current capabilities of this framework by looking into three sample applications: determining smoothness of algebraic varieties, computing GIT-fans in geometric invariant theory, and determining tropicalizations. These applications employ algorithmic methods originating from commutative algebra, sheaf structures on manifolds, local geometry, convex geometry, group theory, and combinatorics, illustrating the potential of the framework in further problems in computer algebra.

الهندسة الجبرية

Flexible and scalable particle-in-cell methods for massively parallel computations

132 - R. Gassmoeller , E. Heien , E. G. Puckett 2016

Particle-in-cell methods couple mesh-based methods for the solution of continuum mechanics problems, with the ability to advect and evolve particles. They have a long history and many applications in scientific computing. However, they have most ofte n only been implemented for either sequential codes, or parallel codes with static meshes that are statically partitioned. In contrast, many mesh-based codes today use adaptively changing, dynamically partitioned meshes, and can scale to thousands or tens of thousands of processors. Consequently, there is a need to revisit the data structures and algorithms necessary to use particle methods with modern, mesh-based methods. Here we review commonly encountered requirements of particle-in-cell methods, and describe efficient ways to implement them in the context of large-scale parallel finite-element codes that use dynamically changing meshes. We also provide practical experience for how to address bottlenecks that impede the efficient implementation of these algorithms and demonstrate with numerical tests both that our algorithms can be implemented with optimal complexity and that they are suitable for very large-scale, practical applications. We provide a reference implementation in ASPECT, an open source code for geodynamic mantle-convection simulations built on the deal.II library.

التحليل العددي الفيزياء الحسابية

Communication-free Massively Distributed Graph Generation

92 - Daniel Funke , Sebastian Lamm , Ulrich Meyer 2017

Analyzing massive complex networks yields promising insights about our everyday lives. Building scalable algorithms to do so is a challenging task that requires a careful analysis and an extensive evaluation. However, engineering such algorithms is o ften hindered by the scarcity of publicly~available~datasets. Network generators serve as a tool to alleviate this problem by providing synthetic instances with controllable parameters. However, many network generators fail to provide instances on a massive scale due to their sequential nature or resource constraints. Additionally, truly scalable network generators are few and often limited in their realism. In this work, we present novel generators for a variety of network models that are frequently used as benchmarks. By making use of pseudorandomization and divide-and-conquer schemes, our generators follow a communication-free paradigm. The resulting generators are thus embarrassingly parallel and have a near optimal scaling behavior. This allows us to generate instances of up to $2^{43}$ vertices and $2^{47}$ edges in less than 22 minutes on 32768 cores. Therefore, our generators allow new graph families to be used on an unprecedented scale.

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات الشبكات الاجتماعية والمعلومات

سجل دخول لتتمكن من نشر تعليقات