Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

540 0 0.0 ( 0 )

Download Cite

Added by Ashish Goel

Publication date 2012

fields Informatics Engineering

and research's language is English

Authors Ashish Goel - Kamesh Munagala

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity measures that allow us to talk about Map-Reduce algorithms formally, and second, we want to point out why this model is actually different from other models of parallel programming, most notably the PRAM (Parallel Random Access Memory) model. We are looking for complexity measures that are detailed enough to make fine-grained distinction between different algorithms, but which also abstract away many of the implementation details.

rate research

Polynomial Preconditioned GMRES to Reduce Communication in Parallel Computing

80 - Jennifer A. Loe , Heidi K. Thornquist , Erik G. Boman 2019

Polynomial preconditioning with the GMRES minimal residual polynomial has the potential to greatly reduce orthogonalization costs, making it useful for communication reduction. We implement polynomial preconditioning in the Belos package from Trilinos and show how it can be effective in both serial and parallel implementations. We further show it is a communication-avoiding technique and is a viable option to CA-GMRES for large-scale parallel computing.

Numerical Analysis Numerical Analysis

Parallel Computing Environments and Methods for Power Distribution System Simulation

65 - Ning Lu , Z. Todd Taylor , David P. Chassin 2004

The development of cost-effective highperformance parallel computing on multi-processor supercomputers makes it attractive to port excessively time consuming simulation software from personal computers (PC) to super computes. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at the appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its runtime increases to a point where the approach is no longer feasible to model a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modification made to the PDSS model. The performance of the PDSS running on a standalone PC and on the supercomputer is compared. Future research direction of utilizing parallel computing in the power distribution system simulation is also addressed.

Distributed Parallel and Cluster Computing Computational Engineering Multiagent Systems

Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

494 - Justin M. Wozniak , Timothy G. Armstrong , Ketan C. Maheshwari 2021

Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.

Distributed Parallel and Cluster Computing

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

215 - Leiming Yu , Fanny Nina-Paravecino , David Kaeli 2017

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strat- egies are developed to obtain efficient simulations using multiple central processing units (CPUs) and GPUs.

Distributed Parallel and Cluster Computing Computational Physics

Hierarchical decoding to reduce hardware requirements for quantum computing

132 - Nicolas Delfosse 2020

Extensive quantum error correction is necessary in order to scale quantum hardware to the regime of practical applications. As a result, a significant amount of decoding hardware is necessary to process the colossal amount of data required to constantly detect and correct errors occurring over the millions of physical qubits driving the computation. The implementation of a recent highly optimized version of Shors algorithm to factor a 2,048-bits integer would require more 7 TBit/s of bandwidth for the sole purpose of quantum error correction and up to 20,000 decoding units. To reduce the decoding hardware requirements, we propose a fault-tolerant quantum computing architecture based on surface codes with a cheap hard-decision decoder, the lazy decoder, combined with a sophisticated decoding unit that takes care of complex error configurations. Our design drops the decoding hardware requirements by several orders of magnitude assuming that good enough qubits are provided. Given qubits and quantum gates with a physical error rate $p=10^{-4}$, the lazy decoder drops both the bandwidth requirements and the number of decoding units by a factor 50x. Provided very good qubits with error rate $p=10^{-5}$, we obtain a 1,500x reduction in bandwidth and decoding hardware thanks to the lazy decoder. Finally, the lazy decoder can be used as a decoder accelerator. Our simulations show a 10x speed-up of the Union-Find decoder and a 50x speed-up of the Minimum Weight Perfect Matching decoder.

Quantum Physics Information Theory Information Theory