Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

258 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ashish Goel

تاريخ النشر 2012

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ashish Goel - Kamesh Munagala

النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity measures that allow us to talk about Map-Reduce algorithms formally, and second, we want to point out why this model is actually different from other models of parallel programming, most notably the PRAM (Parallel Random Access Memory) model. We are looking for complexity measures that are detailed enough to make fine-grained distinction between different algorithms, but which also abstract away many of the implementation details.

قيم البحث

80 - Jennifer A. Loe , Heidi K. Thornquist , Erik G. Boman 2019

Polynomial preconditioning with the GMRES minimal residual polynomial has the potential to greatly reduce orthogonalization costs, making it useful for communication reduction. We implement polynomial preconditioning in the Belos package from Trilino s and show how it can be effective in both serial and parallel implementations. We further show it is a communication-avoiding technique and is a viable option to CA-GMRES for large-scale parallel computing.

التحليل العددي التحليل العددي

Parallel Computing Environments and Methods for Power Distribution System Simulation

65 - Ning Lu , Z. Todd Taylor , David P. Chassin 2004

The development of cost-effective highperformance parallel computing on multi-processor supercomputers makes it attractive to port excessively time consuming simulation software from personal computers (PC) to super computes. The power distribution s ystem simulator (PDSS) takes a bottom-up approach and simulates load at the appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its runtime increases to a point where the approach is no longer feasible to model a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modification made to the PDSS model. The performance of the PDSS running on a standalone PC and on the supercomputer is compared. Future research direction of utilizing parallel computing in the power distribution system simulation is also addressed.

النظم الموزعة والتوازية والحوسبة العنقودية الهندسة الحاسوبية، المالية،العلوم أنظمة متعددة العملاء

Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

494 - Justin M. Wozniak , Timothy G. Armstrong , Ketan C. Maheshwari 2021

Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted application s on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.

النظم الموزعة والتوازية والحوسبة العنقودية

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

215 - Leiming Yu , Fanny Nina-Paravecino , David Kaeli 2017

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strat- egies are developed to obtain efficient simulations using multiple central processing units (CPUs) and GPUs.

النظم الموزعة والتوازية والحوسبة العنقودية الفيزياء الحسابية

Hierarchical decoding to reduce hardware requirements for quantum computing

132 - Nicolas Delfosse 2020

Extensive quantum error correction is necessary in order to scale quantum hardware to the regime of practical applications. As a result, a significant amount of decoding hardware is necessary to process the colossal amount of data required to constan tly detect and correct errors occurring over the millions of physical qubits driving the computation. The implementation of a recent highly optimized version of Shors algorithm to factor a 2,048-bits integer would require more 7 TBit/s of bandwidth for the sole purpose of quantum error correction and up to 20,000 decoding units. To reduce the decoding hardware requirements, we propose a fault-tolerant quantum computing architecture based on surface codes with a cheap hard-decision decoder, the lazy decoder, combined with a sophisticated decoding unit that takes care of complex error configurations. Our design drops the decoding hardware requirements by several orders of magnitude assuming that good enough qubits are provided. Given qubits and quantum gates with a physical error rate $p=10^{-4}$, the lazy decoder drops both the bandwidth requirements and the number of decoding units by a factor 50x. Provided very good qubits with error rate $p=10^{-5}$, we obtain a 1,500x reduction in bandwidth and decoding hardware thanks to the lazy decoder. Finally, the lazy decoder can be used as a decoder accelerator. Our simulations show a 10x speed-up of the Union-Find decoder and a 50x speed-up of the Minimum Weight Perfect Matching decoder.

فيزياء الكم نظرية المعلومات نظرية المعلومات

سجل دخول لتتمكن من نشر تعليقات