Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms


Added by Alexey Lastovetsky

Publication date: 2011

Fields: Informatics Engineering

Language: English

Authors: Alexey Lastovetsky, Ravi Reddy, Vladimir Rychkov

Distributed Parallel and Cluster Computing

Traditional heterogeneous parallel algorithms, designed for heterogeneous clusters of workstations, are based on the assumption that the absolute speed of the processors does not depend on the size of the computational task. This assumption has proved inaccurate for modern and prospective highly heterogeneous HPC platforms. A new class of algorithms based on the functional performance model (FPM), which represents the speed of a processor by a function of problem size, has recently been proposed. These algorithms cannot, however, be employed in self-adaptable applications because of the very high cost of constructing the functional performance model. The paper presents a new class of parallel algorithms for highly heterogeneous HPC platforms. Like traditional FPM-based algorithms, these algorithms assume that the speed of the processors is characterized by speed functions rather than speed constants. Unlike the traditional algorithms, they do not assume the speed functions to be given. Instead, they estimate the speed functions of the processors for different problem sizes during their execution. These algorithms do not construct the full speed function for each processor but rather build and use partial estimates of it that are sufficient for optimal distribution of computations with a given accuracy. The low execution cost of distributing computations between heterogeneous processors in these algorithms makes them suitable for employment in self-adaptable applications. Experiments with parallel matrix multiplication applications based on this approach are performed on local and global heterogeneous computational clusters. The results show that the execution time of optimal matrix distribution between processors is orders of magnitude less than the total execution time of the optimized application.
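The idea of estimating speed functions only where they are needed can be illustrated with a minimal sketch. It is not the authors' algorithm: it is a simplified stand-in that repeatedly redistributes work in proportion to the most recently measured speeds and stops once the execution times are balanced within a relative tolerance. The names `distribute`, `self_adaptive_partition`, `eps`, and `max_iters` are hypothetical, and each callable in `speed_fns` stands in for a real benchmark run on one processor. The sketch assumes positive speeds and at least one work unit per processor.

```python
from typing import Callable, Dict, List, Tuple


def distribute(n: int, speeds: List[float]) -> List[int]:
    """Split n work units in proportion to speeds; the parts always sum to n."""
    total = sum(speeds)
    parts, acc, prev = [], 0.0, 0
    for s in speeds:
        acc += s
        bound = round(n * acc / total)  # cumulative boundary for this processor
        parts.append(bound - prev)
        prev = bound
    return parts


def self_adaptive_partition(
    n: int,
    speed_fns: List[Callable[[int], float]],
    eps: float = 0.05,
    max_iters: int = 20,
) -> Tuple[List[int], List[Dict[int, float]]]:
    """Balance n units over processors whose speed functions are unknown.

    speed_fns[i](d) is a hypothetical stand-in for benchmarking processor i
    on d units and returning its observed speed (units per second).
    """
    p = len(speed_fns)
    # Partial estimates: only the problem sizes actually visited are stored.
    estimates: List[Dict[int, float]] = [dict() for _ in range(p)]
    parts = distribute(n, [1.0] * p)  # start from an even distribution
    for _ in range(max_iters):
        times = []
        for i, d in enumerate(parts):
            speed = speed_fns[i](d)  # one measurement per processor per round
            estimates[i][d] = speed
            times.append(d / speed)
        # Stop when the relative imbalance of execution times is within eps.
        if (max(times) - min(times)) / max(times) <= eps:
            break
        # Otherwise redistribute in proportion to the newly measured speeds.
        parts = distribute(n, [estimates[i][parts[i]] for i in range(p)])
    return parts, estimates


if __name__ == "__main__":
    # Assumed example: one constant-speed processor and one that slows
    # down as its share of the problem grows.
    fns = [lambda d: 100.0, lambda d: 300.0 / (1.0 + d / 1000.0)]
    parts, estimates = self_adaptive_partition(1200, fns)
    print(parts, [len(e) for e in estimates])
```

Each processor's speed function is sampled at only a handful of problem sizes, which is the property that keeps the cost of partitioning small relative to the computation itself. A fuller treatment would interpolate between the stored points rather than use only the latest measurement.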

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Amani AlOnazi, David Keyes, Alexey Lastovetsky (2015)

Hardware-aware design and optimization is crucial in exploiting emerging architectures for PDE-based computational fluid dynamics applications. In this work, we study optimizations aimed at acceleration of OpenFOAM-based applications on emerging hybrid heterogeneous platforms. OpenFOAM uses MPI to provide parallel multi-processor functionality, which scales well on homogeneous systems but does not fully utilize the potential per-node performance on hybrid heterogeneous platforms. In our study, we use two OpenFOAM applications, icoFoam and laplacianFoam, both based on Krylov iterative methods. We propose a number of optimizations of the dominant kernel of the Krylov solver, aimed at acceleration of the overall execution of the applications on modern GPU-accelerated heterogeneous platforms. Experimental results show that the proposed hybrid implementation significantly outperforms the state-of-the-art implementation.

Distributed Parallel and Cluster Computing

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

Leiming Yu, Fanny Nina-Paravecino, David Kaeli (2017)

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units (CPUs) and GPUs.

Distributed Parallel and Cluster Computing; Computational Physics

Function Delivery Network: Extending Serverless Computing for Heterogeneous Platforms

Anshul Jindal, Michael Gerndt, Mohak Chadha (2021)

Serverless computing has rapidly grown following the launch of Amazon's Lambda platform. Function-as-a-Service (FaaS), a key enabler of serverless computing, allows an application to be decomposed into simple, standalone functions that are executed on a FaaS platform. The FaaS platform is responsible for deploying and facilitating resources to the functions. Several of today's cloud applications are spread over heterogeneous connected computing resources and are highly dynamic in their structure and resource requirements. However, FaaS platforms are limited to homogeneous clusters and homogeneous functions and do not account for the data access behavior of functions before scheduling. We introduce an extension of FaaS to heterogeneous clusters and to support heterogeneous functions through a network of distributed heterogeneous target platforms called Function Delivery Network (FDN). A target platform is a combination of a cluster of homogeneous nodes and a FaaS platform on top of it. FDN provides Function-Delivery-as-a-Service (FDaaS), delivering the function to the right target platform. We showcase the opportunities that the FDN offers, such as varied target platform characteristics, the possibility of collaborative execution between multiple target platforms, and localization of data, in fulfilling two objectives: Service Level Objective (SLO) requirements and energy efficiency when scheduling functions. We do so by evaluating over five distributed target platforms using the FDNInspector, a tool developed by us for benchmarking distributed target platforms. Scheduling functions on an edge target platform in our evaluation reduced the overall energy consumption by 17x without violating the SLO requirements in comparison to scheduling on a high-end target platform.

Distributed Parallel and Cluster Computing; Performance

Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

Justin M. Wozniak, Timothy G. Armstrong, Ketan C. Maheshwari (2021)

Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.

Distributed Parallel and Cluster Computing

EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms

Christian Pilato, Stanislav Bohm, Fabien Brocheton (2021)

High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements with the characteristics of the underlying hardware is a key element to improve the predictions thanks to high performance and better use of resources. We present EVEREST, a novel H2020 project started on October 1st, 2020 that aims at developing a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support. In the different stages, EVEREST combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions. We describe the EVEREST approach and the use cases that drive our research.

Distributed Parallel and Cluster Computing