Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Optimizing the hybrid parallelization of BHAC

146 0 0.0 ( 0 )

Download Cite

Added by Salvatore Cielo

Publication date 2021

fields Informatics Engineering Physics

and research's language is English

Authors Salvatore Cielo - Oliver Porth - Luigi Iapichino

Distributed Parallel and Cluster Computing Instrumentation and Methods for Astrophysics

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present our experience with the modernization on the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based) architectures. Our performance characterization and threading analysis provided guidance in improving the concurrency and thus the efficiency of the OpenMP parallel regions. We assess scaling and communication patterns in order to identify and alleviate MPI bottlenecks, with both runtime switches and precise code interventions. The performance of optimized version of BHAC improved by $sim28%$, making it viable for scaling on several hundreds of supercomputer nodes. We finally test whether porting such optimizations to different hardware is likewise beneficial on the new architecture by running on ARM A64FX vector nodes.

rate research

A hybrid architecture for astronomical computing

73 - Changhua Li , Chenzhou Cui , Boliang He 2018

With many large science equipment constructing and putting into use, astronomy has stepped into the big data era. The new method and infrastructure of big data processing has become a new requirement of many astronomers. Cloud computing, Map/Reduce, Hadoop, Spark, etc. many new technology has sprung up in recent years. Comparing to the high performance computing(HPC), Data is the center of these new technology. So, a new computing architecture infrastructure is necessary, which can be shared by both HPC and big data processing. Based on Astronomy Cloud project of Chinese Virtual Observatory (China-VO), we have made much efforts to optimize the designation of the hybrid computing platform. which include the hardware architecture, cluster management, Job and Resource scheduling.

Distributed Parallel and Cluster Computing Instrumentation and Methods for Astrophysics

On the parallelization of stellar evolution codes

103 - David Martin , Jordi Jose (UPC 2018

Multidimensional nucleosynthesis studies with hundreds of nuclei linked through thousands of nuclear processes are still computationally prohibitive. To date, most nucleosynthesis studies rely either on hydrostatic/hydrodynamic simulations in spherical symmetry, or on post-processing simulations using temperature and density versus time profiles directly linked to huge nuclear reaction networks. Parallel computing has been regarded as the main permitting factor of computationally intensive simulations. This paper explores the different pros and cons in the parallelization of stellar codes, providing recommendations on when and how parallelization may help in improving the performance of a code for astrophysical applications. We report on different parallelization strategies succesfully applied to the spherically symmetric, Lagrangian, implicit hydrodynamic code SHIVA, extensively used in the modeling of classical novae and type I X-ray bursts. Speed-up factors of 26 and 35 have been obtained with a parallelized version of SHIVA, in a 200-shell simulation of a type I X-ray burst carried out with two nuclear reaction networks: a reduced one, consisting of 324 isotopes and 1392 reactions, and a more extended network with 606 nuclides and 3551 nuclear interactions. Maximum speed-ups of 41 (324-isotope network) and 85 (606-isotope network), are also predicted for 200 cores, stressing that the number of shells of the computational domain constitutes an effective upper limit for the maximum number of cores that could be used in a parallel application.

Solar and Stellar Astrophysics Instrumentation and Methods for Astrophysics Computational Physics

On the improvement of the in-place merge algorithm parallelization

61 - Berenger Bramas 2020

In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides the two arrays into as many pairs of partitions as there are threads available; such that each thread can later merge a pair of partitions independently of the others. We extend the existing method by proposing a new algorithm to find the median of two partitions. Additionally, we provide a new strategy to divide the input arrays where we minimize the data movement, but at the cost of making this stage sequential. Finally, we provide the so-called linear shifting algorithm that swaps two partitions in-place with contiguous data access. We emphasize that our approach is straightforward to implement and that it can also be used for external (out of place) merging. The results demonstrate that it provides a significant speedup compared to sequential executions, when the size of the arrays is greater than a thousand elements.

Distributed Parallel and Cluster Computing Data Structures and Algorithms

Parallelization of the multi-level hp-adaptive finite cell method

56 - John N. Jomo 2018

The multi-level hp-refinement scheme is a powerful extension of the finite element method that allows local mesh adaptation without the trouble of constraining hanging nodes. This is achieved through hierarchical high-order overlay meshes, a hp-scheme based on spatial refinement by superposition. An efficient parallelization of this method using standard domain decomposition approaches in combination with ghost elements faces the challenge of a large basis function support resulting from the overlay structure and is in many cases not feasible. In this contribution, a parallelization strategy for the multi-level hp-scheme is presented that is adapted to the schemes simple hierarchical structure. By distributing the computational domain among processes on the granularity of the active leaf elements and utilizing shared mesh data structures, good parallel performance is achieved, as redundant computations on ghost elements are avoided. We show the schemes parallel scalability for problems with a few hundred elements per process. Furthermore, the scheme is used in conjunction with the finite cell method to perform numerical simulations on domains of complex shape.

Distributed Parallel and Cluster Computing Numerical Analysis Computational Physics

Parallelization of Weighted Sequence Comparison by using EBWT

335 - Shashank Srikant 2010

The Extended Burrows Wheeler transform (EBWT) helps to find the distance between two sequences. Implementation of an existing algorithm takes considerable amount of time for small size sequences. In this paper, we give a parallel implementation of this algorithm using NVIDIA Compute Unified Device Architecture (CUDA). We have obtained, on an average, a 2X improvement in the performance.

Distributed Parallel and Cluster Computing Data Structures and Algorithms

comments

Fetching comments

Ebla Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Optimizing the hybrid parallelization of BHAC

Ask ChatGPT about the research

No Arabic abstract

Read More