Acceleration of three-dimensional Tokamak magnetohydrodynamical code with graphics processing unit and OpenACC heterogeneous parallel programming

Added by Haowei Zhang

Publication date 2018

fields Physics

Language English

Authors H. W. Zhang - J. Zhu - Z. W. Ma

Computational Physics

In this paper, the OpenACC heterogeneous parallel programming model is successfully applied to the modification and acceleration of the three-dimensional Tokamak magnetohydrodynamical code (CLTx). By combining OpenACC and MPI, CLTx is further parallelized across multiple GPUs. Significant speedup ratios are achieved on both NVIDIA TITAN Xp and TITAN V GPUs with very few modifications to CLTx. Furthermore, the validity of double-precision calculations on these two graphics cards has been strictly verified with the m/n=2/1 resistive tearing mode instability in a Tokamak.

Air pollution modelling using a graphics processing unit with CUDA

337 - Ferenc Molnar Jr. , Tamas Szakaly , Robert Meszaros 2009

The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In recent years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA), a parallel computing architecture, has been developed by NVIDIA to utilize this performance in general-purpose computations. Here we show for the first time a possible application of GPUs to environmental studies, serving as a basis for decision-making strategies. A stochastic Lagrangian particle model has been developed on CUDA to estimate the transport and transformation of radionuclides from a single point source during an accidental release. Our results show that the parallel implementation achieves typical acceleration values on the order of 80-120 times compared to a single-threaded CPU implementation on a 2.33 GHz desktop computer. Only very small differences have been found between the results obtained from GPU and CPU simulations, which are comparable with the effect of stochastic transport phenomena in the atmosphere. The relatively high speedup, with no additional costs to maintain this parallel architecture, could result in wide usage of GPUs for diverse environmental applications in the near future.

Computational Physics Medical Physics
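
The stochastic Lagrangian particle approach described in the abstract above can be illustrated with a minimal sketch. This is not the authors' CUDA model: the function name `simulate_plume`, the constant wind, the Gaussian random-walk term, and the single half-life are all assumptions chosen for illustration. Each particle is advanced independently of the others, which is the property that makes this class of model a natural fit for a GPU.

```python
import math
import random


def simulate_plume(n_particles=1000, n_steps=100, dt=1.0,
                   wind=(2.0, 0.0), sigma=0.5, half_life=None, seed=0):
    """Release particles from a point source at the origin.

    Hypothetical toy model: each step adds a deterministic wind
    displacement plus a Gaussian random-walk term (turbulent diffusion).
    Radioactive decay is applied as an exponential weight per particle.
    Returns a list of (x, y, remaining_activity) tuples.
    """
    rng = random.Random(seed)
    decay_rate = math.log(2.0) / half_life if half_life else 0.0
    step_sd = sigma * math.sqrt(dt)
    particles = []
    for _ in range(n_particles):
        x = y = 0.0
        # Particles never interact, so this outer loop is the part a
        # GPU implementation would map to one thread per particle.
        for _ in range(n_steps):
            x += wind[0] * dt + rng.gauss(0.0, step_sd)
            y += wind[1] * dt + rng.gauss(0.0, step_sd)
        activity = math.exp(-decay_rate * n_steps * dt)
        particles.append((x, y, activity))
    return particles


plume = simulate_plume(n_particles=500, n_steps=50, half_life=100.0)
mean_x = sum(p[0] for p in plume) / len(plume)
print(f"mean downwind distance: {mean_x:.2f}")
```

Because the random draws differ between a serial and a parallel run, small differences between CPU and GPU results of the kind the abstract mentions are expected for this type of model.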

Molecular Dynamics Simulation of Macromolecules Using Graphics Processing Unit

418 - Ji Xu , Ying Ren , Wei Ge 2010

Molecular dynamics (MD) simulation is a powerful computational tool to study the behavior of macromolecular systems. However, many simulations in this field are limited in spatial or temporal scale by the available computational resources. In recent years, graphics processing units (GPUs) have provided unprecedented computational power for scientific applications. Many MD algorithms are well suited to the multithreaded nature of GPUs. In this paper, MD algorithms for macromolecular systems that run entirely on the GPU are presented. Compared to MD simulation with the free software GROMACS on a single CPU core, our codes achieve about a 10-fold speedup on a single GPU. For validation, we have performed MD simulations of polymer crystallization on the GPU, and the results agree perfectly with computations on the CPU. Therefore, our single-GPU codes already provide an inexpensive alternative to traditional CPU clusters for macromolecular simulations, and they can also be used as a basis for developing parallel GPU programs to further speed up the computations.

Computational Physics Materials Science Soft Condensed Matter

Performance Acceleration of Kernel Polynomial Method Applying Graphics Processing Units

482 - Shixun Zhang , Shinichi Yamagiwa , Masahiko Okumura 2011

The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in condensed matter physics and chemistry. The algorithm is difficult to parallelize on a cluster computer or a supercomputer because of its fine-grained recursive calculations. This paper proposes an implementation of the KPM on recent graphics processing units (GPUs), where the recursive calculations can be parallelized in a massively parallel environment. The paper also presents performance evaluations for three cases: one using actual simulation parameters, one with increased computational intensity, and one with increased memory usage. Finally, it concludes that the GPU implementation delivers much higher performance than the CPU implementation and reduces the overall simulation time.

Computational Physics Other Condensed Matter Performance
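
The recursive calculations mentioned in the KPM abstract above are, in the standard formulation of the method, Chebyshev recursions of the form T_{n+1}(H) = 2 H T_n(H) - T_{n-1}(H). The following is a minimal pure-Python sketch of that recursion, not the paper's GPU implementation; the helper names `matvec` and `chebyshev_moments` are hypothetical, and the Hamiltonian is assumed to be already rescaled so that its spectrum lies in [-1, 1].

```python
def matvec(h, v):
    """Dense matrix-vector product; each output element is independent."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in h]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def chebyshev_moments(h, r, n_moments):
    """Return mu_n = <r| T_n(H) |r> for n = 0 .. n_moments - 1.

    Assumes the spectrum of h lies in [-1, 1].
    """
    t_prev = list(r)        # T_0(H)|r> = |r>
    t_curr = matvec(h, r)   # T_1(H)|r> = H|r>
    moments = [dot(r, t_prev), dot(r, t_curr)]
    for _ in range(2, n_moments):
        # Each step needs the two previous vectors, so the steps are
        # sequential; the parallelism is inside matvec and the update.
        hv = matvec(h, t_curr)
        t_next = [2.0 * a - b for a, b in zip(hv, t_prev)]
        moments.append(dot(r, t_next))
        t_prev, t_curr = t_curr, t_next
    return moments[:n_moments]


# Diagonal test Hamiltonian: moments reduce to T_n(0.5) = cos(n * pi / 3).
print(chebyshev_moments([[0.5, 0.0], [0.0, -0.5]], [1.0, 0.0], 4))
# -> [1.0, 0.5, -0.5, -1.0]
```

The sketch shows why the method suits a GPU better than a cluster: the work per step is many small, independent multiply-adds with a dependency between consecutive steps, which favors fine-grained threads over coarse-grained message passing.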

On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

139 - David B. Williams-Young , Wibe A. de Jong , Hubertus J.J. van Dam 2020

The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected of these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing it to the existing scalable CPU XC integration in NWChem.

Computational Physics Distributed Parallel and Cluster Computing Chemical Physics

Speeding up complex multivariate data analysis in Borexino with parallel computing based on Graphics Processing Unit

74 - X.F. Ding , M. Agostini , K. Altenmuller 2018

A spectral fitter based on the graphics processing unit (GPU) has been developed for Borexino solar neutrino analysis. It substantially shortens the fitting time compared to the CPU fitting procedure. In Borexino solar neutrino spectral analysis, fitting usually requires around one hour to converge, since it includes time-consuming convolutions to account for the detector response and pile-up effects. Moreover, the convergence time increases to more than two days when extra computations are included for the discrimination of $^{11}$C and external $\gamma$s. In sharp contrast, the GPU-based fitter takes less than 10 seconds and less than four minutes, respectively. The fitter is built on the GooFit project, with customized likelihoods, PDFs, and infrastructure supporting certain analysis methods. These proceedings present the design of the package, its features, and a comparison with the original CPU fitter.

Data Analysis Statistics and Probability High Energy Physics - Experiment Computational Physics
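
The timings quoted in the Borexino abstract above imply rough lower bounds on the speedup. The short calculation below uses the stated limits as if they were exact values (one hour and two days on the CPU, 10 seconds and four minutes on the GPU), which is an assumption: the abstract only says "around", "more than", and "less than". The helper name `speedup` is hypothetical.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU wall time to GPU wall time."""
    return cpu_seconds / gpu_seconds


# Standard fit: about one hour on the CPU, under 10 seconds on the GPU.
standard_fit = speedup(60 * 60, 10)

# Fit with extra discrimination terms: over two days vs. under four minutes.
extended_fit = speedup(2 * 24 * 60 * 60, 4 * 60)

print(standard_fit, extended_fit)  # -> 360.0 720.0
```

Under these assumptions the GPU fitter is at least a few hundred times faster in both configurations.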