ﻻ يوجد ملخص باللغة العربية
Nonequispaced discrete Fourier transformation (NDFT) is widely applied in all aspects of computational science and engineering. The computational efficiency and accuracy of NDFT has always been a critical issue in hindering its comprehensive applications both in intensive and in extensive aspects of scientific computing. In our previous work (2018, S.-C. Yang et al., Appl. Comput. Harmon. Anal. 44, 273), a CUNFFT method was proposed and it shown outstanding performance in handling NDFT at intermediate scale based on CUDA (Compute Unified Device Architecture) technology. In the current work, we further improved the computational efficiency of the CUNTTF method using an efficient MPI-CUDA hybrid parallelization (HP) scheme of NFFT to achieve a cutting-edge treatment of NDFT at super extended scale. Within this HP-NFFT method, the spatial domain of NDFT is decomposed into several parts according to the accumulative feature of NDFT and the detailed number of CPU and GPU nodes. These decomposed NDFT subcells are independently calculated on different CPU nodes using a MPI process-level parallelization mode, and on different GPU nodes using a CUDA threadlevel parallelization mode and CUNFFT algorithm. A massive benchmarking of the HP-NFFT method indicates that this method exhibit a dramatic improvement in computational efficiency for handling NDFT at super extended scale without loss of computational precision. Furthermore, the HP-NFFT method is validated via the calculation of Madelung constant of fluorite crystal structure, and thereafter verified that this method is robust for the calculation of electrostatic interactions between charged ions in molecular dynamics simulation systems.
In this work, we present two parallel algorithms for the large-scale discrete Fourier transform (DFT) on Tensor Processing Unit (TPU) clusters. The two parallel algorithms are associated with two formulations of DFT: one is based on the Kronecker pro
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effo
Getting good performance out of numerical equation solvers requires that the user has provided stable and efficient functions representing their model. However, users should not be trusted to write good code. In this manuscript we describe ModelingTo
Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software p
MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications. These datatypes are recursively constructed at runtime from primitive Named Types defined in the MPI standard. More recently, the developmen