ﻻ يوجد ملخص باللغة العربية
The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors. Performance of many scientific applications such as Molecular Dynamic simulations depends on the underlying 3D parallel FFT library being used. In this paper, we present C-DACs three-dimensional Fast Fourier Transform (CROFT) library which implements three-dimensional parallel FFT using pencil decomposition. To exploit the hyperthreading capabilities of processor cores without affecting performance, CROFT is designed to use multithreading along with MPI. CROFT implementation has an innovative feature of overlapping compute and memory-I/O with MPI communication using multithreading. As opposed to other 3D FFT implementations, CROFT uses only two threads where one thread is dedicated for communication so that it can be effectively overlapped with computations. Thus, depending on the number of processes used, CROFT achieves performance improvement of about 51% to 42% as compared to FFTW3 library.
Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and perfo
Performance and energy are the two most important objectives for optimisation on modern parallel platforms. Latest research demonstrated the importance of workload distribution as a decision variable in the bi-objective optimisation for performance a
The movement of large-scale (tens of Terabytes and larger) data sets between high performance computing (HPC) facilities is an important and increasingly critical capability. A growing number of scientific collaborations rely on HPC facilities for ta
Bayesian optimization (BO) is a flexible and powerful framework that is suitable for computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While remaining as one of the most popular opti
In this paper we demonstrate the methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley-Tukey method have to control cac