ﻻ يوجد ملخص باللغة العربية
The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards in increasing reliance on heterogeneous accelerator based architectures such as graphics processing units (GPU), existing code bases must embrace these architectural advances to maintain the high-levels of performance which have come to be expected for these methods. In this work, we purpose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we purpose and demonstrate the efficacy of the use of batched kernels, including batched level-3 BLAS operations, in achieving high-levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the purposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem.
Molecular dynamics (MD) simulation is a powerful computational tool to study the behavior of macromolecular systems. But many simulations of this field are limited in spatial or temporal scale by the available computational resource. In recent years,
The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In the past years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA) - a parallel computing architecture - has been
In this paper, the OpenACC heterogeneous parallel programming model is successfully applied to modification and acceleration of the three-dimensional Tokamak magnetohydrodynamical code (CLTx). Through combination of OpenACC and MPI technologies, CLTx
We train a neural network as the universal exchange-correlation functional of density-functional theory that simultaneously reproduces both the exact exchange-correlation energy and potential. This functional is extremely non-local, but retains the c
We outline how auxiliary-field quantum Monte Carlo (AFQMC) can leverage graphical processing units (GPUs) to accelerate the simulation of solid state sytems. By exploiting conservation of crystal momentum in the one- and two-electron integrals we sho