High-performance computing (HPC) is a powerful tool for accelerating Kohn-Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe a massively parallel, portable, extreme-scale implementation of the discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions generated on the fly during the self-consistent field (SCF) iteration to solve the KS equations with a precision comparable to that of a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that makes use of different types of data distribution, task scheduling, and data communication schemes, and combines it with the master-slave multi-thread heterogeneous parallelism of the SW26010 processor, resulting in extreme-scale HPC KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer for investigating the electronic structures of two-dimensional (2D) metallic graphene systems containing tens of thousands of carbon atoms.
We present an accurate and efficient real-space formulation of the Hellmann-Feynman stress tensor for $\mathcal{O}(N)$ Kohn-Sham density functional theory (DFT). While applicable at any temperature, the formulation is most efficient at high temperature, where the Fermi-Dirac distribution becomes smoother and the density matrix becomes correspondingly more localized. We first rewrite the orbital-dependent stress tensor for real-space DFT in terms of the density matrix, thereby making it amenable to $\mathcal{O}(N)$ methods. We then describe its evaluation within the $\mathcal{O}(N)$ infinite-cell Clenshaw-Curtis Spectral Quadrature (SQ) method, a technique that is applicable to metallic as well as insulating systems, is highly parallelizable, becomes increasingly efficient with increasing temperature, and provides results corresponding to the infinite crystal without the need for Brillouin zone integration. We demonstrate systematic convergence of the resulting formulation with respect to SQ parameters to exact diagonalization results, and show convergence with respect to mesh size to established planewave results. We employ the new formulation to compute the viscosity of hydrogen at a million kelvin from Kohn-Sham quantum molecular dynamics, where we find agreement with previous, more approximate orbital-free density functional methods.
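The claim that the method becomes more efficient as temperature rises can be illustrated with a small polynomial expansion of the Fermi-Dirac function. The following is a minimal sketch, not the SQ method of the abstract: it uses a Chebyshev expansion with Chebyshev-Gauss nodes as a stand-in for Clenshaw-Curtis quadrature, works on a dense toy Hamiltonian, and omits the spatial truncation that gives $\mathcal{O}(N)$ scaling. All function names and parameters (`fermi_dirac`, `chebyshev_coefficients`, `apply_fermi_operator`, `max_scalar_error`, `e_min`, `e_max`) are hypothetical and chosen only for illustration.

```python
import math


def fermi_dirac(x, mu, kT):
    """Fermi-Dirac occupation 1 / (1 + exp((x - mu) / kT))."""
    return 1.0 / (1.0 + math.exp((x - mu) / kT))


def chebyshev_coefficients(func, degree, n_nodes=None):
    """Coefficients c_k with func(x) ~ sum_k c_k T_k(x) on [-1, 1].

    Uses Chebyshev-Gauss nodes (an assumption made for brevity).
    """
    n = n_nodes or 2 * (degree + 1)
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [func(math.cos(t)) for t in thetas]
    coeffs = []
    for k in range(degree + 1):
        s = sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs


def matvec(H, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def apply_fermi_operator(H, v, mu, kT, degree, e_min, e_max):
    """Approximate f(H) v; [e_min, e_max] must enclose the spectrum of H."""
    center = 0.5 * (e_max + e_min)
    half_width = 0.5 * (e_max - e_min)
    coeffs = chebyshev_coefficients(
        lambda x: fermi_dirac(center + half_width * x, mu, kT), degree)

    def scaled(u):
        # Apply (H - center) / half_width so the spectrum lies in [-1, 1].
        Hu = matvec(H, u)
        return [(a - center * b) / half_width for a, b in zip(Hu, u)]

    t_prev = list(v)            # T_0(H) v
    t_curr = scaled(v)          # T_1(H) v
    result = [coeffs[0] * a for a in t_prev]
    if degree >= 1:
        result = [r + coeffs[1] * b for r, b in zip(result, t_curr)]
    for k in range(2, degree + 1):
        # Three-term recurrence: T_k = 2 x T_{k-1} - T_{k-2}.
        t_next = [2.0 * a - b for a, b in zip(scaled(t_curr), t_prev)]
        result = [r + coeffs[k] * c for r, c in zip(result, t_next)]
        t_prev, t_curr = t_curr, t_next
    return result


def max_scalar_error(kT, degree, samples=201):
    """Largest error of the degree-limited expansion of f on [-1, 1]."""
    coeffs = chebyshev_coefficients(lambda x: fermi_dirac(x, 0.0, kT), degree)
    worst = 0.0
    for i in range(samples):
        x = max(-1.0, min(1.0, -1.0 + 2.0 * i / (samples - 1)))
        theta = math.acos(x)
        approx = sum(c * math.cos(k * theta) for k, c in enumerate(coeffs))
        worst = max(worst, abs(approx - fermi_dirac(x, 0.0, kT)))
    return worst
```

Calling `max_scalar_error` at a fixed degree for a small and a large `kT` shows the smoother, high-temperature function being captured with far fewer terms, which is the qualitative behavior the abstract relies on.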
Boson sampling is expected to be one of the important milestones that will demonstrate quantum supremacy. The present work establishes the benchmarking of Gaussian boson sampling (GBS) with threshold detection based on the Sunway TaihuLight supercomputer. To achieve the best performance and provide a competitive scenario for future quantum computing studies, the selected simulation algorithm is fully optimized based on a set of innovative approaches, including a parallel scheme and an instruction-level optimization method. Furthermore, data precision and instruction scheduling are handled in a sophisticated manner by an adaptive precision optimization scheme and a DAG-based heuristic search algorithm, respectively. Based on these methods, a highly efficient and parallel quantum sampling algorithm is designed. The largest run enables us to obtain one Torontonian function of a 100 x 100 submatrix from 50-photon GBS within 20 hours in 128-bit precision and 2 days in 256-bit precision.
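To show why a single Torontonian of a 100 x 100 matrix is expensive, the sketch below evaluates the function by brute force for small real matrices. It assumes the commonly quoted inclusion-exclusion form, $\mathrm{Tor}(O) = \sum_{Z \subseteq [n]} (-1)^{n-|Z|} / \sqrt{\det(I - O_Z)}$, where $O$ is $2n \times 2n$ and $O_Z$ keeps rows and columns $Z$ and $Z + n$. The sign convention, the restriction to real entries, and the helper names (`determinant`, `torontonian`) are assumptions for illustration; this is not the optimized algorithm of the abstract.

```python
import math
from itertools import combinations


def determinant(matrix):
    """Determinant of a real square matrix by Gaussian elimination."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def torontonian(O):
    """Brute-force Torontonian of a real 2n x 2n matrix (2**n terms)."""
    n = len(O) // 2
    total = 0.0
    for size in range(n + 1):
        sign = (-1) ** (n - size)
        for modes in combinations(range(n), size):
            # Keep mode indices Z together with their partners Z + n.
            idx = list(modes) + [m + n for m in modes]
            sub = [[(1.0 if i == j else 0.0) - O[i][j] for j in idx]
                   for i in idx]
            total += sign / math.sqrt(determinant(sub))
    return total
```

The sum runs over all $2^n$ subsets of modes, so a 100 x 100 matrix ($n = 50$) involves $2^{50}$ determinants, which is why the computation calls for a supercomputer and careful precision handling.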
By including a fraction of exact exchange (EXX), hybrid functionals reduce the self-interaction error in semi-local density functional theory (DFT), and thereby furnish a more accurate and reliable description of the electronic structure in systems throughout biology, chemistry, physics, and materials science. However, the high computational cost associated with the evaluation of all required EXX quantities has limited the applicability of hybrid DFT in the treatment of large molecules and complex condensed-phase materials. To overcome this limitation, we have devised a linear-scaling yet formally exact approach that utilizes a local representation of the occupied orbitals (e.g., maximally localized Wannier functions, MLWFs) to exploit the sparsity in the real-space evaluation of the quantum mechanical exchange interaction in finite-gap systems. In this work, we present a detailed description of the theoretical and algorithmic advances required to perform MLWF-based ab initio molecular dynamics (AIMD) simulations of large-scale condensed-phase systems at the hybrid DFT level. We provide a comprehensive description of the exx algorithm, which is currently implemented in the Quantum ESPRESSO program and employs a hybrid MPI/OpenMP parallelization scheme to efficiently utilize high-performance computing (HPC) resources. This is followed by a critical assessment of the accuracy and parallel performance of this approach when performing AIMD simulations of liquid water in the canonical ensemble. With access to HPC resources, we demonstrate that exx enables hybrid DFT based AIMD simulations of condensed-phase systems containing 500-1000 atoms with a walltime cost that is comparable to semi-local DFT. In doing so, exx takes us closer to routinely performing AIMD simulations of large-scale condensed-phase systems for sufficiently long timescales at the hybrid DFT level of theory.
Radiation damage to the steel of reactor pressure vessels is a major threat to nuclear reactor safety. It is caused by cascade collisions of metal atoms, initiated when the atoms are struck by high-energy neutrons. This paper presents MISA-MD, a new implementation of molecular dynamics, to simulate such cascade collisions with the EAM potential. MISA-MD realizes (1) a hash-based data structure to efficiently store an atom and find its neighbors, and (2) several acceleration and optimization strategies based on the SW26010 processor of the Sunway TaihuLight supercomputer, including an efficient potential table storage and interpolation method, a coloring method to avoid write conflicts, and double-buffer and data reuse strategies. The experimental results demonstrate that MISA-MD has good accuracy and scalability, and obtains a parallel efficiency of over 79% in a 655-billion-atom system. Compared with the state-of-the-art MD program LAMMPS, MISA-MD uses less memory and achieves better computational performance.
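The abstract does not describe the internals of the hash-based structure, so the following is only a generic spatial-hash sketch of the idea of storing atoms by hashed cell and looking up neighbors from nearby cells. It is not MISA-MD's data structure; the function names (`cell_key`, `build_cell_hash`, `neighbors_within`), the cell size, and the non-periodic boundaries are all assumptions for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def cell_key(position, cell_size):
    """Integer cell coordinates used as the hash key for a position."""
    return tuple(math.floor(c / cell_size) for c in position)


def build_cell_hash(positions, cell_size):
    """Map each occupied cell to the indices of the atoms inside it."""
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        cells[cell_key(pos, cell_size)].append(idx)
    return cells


def neighbors_within(positions, cells, cell_size, atom, cutoff):
    """Indices of atoms within `cutoff` of `atom` (no periodic images)."""
    home = cell_key(positions[atom], cell_size)
    reach = math.ceil(cutoff / cell_size)
    found = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        key = tuple(h + o for h, o in zip(home, offset))
        # .get avoids creating empty cells in the defaultdict.
        for other in cells.get(key, ()):
            if other != atom and math.dist(positions[atom], positions[other]) <= cutoff:
                found.append(other)
    return sorted(found)


# Usage: a 3 x 3 x 3 simple cubic lattice with unit spacing.
lattice = [(float(i), float(j), float(k))
           for i, j, k in product(range(3), repeat=3)]
cells = build_cell_hash(lattice, cell_size=1.0)
center_neighbors = neighbors_within(lattice, cells, 1.0, atom=13, cutoff=1.1)
```

Only the cells near an atom are visited, so the lookup cost depends on the local density rather than on the total number of atoms, which is the property that makes hashed storage attractive for very large systems.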
Real-time time-dependent density functional theory (RT-TDDFT) is known to be hindered by the very small time step (attosecond or smaller) needed in the numerical simulation due to the fast oscillation of electron wavefunctions, which significantly limits its range of applicability for the study of ultrafast dynamics. In this paper, we demonstrate that such oscillation can be considerably reduced by optimizing the gauge choice using the parallel transport formalism. RT-TDDFT calculations can thus be significantly accelerated using a combination of the parallel transport gauge and implicit integrators, and the resulting scheme can be used to accelerate any electronic structure software that uses a Schrödinger representation. Using absorption spectrum, ultrashort laser pulse, and Ehrenfest dynamics calculations as examples, we show that the new method can utilize a time step that is on the order of $10\sim 100$ attoseconds in a planewave basis set, and is no less than $5\sim 10$ times faster when compared to the standard explicit 4th order Runge-Kutta time integrator. Thanks to the significant increase in the size of the time step, we also demonstrate that the new method is more than 10 times faster in terms of wall clock time when compared to the standard explicit 4th order Runge-Kutta time integrator for silicon systems ranging from 32 to 1024 atoms.
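The effect of the gauge choice can be seen on a toy two-level system. The sketch below assumes a single normalized orbital and a fixed Hamiltonian, for which the parallel transport equation reduces to $i\,\partial_t \varphi = H\varphi - \varphi\langle\varphi|H|\varphi\rangle$. For brevity it integrates both gauges with explicit RK4 rather than the implicit integrators used in the abstract, and all names (`rhs_schrodinger`, `rhs_parallel_transport`, `rk4_step`, `propagate`) are hypothetical.

```python
import math


def apply_h(H, psi):
    """Dense matrix-vector product H psi for a list-of-lists matrix."""
    return [sum(h * c for h, c in zip(row, psi)) for row in H]


def rhs_schrodinger(H, psi):
    """Schrodinger gauge: i d(psi)/dt = H psi."""
    return [-1j * c for c in apply_h(H, psi)]


def rhs_parallel_transport(H, phi):
    """Parallel transport gauge for one normalized orbital (assumption)."""
    h_phi = apply_h(H, phi)
    energy = sum(a.conjugate() * b for a, b in zip(phi, h_phi))
    return [-1j * (hp - energy * p) for hp, p in zip(h_phi, phi)]


def rk4_step(rhs, H, y, dt):
    """One explicit 4th order Runge-Kutta step."""
    def shifted(scale, k):
        return [yi + scale * ki for yi, ki in zip(y, k)]

    k1 = rhs(H, y)
    k2 = rhs(H, shifted(0.5 * dt, k1))
    k3 = rhs(H, shifted(0.5 * dt, k2))
    k4 = rhs(H, shifted(dt, k3))
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def propagate(rhs, H, y0, dt, n_steps):
    """Advance y0 by n_steps steps of size dt."""
    y = [complex(c) for c in y0]
    for _ in range(n_steps):
        y = rk4_step(rhs, H, y, dt)
    return y


# Usage: an eigenstate of H only acquires a phase in the Schrodinger gauge,
# while in the parallel transport gauge it does not move at all.
H_toy = [[0.0, 1.0], [1.0, 0.0]]
eigenstate = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
psi_s = propagate(rhs_schrodinger, H_toy, eigenstate, dt=1.0, n_steps=10)
psi_pt = propagate(rhs_parallel_transport, H_toy, eigenstate, dt=1.0, n_steps=10)
```

Both gauges yield the same density, but the parallel transport orbital varies more slowly in time, which is what allows a larger time step once a suitable integrator is used.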