Extreme-Scale Density Functional Theory High Performance Computing of DGDFT for Tens of Thousands of Atoms using Millions of Cores on Sunway TaihuLight

Added by: Wei Hu

Publication date: 2020

Field: Physics

Language: English

Authors: Wei Hu, Xinming Qin, Caiqing Jiang

Subjects: Computational Physics; Chemical Physics

Abstract

High performance computing (HPC) is a powerful tool for accelerating Kohn-Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe an extreme-scale, massively parallel and portable implementation of the discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions, generated on the fly during the self-consistent field (SCF) iteration, to solve the KS equations with a precision comparable to that of a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that combines different data distribution, task scheduling, and data communication schemes with the master-slave multi-threaded heterogeneous parallelism of the SW26010 processor, enabling extreme-scale HPC KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer when investigating the electronic structures of two-dimensional (2D) metallic graphene systems containing tens of thousands of carbon atoms.
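
The core count quoted above is consistent with the published SW26010 layout (a detail not stated in the abstract itself): each SW26010 processor has four core groups, each core group has one management processing element (MPE) and 64 computing processing elements (CPEs), and TaihuLight contains 40,960 processors in total.

```latex
\[
131{,}072\ \text{core groups} \times (1\ \text{MPE} + 64\ \text{CPEs})
  = 131{,}072 \times 65 = 8{,}519{,}680\ \text{cores}
\]
\[
\frac{131{,}072\ \text{core groups}}{4\ \text{core groups per SW26010}}
  = 32{,}768\ \text{processors} = 0.8 \times 40{,}960
\]
```

In other words, the largest run used every core of each participating core group on 80% of the machine's processors.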

Related research

Real-space formulation of the stress tensor for $\mathcal{O}(N)$ density functional theory: application to high temperature calculations

Abhiraj Sharma, Sebastien Hamel, Mandy Bethkenhagen (2020)

We present an accurate and efficient real-space formulation of the Hellmann-Feynman stress tensor for $\mathcal{O}(N)$ Kohn-Sham density functional theory (DFT). While applicable at any temperature, the formulation is most efficient at high temperature, where the Fermi-Dirac distribution becomes smoother and the density matrix becomes correspondingly more localized. We first rewrite the orbital-dependent stress tensor for real-space DFT in terms of the density matrix, thereby making it amenable to $\mathcal{O}(N)$ methods. We then describe its evaluation within the $\mathcal{O}(N)$ infinite-cell Clenshaw-Curtis Spectral Quadrature (SQ) method, a technique that is applicable to metallic as well as insulating systems, is highly parallelizable, becomes increasingly efficient with increasing temperature, and provides results corresponding to the infinite crystal without the need for Brillouin zone integration. We demonstrate systematic convergence of the resulting formulation with respect to SQ parameters to exact diagonalization results, and show convergence with respect to mesh size to established planewave results. We employ the new formulation to compute the viscosity of hydrogen at a million kelvin from Kohn-Sham quantum molecular dynamics, and find agreement with previous, more approximate orbital-free density functional methods.

Subjects: Computational Physics; Chemical Physics
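
Why higher temperature helps can be illustrated with a minimal sketch. It does not implement the Clenshaw-Curtis SQ method; it swaps in a simpler Chebyshev Fermi-operator expansion of the density matrix $P = f(H)$, where $f$ is the Fermi-Dirac function. Because $f$ has poles at distance $\pi k_B T$ from the real axis, a polynomial of fixed degree approximates it much better at higher temperature. The 1D tight-binding chain, the chemical potential $\mu = 0$, the degree 40, and all function names are illustrative assumptions, and the spectrum of `H` is assumed to be pre-scaled into $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fermi(x, mu, kT):
    """Fermi-Dirac occupation, written with tanh to avoid overflow in exp."""
    return 0.5 * (1.0 - np.tanh((x - mu) / (2.0 * kT)))


def chain_hamiltonian(n, t=-0.45):
    """Hypothetical 1D tight-binding chain; its spectrum lies inside [-2|t|, 2|t|]."""
    H = np.zeros((n, n))
    idx = np.arange(n - 1)
    H[idx, idx + 1] = t
    H[idx + 1, idx] = t
    return H


def density_matrix_chebyshev(H, mu, kT, deg):
    """Approximate P = f(H) with a degree-`deg` (>= 1) Chebyshev expansion.

    Assumes the spectrum of H has already been scaled into [-1, 1].
    """
    coeffs = C.chebinterpolate(lambda x: fermi(x, mu, kT), deg)
    T_prev = np.eye(H.shape[0])  # T_0(H)
    T_curr = H.copy()            # T_1(H)
    P = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        # Three-term recurrence: T_{k+1}(H) = 2 H T_k(H) - T_{k-1}(H)
        T_prev, T_curr = T_curr, 2.0 * H @ T_curr - T_prev
        P += c * T_curr
    return P


def density_matrix_exact(H, mu, kT):
    """Reference result from full diagonalization: sum_j f(e_j) v_j v_j^T."""
    w, V = np.linalg.eigh(H)
    return (V * fermi(w, mu, kT)) @ V.T


if __name__ == "__main__":
    H = chain_hamiltonian(60)
    for kT in (0.01, 0.1):
        diff = density_matrix_chebyshev(H, 0.0, kT, 40) - density_matrix_exact(H, 0.0, kT)
        print(f"kT = {kT}: spectral-norm error {np.linalg.norm(diff, 2):.2e}")
```

At the same degree, the error at $k_B T = 0.1$ should come out far smaller than at $k_B T = 0.01$; the same smoothness is what makes the density matrix more localized at high temperature.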

Benchmarking 50-Photon Gaussian Boson Sampling on the Sunway TaihuLight

Yuxuan Li, Mingcheng Chen, Yaojian Chen (2020)

Boson sampling is expected to be one of the important milestones in demonstrating quantum supremacy. The present work establishes a benchmark of Gaussian boson sampling (GBS) with threshold detection on the Sunway TaihuLight supercomputer. To achieve the best performance and provide a competitive scenario for future quantum computing studies, the selected simulation algorithm is fully optimized using a set of innovative approaches, including a parallel scheme and an instruction-level optimization method. Furthermore, data precision and instruction scheduling are handled by an adaptive precision optimization scheme and a DAG-based heuristic search algorithm, respectively. Based on these methods, a highly efficient parallel quantum sampling algorithm is designed. The largest run computes one Torontonian of a 100 × 100 submatrix from 50-photon GBS within 20 hours in 128-bit precision and within 2 days in 256-bit precision.

Subjects: Distributed, Parallel, and Cluster Computing; Quantum Physics
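
To make the cost of the largest run concrete, here is the textbook brute-force Torontonian for threshold-detector GBS, as introduced by Quesada, Arrazola and Killoran (2018). It is not the optimized algorithm of this paper. For a $2n \times 2n$ matrix $O$ it sums $2^n$ terms, each needing a determinant; assuming the 100 × 100 submatrix corresponds to $n = 50$ modes, that is $2^{50} \approx 1.13 \times 10^{15}$ determinants. The mode ordering $(a_1, \dots, a_n, a_1^*, \dots, a_n^*)$ and the sign convention in the docstring are assumptions of this sketch.

```python
import itertools

import numpy as np


def torontonian(O):
    """Brute-force Torontonian of a 2n x 2n matrix O (2^n determinant evaluations).

    Assumed convention:
        Tor(O) = sum_{Z subset of [n]} (-1)^(n - |Z|) / sqrt(det(I - O_(Z))),
    where O_(Z) keeps rows/columns j and j + n for every mode j in Z.
    """
    m = O.shape[0]
    if m % 2 != 0 or O.shape[1] != m:
        raise ValueError("O must be a square matrix of even dimension 2n")
    n = m // 2
    total = 0.0
    for k in range(n + 1):
        for Z in itertools.combinations(range(n), k):
            if k == 0:
                det = 1.0  # determinant of the empty (0 x 0) matrix
            else:
                idx = list(Z) + [j + n for j in Z]
                sub = O[np.ix_(idx, idx)]
                det = np.linalg.det(np.eye(2 * k) - sub)
            total += (-1) ** (n - k) / np.sqrt(det)
    return total
```

For uncorrelated modes the sum factorizes into per-mode terms $1/(1 - x_j) - 1$, so `torontonian(np.diag([0.5, 0.5]))` gives 1.0, a convenient sanity check.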

Enabling Large-Scale Condensed-Phase Hybrid Density Functional Theory Based Ab Initio Molecular Dynamics I: Theory, Algorithm, and Performance

Hsin-Yu Ko, Junteng Jia, Biswajit Santra (2019)

By including a fraction of exact exchange (EXX), hybrid functionals reduce the self-interaction error in semi-local density functional theory (DFT), and thereby furnish a more accurate and reliable description of the electronic structure in systems throughout biology, chemistry, physics, and materials science. However, the high computational cost associated with the evaluation of all required EXX quantities has limited the applicability of hybrid DFT in the treatment of large molecules and complex condensed-phase materials. To overcome this limitation, we have devised a linear-scaling yet formally exact approach that utilizes a local representation of the occupied orbitals (e.g., maximally localized Wannier functions, MLWFs) to exploit the sparsity in the real-space evaluation of the quantum mechanical exchange interaction in finite-gap systems. In this work, we present a detailed description of the theoretical and algorithmic advances required to perform MLWF-based ab initio molecular dynamics (AIMD) simulations of large-scale condensed-phase systems at the hybrid DFT level. We provide a comprehensive description of the exx algorithm, which is currently implemented in the Quantum ESPRESSO program and employs a hybrid MPI/OpenMP parallelization scheme to efficiently utilize high-performance computing (HPC) resources. This is followed by a critical assessment of the accuracy and parallel performance of this approach when performing AIMD simulations of liquid water in the canonical ensemble. With access to HPC resources, we demonstrate that exx enables hybrid DFT based AIMD simulations of condensed-phase systems containing 500-1000 atoms with a walltime cost that is comparable to semi-local DFT. In doing so, exx takes us closer to routinely performing AIMD simulations of large-scale condensed-phase systems for sufficiently long timescales at the hybrid DFT level of theory.

Subjects: Computational Physics; Materials Science
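
The sparsity argument can be sketched in a few lines. This is not the exx algorithm; it only illustrates why localized orbitals make the exchange work scale linearly. If each orbital is localized around a center (for example an MLWF center), only pairs whose centers lie within a cutoff need an exchange integral, and at fixed density each orbital has a bounded number of such partners. The simple-cubic arrangement of centers, the cutoff value, and the names `cubic_lattice` and `screened_pairs` are hypothetical.

```python
import itertools

import numpy as np


def cubic_lattice(m, spacing=1.0):
    """Hypothetical orbital centers on an m x m x m simple-cubic grid."""
    return spacing * np.array(list(itertools.product(range(m), repeat=3)), dtype=float)


def screened_pairs(centers, box, r_cut):
    """Return orbital pairs (i, j), i <= j, whose centers lie within r_cut.

    Uses the minimum-image convention for a cubic periodic box of side `box`.
    This O(N^2) search is for illustration only; a production code would use
    cell lists so that finding the pairs is itself linear in N.
    """
    pairs = []
    for i in range(len(centers)):
        d = centers[i:] - centers[i]
        d -= box * np.round(d / box)  # minimum-image displacement
        dist = np.linalg.norm(d, axis=1)
        for offset in np.nonzero(dist < r_cut)[0]:
            pairs.append((i, i + int(offset)))
    return pairs
```

On a grid with spacing 1 and a cutoff of 1.1, each center has six partners, so the pair count is $4N$ (self-pairs plus $3N$ distinct pairs) at every system size, mirroring the linear scaling claimed in the abstract.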

MD Simulation of Hundred-Billion-Metal-Atom Cascade Collision on Sunway Taihulight

Genshen Chu, Yang Li, Runchu Zhao (2021)

Radiation damage to the steel of reactor pressure vessels is a major threat to nuclear reactor safety. It is caused by metal-atom cascade collisions, initiated when atoms are struck by a high-energy neutron. The paper presents MISA-MD, a new molecular dynamics implementation that simulates such cascade collisions with the embedded-atom method (EAM) potential. MISA-MD provides (1) a hash-based data structure to efficiently store atoms and find their neighbors, and (2) several acceleration and optimization strategies for the SW26010 processor of the Sunway Taihulight supercomputer, including an efficient potential-table storage and interpolation method, a coloring method to avoid write conflicts, and double-buffering and data-reuse strategies. The experimental results demonstrate that MISA-MD has good accuracy and scalability, and obtains a parallel efficiency of over 79% in a 655-billion-atom system. Compared with the state-of-the-art MD program LAMMPS, MISA-MD uses less memory and achieves better computational performance.

Subjects: Distributed, Parallel, and Cluster Computing
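
The neighbor-finding part of point (1) can be illustrated with a generic spatial hash: atoms are binned into cubic cells whose edge equals the cutoff radius, keyed by integer cell coordinates in a dictionary, so each atom is only compared with atoms in the 27 surrounding cells. This is a textbook cell-list sketch assuming open (non-periodic) boundaries, not the MISA-MD data structure; all function names are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def random_positions(n, box, seed=0):
    """Hypothetical test input: n atoms placed uniformly in a cubic box."""
    rng = random.Random(seed)
    return [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n)]


def build_cell_hash(positions, cell_size):
    """Map integer cell coordinates -> list of atom indices in that cell."""
    table = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cell_size), math.floor(y / cell_size), math.floor(z / cell_size))
        table[key].append(idx)
    return table


def neighbor_pairs(positions, r_cut):
    """All pairs (i, j), i < j, closer than r_cut, found via the 27-cell stencil."""
    table = build_cell_hash(positions, r_cut)
    r2 = r_cut * r_cut
    pairs = set()
    for (cx, cy, cz), members in table.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            others = table.get((cx + dx, cy + dy, cz + dz))  # .get does not insert keys
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if d2 < r2:
                            pairs.add((i, j))
    return pairs
```

Because the cell edge equals the cutoff, any pair closer than `r_cut` sits in the same or an adjacent cell, so the result matches an all-pairs search while examining far fewer candidate pairs.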

Fast real-time time-dependent density functional theory calculations with the parallel transport gauge

Weile Jia, Dong An, Lin-Wang Wang (2018)

Real-time time-dependent density functional theory (RT-TDDFT) is known to be hindered by the very small time step (attosecond or smaller) needed in numerical simulations due to the fast oscillation of electron wavefunctions, which significantly limits its range of applicability for the study of ultrafast dynamics. In this paper, we demonstrate that such oscillation can be considerably reduced by optimizing the gauge choice using the parallel transport formalism. RT-TDDFT calculations can thus be significantly accelerated using a combination of the parallel transport gauge and implicit integrators, and the resulting scheme can be used to accelerate any electronic structure software that uses a Schrödinger representation. Using absorption spectrum, ultrashort laser pulse, and Ehrenfest dynamics calculations as examples, we show that the new method can utilize a time step on the order of $10\sim 100$ attoseconds in a planewave basis set, and is no less than $5\sim 10$ times faster than the standard explicit fourth-order Runge-Kutta time integrator. Thanks to the significant increase in time-step size, we also demonstrate that the new method is more than 10 times faster in wall-clock time than the standard explicit fourth-order Runge-Kutta time integrator for silicon systems ranging from 32 to 1024 atoms.

Subjects: Computational Physics; Chemical Physics
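
Why an implicit integrator tolerates larger time steps can be seen on a toy linear problem $i\,\partial_t \psi = H\psi$. This sketch omits the parallel transport gauge entirely and only contrasts the Crank-Nicolson (implicit midpoint) scheme, which is unitary for Hermitian $H$, with explicit fourth-order Runge-Kutta, which becomes unstable once $\Delta t\,\lambda_{\max}$ exceeds $2\sqrt{2}$. The diagonal Hamiltonian, the step size, and the function names are assumptions.

```python
import numpy as np


def crank_nicolson_step(H, psi, dt):
    """One Crank-Nicolson step for i dpsi/dt = H psi (unitary if H is Hermitian)."""
    identity = np.eye(H.shape[0])
    lhs = identity + 0.5j * dt * H
    rhs = (identity - 0.5j * dt * H) @ psi
    return np.linalg.solve(lhs, rhs)


def rk4_step(H, psi, dt):
    """One explicit fourth-order Runge-Kutta step for the same equation."""
    def f(y):
        return -1j * (H @ y)

    k1 = f(psi)
    k2 = f(psi + 0.5 * dt * k1)
    k3 = f(psi + 0.5 * dt * k2)
    k4 = f(psi + dt * k3)
    return psi + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


if __name__ == "__main__":
    H = np.diag([0.0, 1.0, 4.0])  # hypothetical toy Hamiltonian
    psi_cn = np.ones(3, dtype=complex) / np.sqrt(3.0)
    psi_rk = psi_cn.copy()
    for _ in range(20):
        psi_cn = crank_nicolson_step(H, psi_cn, 1.0)
        psi_rk = rk4_step(H, psi_rk, 1.0)
    print("Crank-Nicolson norm:", np.linalg.norm(psi_cn))
    print("RK4 norm:", np.linalg.norm(psi_rk))
```

With `dt = 1`, the RK4 amplitude of the $\lambda = 4$ mode grows by a factor of about 7.6 per step, while the Crank-Nicolson norm stays at 1 up to rounding error.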
