Parallel Transport Time-Dependent Density Functional Theory Calculations with Hybrid Functional on Summit

 Added by Weile Jia
 Publication date 2019
 Field: Physics
 Language: English




Real-time time-dependent density functional theory (rt-TDDFT) with hybrid exchange-correlation functionals has wide-ranging applications in chemistry and materials science simulations. However, it can be thousands of times more expensive than a conventional ground-state DFT simulation, and is hence limited to small systems. In this paper, we accelerate hybrid functional rt-TDDFT calculations using the parallel transport gauge formalism and a GPU implementation on Summit. Our implementation can efficiently scale to 786 GPUs for a large system with 1536 silicon atoms, and the wall clock time is only 1.5 hours per femtosecond. This unprecedented speed enables the simulation of large systems with more than 1000 atoms using rt-TDDFT and hybrid functionals.



Read More

Real-time time-dependent density functional theory (RT-TDDFT) is known to be hindered by the very small time step (attosecond or smaller) needed in the numerical simulation due to the fast oscillation of electron wavefunctions, which significantly limits its range of applicability for the study of ultrafast dynamics. In this paper, we demonstrate that such oscillation can be considerably reduced by optimizing the gauge choice using the parallel transport formalism. RT-TDDFT calculations can thus be significantly accelerated using a combination of the parallel transport gauge and implicit integrators, and the resulting scheme can be used to accelerate any electronic structure software that uses a Schrödinger representation. Using absorption spectrum, ultrashort laser pulse, and Ehrenfest dynamics calculations as examples, we show that the new method can utilize a time step that is on the order of $10\sim 100$ attoseconds in a planewave basis set, and is no less than $5\sim 10$ times faster when compared to the standard explicit 4th order Runge-Kutta time integrator. Thanks to the significant increase of the size of the time step, we also demonstrate that the new method is more than 10 times faster in terms of the wall clock time when compared to the standard explicit 4th order Runge-Kutta time integrator for silicon systems ranging from 32 to 1024 atoms.
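The effect of the gauge choice can be illustrated with a minimal sketch, assuming a single orbital and a fixed 2x2 model Hamiltonian (both hypothetical, not taken from the paper). It assumes the parallel transport dynamics take the form $i\partial_t \varphi = H\varphi - \varphi(\varphi^{*}H\varphi)$, so that an eigenstate stays fixed while the Schrödinger-gauge orbital keeps rotating in phase. Both gauges should give the same density matrix.

```python
import numpy as np

# Hypothetical 2x2 model Hamiltonian (real symmetric); illustrative values only.
H = np.array([[1.0, 0.2], [0.2, 3.0]])

def schrodinger_rhs(psi):
    """Schrodinger gauge: i d(psi)/dt = H psi."""
    return -1j * (H @ psi)

def parallel_transport_rhs(phi):
    """Assumed parallel transport gauge: i d(phi)/dt = H phi - phi (phi^* H phi)."""
    return -1j * (H @ phi - phi * np.vdot(phi, H @ phi))

def rk4_propagate(rhs, psi0, dt, n_steps):
    """Propagate with the explicit 4th order Runge-Kutta scheme."""
    psi = np.asarray(psi0, dtype=complex)
    for _ in range(n_steps):
        k1 = rhs(psi)
        k2 = rhs(psi + 0.5 * dt * k1)
        k3 = rhs(psi + 0.5 * dt * k2)
        k4 = rhs(psi + dt * k3)
        psi = psi + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return psi

def density_matrix(psi):
    """Gauge-invariant quantity P = psi psi^*."""
    return np.outer(psi, psi.conj())

# Start from the lowest eigenvector and propagate to t = 1 in both gauges.
ground = np.linalg.eigh(H)[1][:, 0]
psi_s = rk4_propagate(schrodinger_rhs, ground, dt=1e-3, n_steps=1000)
phi_pt = rk4_propagate(parallel_transport_rhs, ground, dt=1e-3, n_steps=1000)

# How far each orbital has moved from its starting point.
drift_schrodinger = np.linalg.norm(psi_s - ground)
drift_pt = np.linalg.norm(phi_pt - ground)
```

In this toy setting the Schrödinger-gauge orbital drifts through its phase rotation while the parallel-transport orbital does not move. That is the sense in which a slower-varying gauge may tolerate a larger time step.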
Imaginary-time time-dependent density functional theory (it-TDDFT) has been proposed as an alternative method for obtaining the ground state within density functional theory (DFT) which avoids some of the difficulties with convergence encountered by the self-consistent-field (SCF) iterative method. It-TDDFT was previously applied to clusters of atoms where it was demonstrated to converge in select cases where SCF had difficulty with convergence. In the present work we implement it-TDDFT propagation for periodic systems by modifying the Quantum ESPRESSO package, which uses a plane-wave basis with multiple $\boldsymbol{k}$ points, and has the options of non-collinear and DFT+U calculations using ultra-soft or norm-conserving pseudopotentials. We demonstrate that our implementation of it-TDDFT propagation with multiple $\boldsymbol{k}$ points is correct for DFT+U non-collinear calculations and for DFT+U calculations with ultra-soft pseudopotentials. Our implementation of it-TDDFT propagation converges to the exact SCF energy (up to the decimal guaranteed by double precision) in all but one case where it converged to a slightly lower value than SCF, suggesting a useful alternative for systems where SCF has difficulty reaching the Kohn-Sham ground state. In addition, we demonstrate that rapid convergence can be achieved if we use adaptive-size imaginary-time-steps for different kinetic-energy plane-waves.
Weile Jia, Lin Lin, 2018
We present a new method to accelerate real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid exchange-correlation functionals. For a large basis set, the computational bottleneck for large scale calculations is the application of the Fock exchange operator to the time-dependent orbitals. Our main goal is to reduce the frequency of applying the Fock exchange operator, without loss of accuracy. We achieve this by combining the recently developed parallel transport (PT) gauge formalism and the adaptively compressed exchange operator (ACE) formalism. The PT gauge yields the slowest possible dynamics among all choices of gauge. When coupled with implicit time integrators such as the Crank-Nicolson (CN) scheme, the resulting PT-CN scheme can significantly increase the time step from sub-attoseconds to 10-100 attoseconds. At each time step $t_{n}$, PT-CN requires the self-consistent solution of the orbitals at time $t_{n+1}$. We use ACE to delay the update of the Fock exchange operator in this nonlinear system, while maintaining the same self-consistent solution. We verify the performance of the resulting PT-CN-ACE method by computing the absorption spectrum of a benzene molecule and the response of bulk silicon systems to an ultrafast laser pulse, using the planewave basis set and the HSE functional. We report the strong and weak scaling of the PT-CN-ACE method for silicon systems ranging from 32 to 1024 atoms, with up to 2048 computational cores. Compared to standard explicit time integrators such as the 4th order Runge-Kutta method (RK4), the PT-CN-ACE method can reduce the number of Fock exchange operator applications by nearly 70 times, and thus reduce the overall wall clock time by 46 times for the system with 1024 atoms. Hence our work enables hybrid functional rt-TDDFT calculations to be routinely performed with a large basis set for the first time.
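Why an implicit integrator tolerates a larger time step can be seen in a minimal sketch, assuming a fixed, linear 3x3 model Hamiltonian (hypothetical values). In the actual PT-CN scheme the Hamiltonian depends on the orbitals, so the step at $t_{n+1}$ is nonlinear and solved self-consistently; here the Crank-Nicolson step reduces to a single linear solve, $(I + \tfrac{i\Delta t}{2}H)\psi_{n+1} = (I - \tfrac{i\Delta t}{2}H)\psi_n$.

```python
import numpy as np

# Hypothetical stiff model Hamiltonian (real symmetric); illustrative values only.
H = np.diag([0.5, 5.0, 50.0]) + 0.1 * (np.eye(3, k=1) + np.eye(3, k=-1))

def crank_nicolson_step(psi, dt):
    """Implicit step: solve (I + i dt/2 H) psi_new = (I - i dt/2 H) psi."""
    identity = np.eye(H.shape[0])
    lhs = identity + 0.5j * dt * H
    rhs = (identity - 0.5j * dt * H) @ psi
    return np.linalg.solve(lhs, rhs)

def rk4_step(psi, dt):
    """Explicit 4th order Runge-Kutta step for i d(psi)/dt = H psi."""
    f = lambda x: -1j * (H @ x)
    k1 = f(psi)
    k2 = f(psi + 0.5 * dt * k1)
    k3 = f(psi + 0.5 * dt * k2)
    k4 = f(psi + dt * k3)
    return psi + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Deliberately large step: dt times the largest eigenvalue is about 5,
# which lies outside the RK4 stability region on the imaginary axis.
dt = 0.1
psi_cn = psi_rk4 = np.ones(3, dtype=complex) / np.sqrt(3.0)
for _ in range(100):
    psi_cn = crank_nicolson_step(psi_cn, dt)
    psi_rk4 = rk4_step(psi_rk4, dt)

norm_cn = np.linalg.norm(psi_cn)    # stays at 1: the CN update is unitary
norm_rk4 = np.linalg.norm(psi_rk4)  # grows without bound at this step size
```

Norm conservation alone does not guarantee accuracy at a large step. The sketch only shows stability, which is why the abstract pairs the implicit integrator with a gauge in which the orbitals vary slowly.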
Reliable and robust convergence to the electronic ground state within density functional theory (DFT) Kohn-Sham (KS) calculations remains a thorny issue in many systems of interest. In such cases, charge sloshing can delay or completely hinder the convergence. Here, we use an approach based on transforming the time-dependent DFT equations to imaginary time, followed by imaginary-time evolution, as a reliable alternative to the self-consistent field (SCF) procedure for determining the KS ground state. We discuss the theoretical and technical aspects of this approach and show that the KS ground state should be expected to be the long-imaginary-time output of the evolution, independent of the exchange-correlation functional or the level of theory used to simulate the system. By maintaining self-consistency between the single-particle wavefunctions and the electronic density throughout the determination of the stationary state, our method avoids the typical difficulties encountered in SCF. To demonstrate the dependability of our approach, we apply it to selected systems which struggle to converge with SCF schemes. In addition, through the van Leeuwen theorem, we affirm the physical meaningfulness of imaginary-time TDDFT, justifying its use in certain topics of statistical mechanics such as in computing imaginary-time path integrals.
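The idea that the ground state emerges as the long-imaginary-time limit can be sketched for the simplest case, assuming a fixed (density-independent) Hamiltonian and a first-order Euler step with renormalization. The 8-site tight-binding matrix, the step size `dtau`, and the function name are hypothetical choices for illustration; a real KS calculation would rebuild the Hamiltonian from the density at every step.

```python
import numpy as np

def imaginary_time_ground_state(H, psi0, dtau=0.1, n_steps=2000):
    """Evolve d(psi)/d(tau) = -H psi with Euler steps, renormalizing each step.

    Components along higher eigenvectors decay faster, so the normalized
    state approaches the lowest eigenvector it overlaps with.
    """
    psi = psi0 / np.linalg.norm(psi0)
    for _ in range(n_steps):
        psi = psi - dtau * (H @ psi)
        psi = psi / np.linalg.norm(psi)
    energy = float(psi @ H @ psi)
    return energy, psi

# Hypothetical model: 8-site chain with 2 on the diagonal and -1 off-diagonal.
n_sites = 8
H = 2.0 * np.eye(n_sites) - np.eye(n_sites, k=1) - np.eye(n_sites, k=-1)

energy, psi = imaginary_time_ground_state(H, np.ones(n_sites))
exact = np.linalg.eigvalsh(H)[0]  # reference from direct diagonalization
```

The Euler step is stable here because `dtau` times the largest eigenvalue of `H` is below 1; a larger step would need a different propagator.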
Peize Lin, Xinguo Ren, 2020
We present an efficient, linear-scaling implementation for building the (screened) Hartree-Fock exchange (HFX) matrix for periodic systems within the framework of numerical atomic orbital (NAO) basis functions. Our implementation is based on the localized resolution of the identity approximation, by which two-electron Coulomb repulsion integrals can be obtained by only computing two-center quantities, a feature that is highly beneficial to NAOs. By exploiting the locality of basis functions and efficient prescreening of the intermediate three- and two-index tensors, one can achieve a linear scaling of the computational cost for building the HFX matrix with respect to the system size. Our implementation is massively parallel, thanks to an MPI/OpenMP hybrid parallelization strategy for distributing the computational load and memory storage. All these factors together enable highly efficient hybrid functional calculations for large-scale periodic systems. In this work we describe the key algorithms and implementation details for the HFX build as implemented in the ABACUS code package. The performance and scalability of our implementation with respect to the system size and the number of CPU cores are demonstrated for selected benchmark systems up to 4096 atoms.