
The static parallel distribution algorithms for hybrid density-functional calculations in HONPAS package

Published by: Honghui Shang
Publication date: 2020
Research field: Physics
Language of the paper: English





Hybrid density-functional calculation is one of the most commonly adopted electronic structure methods in computational chemistry and materials science because of its balance between accuracy and computational cost. Recently, we developed a novel scheme called NAO2GTO to achieve linear-scaling (Order-N) calculations for hybrid density functionals. In our scheme, the most time-consuming step is the calculation of the electron repulsion integrals (ERIs), so distributing these ERIs evenly in a parallel implementation is of particular importance. Here, we present two static, scalable distributed algorithms for the ERI computation. In the first, the ERIs are distributed over shell pairs. In the second, the ERIs are distributed over shell quartets. In both algorithms, the ERI calculations are independent of one another, so the communication time is minimized. We show speedup results to demonstrate the performance of these static parallel distribution algorithms in the Hefei Order-N Packages for ab initio Simulations (HONPAS).
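To make the difference between the two static schemes in the abstract above concrete, here is a minimal Python sketch. It is not the HONPAS implementation: the round-robin assignment, the function names, and the absence of any integral screening are assumptions made for illustration. It only counts which shell quartets each rank would own and computes no integrals.

```python
from itertools import combinations_with_replacement


def shell_pairs(n_shells):
    """All unique shell pairs (i, j) with i <= j."""
    return list(combinations_with_replacement(range(n_shells), 2))


def distribute_by_pair(n_shells, n_ranks):
    """Pair-level scheme (assumed round-robin): a rank owns whole bra pairs
    and takes every unique quartet (bra|ket) that belongs to them."""
    pairs = shell_pairs(n_shells)
    work = [[] for _ in range(n_ranks)]
    for p, bra in enumerate(pairs):
        rank = p % n_ranks
        # Unique quartets only: ket index runs up to the bra index.
        for ket in pairs[: p + 1]:
            work[rank].append((bra, ket))
    return work


def distribute_by_quartet(n_shells, n_ranks):
    """Quartet-level scheme (assumed round-robin): every unique quartet
    is dealt out individually, giving a finer-grained split."""
    pairs = shell_pairs(n_shells)
    work = [[] for _ in range(n_ranks)]
    q = 0
    for p, bra in enumerate(pairs):
        for ket in pairs[: p + 1]:
            work[q % n_ranks].append((bra, ket))
            q += 1
    return work


def imbalance(work):
    """Difference between the largest and smallest quartet count per rank."""
    loads = [len(w) for w in work]
    return max(loads) - min(loads)


if __name__ == "__main__":
    by_pair = distribute_by_pair(4, 3)
    by_quartet = distribute_by_quartet(4, 3)
    print("pair-level loads:   ", [len(w) for w in by_pair])
    print("quartet-level loads:", [len(w) for w in by_quartet])
```

In this toy setting both schemes need no communication during the integral phase, because every rank can derive its own work list from its rank number alone. The quartet-level split balances the quartet count more evenly, while the pair-level split keeps all quartets of a given bra pair on one rank.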




Read also

This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of the electron repulsion integrals (ERIs), is load-balanced with a 2-level master-worker dynamic parallel scheme. The density matrix and the HFX matrix are both stored in sparse format, and the network communication time is minimized by communicating only the indices of the batched ERIs and the final sparse form of the HFX matrix. The performance of this dynamic scalable distributed algorithm is demonstrated with several examples of large-scale hybrid density-functional calculations on the Tianhe-2 supercomputer, including both molecular and solid-state systems of multiple dimensions, and it shows good scalability.
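The benefit of dynamic dispatch described in the abstract above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the scheme from the paper: it models a single-level master (the paper describes two levels), the batch costs are made-up numbers, and the function names are hypothetical. The master hands the next batch to whichever worker becomes idle first, and the result is compared with a fixed round-robin assignment.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when batch i is fixed to worker i % n_workers in advance."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(costs):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(costs, n_workers):
    """Finish time when a master gives each batch, in order, to the worker
    that becomes idle first (simulated with a min-heap of free times)."""
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    finish = 0.0
    for cost in costs:
        t, w = heapq.heappop(free_at)  # earliest idle worker
        t += cost                      # worker is busy until t
        finish = max(finish, t)
        heapq.heappush(free_at, (t, w))
    return finish


if __name__ == "__main__":
    # Hypothetical batch costs: two expensive batches among cheap ones.
    costs = [8, 1, 1, 1, 8, 1, 1, 1]
    print("static :", static_makespan(costs, 2))
    print("dynamic:", dynamic_makespan(costs, 2))
```

With these made-up costs the round-robin assignment places both expensive batches on the same worker, whereas the dynamic dispatch spreads them out. Only batch indices would need to travel from master to worker in such a scheme, which matches the abstract's point about keeping communication small.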
Real-time time-dependent density functional theory (rt-TDDFT) with hybrid exchange-correlation functionals has wide-ranging applications in chemistry and materials science simulations. However, it can be thousands of times more expensive than a conventional ground-state DFT simulation, and is hence limited to small systems. In this paper, we accelerate hybrid functional rt-TDDFT calculations using the parallel transport gauge formalism and a GPU implementation on Summit. Our implementation scales efficiently to 786 GPUs for a large system with 1536 silicon atoms, and the wall clock time is only 1.5 hours per femtosecond. This speed enables the simulation of large systems with more than 1000 atoms using rt-TDDFT and hybrid functionals.
Peize Lin, Xinguo Ren, 2020
We present an efficient, linear-scaling implementation for building the (screened) Hartree-Fock exchange (HFX) matrix for periodic systems within the framework of numerical atomic orbital (NAO) basis functions. Our implementation is based on the localized resolution of the identity approximation, by which two-electron Coulomb repulsion integrals can be obtained by computing only two-center quantities, a feature that is highly beneficial to NAOs. By exploiting the locality of the basis functions and efficient prescreening of the intermediate three- and two-index tensors, one can achieve linear scaling of the computational cost of building the HFX matrix with respect to the system size. Our implementation is massively parallel, thanks to an MPI/OpenMP hybrid parallelization strategy for distributing the computational load and memory storage. Together, these factors enable highly efficient hybrid functional calculations for large-scale periodic systems. In this work we describe the key algorithms and implementation details of the HFX build as implemented in the ABACUS code package. The performance and scalability of our implementation with respect to the system size and the number of CPU cores are demonstrated for selected benchmark systems of up to 4096 atoms.
Real-time time-dependent density functional theory (RT-TDDFT) is known to be hindered by the very small time step (attosecond or smaller) needed in the numerical simulation due to the fast oscillation of electron wavefunctions, which significantly limits its range of applicability for the study of ultrafast dynamics. In this paper, we demonstrate that such oscillation can be considerably reduced by optimizing the gauge choice using the parallel transport formalism. RT-TDDFT calculations can thus be significantly accelerated using a combination of the parallel transport gauge and implicit integrators, and the resulting scheme can be used to accelerate any electronic structure software that uses a Schrödinger representation. Using absorption spectrum, ultrashort laser pulse, and Ehrenfest dynamics calculations as examples, we show that the new method can use a time step on the order of $10 \sim 100$ attoseconds in a planewave basis set, and is no less than $5 \sim 10$ times faster than the standard explicit 4th order Runge-Kutta time integrator.
Weile Jia, Lin Lin, 2018
We present a new method to accelerate real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid exchange-correlation functionals. For large basis sets, the computational bottleneck for large-scale calculations is the application of the Fock exchange operator to the time-dependent orbitals. Our main goal is to reduce the frequency of applying the Fock exchange operator without loss of accuracy. We achieve this by combining the recently developed parallel transport (PT) gauge formalism and the adaptively compressed exchange operator (ACE) formalism. The PT gauge yields the slowest possible dynamics among all choices of gauge. When coupled with implicit time integrators such as the Crank-Nicolson (CN) scheme, the resulting PT-CN scheme can significantly increase the time step from sub-attoseconds to 10-100 attoseconds. At each time step $t_{n}$, PT-CN requires the self-consistent solution of the orbitals at time $t_{n+1}$. We use ACE to delay the update of the Fock exchange operator in this nonlinear system while maintaining the same self-consistent solution. We verify the performance of the resulting PT-CN-ACE method by computing the absorption spectrum of a benzene molecule and the response of bulk silicon systems to an ultrafast laser pulse, using the planewave basis set and the HSE functional. We report the strong and weak scaling of the PT-CN-ACE method for silicon systems ranging from 32 to 1024 atoms, with up to 2048 computational cores. Compared to standard explicit time integrators such as the 4th order Runge-Kutta method (RK4), PT-CN-ACE can reduce the number of Fock exchange operator applications by nearly 70 times, and thus reduce the overall wall clock time by 46 times for the system with 1024 atoms. Hence our work enables hybrid functional rt-TDDFT calculations to be routinely performed with a large basis set for the first time.