Fast real-time time-dependent hybrid functional calculations with the parallel transport gauge and the adaptively compressed exchange formulation

Published by Weile Jia
Publication date: 2018
Research field: Physics
Language: English





We present a new method to accelerate real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid exchange-correlation functionals. For a large basis set, the computational bottleneck in large-scale calculations is the application of the Fock exchange operator to the time-dependent orbitals. Our main goal is to reduce the frequency of applying the Fock exchange operator without loss of accuracy. We achieve this by combining the recently developed parallel transport (PT) gauge formalism and the adaptively compressed exchange operator (ACE) formalism. The PT gauge yields the slowest possible dynamics among all choices of gauge. When coupled with implicit time integrators such as the Crank-Nicolson (CN) scheme, the resulting PT-CN scheme can significantly increase the time step from sub-attoseconds to 10-100 attoseconds. At each time step $t_{n}$, PT-CN requires the self-consistent solution of the orbitals at time $t_{n+1}$. We use ACE to delay the update of the Fock exchange operator in this nonlinear system, while maintaining the same self-consistent solution. We verify the performance of the resulting PT-CN-ACE method by computing the absorption spectrum of a benzene molecule and the response of bulk silicon systems to an ultrafast laser pulse, using the planewave basis set and the HSE functional. We report the strong and weak scaling of the PT-CN-ACE method for silicon systems ranging from 32 to 1024 atoms, with up to 2048 computational cores. Compared to standard explicit time integrators such as the 4th order Runge-Kutta method (RK4), PT-CN-ACE reduces the number of Fock exchange operator applications by nearly a factor of 70, and thus the overall wall clock time by a factor of 46, for the system with 1024 atoms. Hence our work enables hybrid functional rt-TDDFT calculations to be routinely performed with a large basis set for the first time.
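The compression idea behind ACE can be illustrated with a small NumPy sketch. This is not the authors' implementation: the dense matrix `V_x` is a hypothetical stand-in for the Fock exchange operator, the sizes `n_basis` and `n_orb` are arbitrary, and the function names are made up for illustration. The sketch assumes the standard ACE construction, in which a low-rank operator $-\xi\xi^{*}$ is built so that it agrees with the full operator on the orbitals used to build it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_orb = 40, 5  # hypothetical sizes, chosen only for the demo

# Hypothetical stand-in for the Fock exchange operator:
# a dense, negative definite Hermitian matrix.
A = rng.standard_normal((n_basis, n_basis)) + 1j * rng.standard_normal((n_basis, n_basis))
V_x = -(A @ A.conj().T) - 0.1 * np.eye(n_basis)

# Orthonormal "orbitals" (columns of Phi) from a reduced QR factorization.
B = rng.standard_normal((n_basis, n_orb)) + 1j * rng.standard_normal((n_basis, n_orb))
Phi, _ = np.linalg.qr(B)


def build_ace(V_x, Phi):
    """Return xi such that -xi @ xi^H reproduces V_x on the columns of Phi."""
    W = V_x @ Phi                      # the one expensive operator application
    M = Phi.conj().T @ W               # small n_orb x n_orb matrix, negative definite
    M = 0.5 * (M + M.conj().T)         # symmetrize against round-off
    L = np.linalg.cholesky(-M)         # -M = L L^H
    # xi = W L^{-H}, computed as (L^{-1} W^H)^H
    return np.linalg.solve(L, W.conj().T).conj().T


def apply_ace(xi, X):
    """Apply the compressed operator -xi xi^H to X (cheap: two thin products)."""
    return -xi @ (xi.conj().T @ X)


xi = build_ace(V_x, Phi)
max_error = np.max(np.abs(V_x @ Phi - apply_ace(xi, Phi)))
print(f"max |V_x Phi - V_ACE Phi| = {max_error:.2e}")
```

In a real planewave code the expensive step is forming `W`; once `xi` is available, the compressed operator can be reapplied many times inside a self-consistent loop before `W` needs to be recomputed, which is the sense in which the update of the exchange operator is delayed.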




Read also

Real-time time-dependent density functional theory (RT-TDDFT) is known to be hindered by the very small time step (attosecond or smaller) needed in the numerical simulation due to the fast oscillation of electron wavefunctions, which significantly limits its range of applicability for the study of ultrafast dynamics. In this paper, we demonstrate that such oscillation can be considerably reduced by optimizing the gauge choice using the parallel transport formalism. RT-TDDFT calculations can thus be significantly accelerated using a combination of the parallel transport gauge and implicit integrators, and the resulting scheme can be used to accelerate any electronic structure software that uses a Schrödinger representation. Using absorption spectrum, ultrashort laser pulse, and Ehrenfest dynamics calculations as examples, we show that the new method can utilize a time step that is on the order of $10\sim 100$ attoseconds in a planewave basis set, and is no less than $5\sim 10$ times faster when compared to the standard explicit 4th order Runge-Kutta time integrator. Thanks to the significant increase of the size of the time step, we also demonstrate that the new method is more than 10 times faster in terms of the wall clock time when compared to the standard explicit 4th order Runge-Kutta time integrator for silicon systems ranging from 32 to 1024 atoms.
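A toy NumPy sketch can show why the gauge choice matters. It assumes the parallel transport equation of motion in the form $i\,\partial_t \Phi = H\Phi - \Phi(\Phi^{*}H\Phi)$ and uses a small random symmetric matrix `H` as a hypothetical, time-independent Hamiltonian; the function names are illustrative only. For orbitals that span an invariant subspace of `H`, the Schrödinger-gauge time derivative is nonzero (the orbitals keep rotating in phase), whereas the parallel-transport time derivative vanishes.

```python
import numpy as np


def schrodinger_rhs(H, Psi):
    """Time derivative of the orbitals in the Schrodinger gauge: -i H Psi."""
    return -1j * (H @ Psi)


def pt_rhs(H, Phi):
    """Time derivative in the parallel transport gauge (assumed form):
    -i [H Phi - Phi (Phi^H H Phi)]."""
    HPhi = H @ Phi
    return -1j * (HPhi - Phi @ (Phi.conj().T @ HPhi))


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
H = 0.5 * (A + A.T)            # hypothetical real symmetric Hamiltonian

_, vecs = np.linalg.eigh(H)
Phi = vecs[:, :3]              # three lowest eigenvectors as the occupied orbitals

schrodinger_norm = np.linalg.norm(schrodinger_rhs(H, Phi))
pt_norm = np.linalg.norm(pt_rhs(H, Phi))
print(f"Schrodinger gauge: {schrodinger_norm:.3e}, PT gauge: {pt_norm:.3e}")
```

Slower orbital dynamics is what allows an implicit integrator to take a larger time step; this sketch only illustrates the stationary limiting case and does not implement a time integrator.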
Real-time time-dependent density functional theory (rt-TDDFT) with hybrid exchange-correlation functionals has wide-ranging applications in chemistry and material science simulations. However, it can be thousands of times more expensive than a conventional ground state DFT simulation, and hence is limited to small systems. In this paper, we accelerate hybrid functional rt-TDDFT calculations using the parallel transport gauge formalism and a GPU implementation on Summit. Our implementation can efficiently scale to 786 GPUs for a large system with 1536 silicon atoms, and the wall clock time is only 1.5 hours per femtosecond. This unprecedented speed enables the simulation of large systems with more than 1000 atoms using rt-TDDFT and hybrid functionals.
This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of the electron repulsion integrals (ERIs), is perfectly load-balanced with a 2-level master-worker dynamic parallel scheme. The density matrix and the HFX matrix are both stored in sparse format, and the network communication time is minimized by communicating only the index of the batched ERIs and the final sparse form of the HFX matrix. The performance of this dynamic scalable distributed algorithm has been demonstrated by several examples of large-scale hybrid density-functional calculations on Tianhe-2 supercomputers, including both molecular and solid-state systems with multiple dimensions, and shows good scalability.
Hybrid density-functional calculation is one of the most commonly adopted electronic structure methods in computational chemistry and materials science because of its balance between accuracy and computational cost. Recently, we have developed a novel scheme called NAO2GTO to achieve linear scaling (Order-N) calculations for hybrid density functionals. In our scheme, the most time-consuming step is the calculation of the electron repulsion integrals (ERIs), so distributing these ERIs evenly in a parallel implementation is of particular importance. Here, we present two static scalable distributed algorithms for the ERI computation. In the first, the ERIs are distributed over ERI shell pairs. In the second, the ERIs are distributed over ERI shell quartets. In both algorithms, the ERI calculations are independent of each other, so the communication time is minimized. We show our speedup results to demonstrate the performance of these static parallel distributed algorithms in the Hefei Order-N packages for ab initio simulations (HONPAS).
Electron tomography has achieved higher resolution and quality at reduced doses with recent advances in compressed sensing. Compressed sensing (CS) theory exploits the inherent sparse signal structure to efficiently reconstruct three-dimensional (3D) volumes at the nanoscale from undersampled measurements. However, the process bottlenecks 3D reconstruction with computation times that run from hours to days. Here we demonstrate a framework for dynamic compressed sensing that produces a 3D specimen structure that updates in real-time as new specimen projections are collected. Researchers can begin interpreting 3D specimens as data is collected to facilitate high-throughput and interactive analysis. Using scanning transmission electron microscopy (STEM), we show that dynamic compressed sensing accelerates the convergence speed by 3-fold while also reducing its error by 27% for an Au/SrTiO3 nanoparticle specimen. Before a tomography experiment is completed, the 3D tomogram has interpretable structure within 33% of completion and fine details are visible as early as 66%. Upon completion of an experiment, a high-fidelity 3D visualization is produced without further delay. Additionally, reconstruction parameters that tune data fidelity can be manipulated throughout the computation without rerunning the entire process.