In micromagnetic simulations, the demagnetization field is by far the computationally most expensive field component and often a limiting factor in large multilayer systems. We present an exact method to calculate the demagnetization field of magnetic layers with arbitrary thicknesses. In this approach we combine the widely used fast-Fourier-transform based circular convolution method with an explicit convolution using a generalized form of the Newell formulas. We implement the method both for central processors and graphics processors and find that significant speedups for irregular multilayer geometries can be achieved. Using this method we optimize the geometry of a magnetic random-access memory cell by varying a single specific layer thickness and simulate a hysteresis curve to determine the resulting switching field.
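The two ingredients combined in the abstract above, FFT-based circular convolution and explicit convolution, can be compared on a toy problem. The following is a minimal 1-D sketch using NumPy, not the authors' implementation: the function names are hypothetical and the random kernel merely stands in for a demagnetization tensor. It only illustrates that a zero-padded circular convolution reproduces the explicit sum.

```python
import numpy as np

def direct_convolution(kernel, source):
    """Explicit O(n*m) linear convolution of two 1-D arrays."""
    n, m = len(kernel), len(source)
    out = np.zeros(n + m - 1)
    for i in range(n):
        for j in range(m):
            out[i + j] += kernel[i] * source[j]
    return out

def fft_convolution(kernel, source):
    """Same result via a zero-padded circular convolution with the FFT."""
    size = len(kernel) + len(source) - 1  # padding prevents wrap-around
    k_hat = np.fft.rfft(kernel, n=size)
    s_hat = np.fft.rfft(source, n=size)
    return np.fft.irfft(k_hat * s_hat, n=size)

# Hypothetical test data: random values in place of a physical kernel.
rng = np.random.default_rng(0)
kernel = rng.standard_normal(16)
source = rng.standard_normal(32)

explicit = direct_convolution(kernel, source)
fast = fft_convolution(kernel, source)
max_diff = np.max(np.abs(explicit - fast))
```

The FFT route needs an equidistant grid along the transformed axis, which is presumably why a hybrid scheme would fall back to an explicit sum along the direction in which layer thicknesses vary.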
This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of the electron repulsion integrals (ERIs), is perfectly load-balanced with a 2-level master-worker dynamic parallel scheme. The density matrix and the HFX matrix are both stored in sparse format, and network communication time is minimized by communicating only the indices of the batched ERIs and the final sparse form of the HFX matrix. The performance of this scalable distributed algorithm is demonstrated on several large-scale hybrid density-functional calculations on the Tianhe-2 supercomputer, including both molecular and solid-state systems of multiple dimensions, all of which show good scalability.
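The idea of dynamic load balancing, where idle workers pull the next batch index instead of receiving a fixed share up front, can be sketched with Python threads. This is a single-level illustration under stated assumptions, not the 2-level scheme of the abstract above: `run_dynamic` and `batch_costs` are hypothetical names, and the summation is a stand-in for evaluating one batch of integrals.

```python
import queue
import threading

def run_dynamic(batch_costs, n_workers=4):
    """Workers pull batch indices from a shared queue until it is empty."""
    tasks = queue.Queue()
    for index in range(len(batch_costs)):
        tasks.put(index)  # only the batch index is handed out

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            # Stand-in for the real work on one batch (cost varies per batch).
            value = sum(range(batch_costs[index]))
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

# Hypothetical, deliberately uneven batch costs.
batch_costs = [10, 1000, 5, 200, 50, 3000, 1, 70]
results = run_dynamic(batch_costs)
```

Because each worker asks for new work only when it finishes its current batch, expensive batches do not stall the cheap ones, which is the property a static distribution lacks.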
We propose a harmonic surface mapping algorithm (HSMA) for electrostatic pairwise sums of an infinite number of image charges. The images are induced by point sources within a box due to a specific boundary condition, which can be non-periodic. The HSMA first introduces an auxiliary surface such that the contribution of images outside the surface can be approximated by the least-squares method using spherical harmonics as basis functions. The so-called harmonic surface mapping is the procedure that transforms the approximate solution into a surface charge and a surface dipole over the auxiliary surface, which become point images under numerical integration. The mapping procedure is independent of the number of sources and is considered to have low complexity. The electrostatic interactions are then among the charges within the surface and at the integration points, all of which take the form of the Coulomb potential and can be accelerated straightforwardly by the fast multipole method to achieve linear scaling. Numerical calculations of the Madelung constant of a crystalline lattice, the electrostatic energy of ions in a metallic cavity, and the time performance for large-scale systems show that the HSMA is accurate and fast, and is thus attractive for many applications.
Real-time time-dependent density functional theory (rt-TDDFT) with hybrid exchange-correlation functionals has wide-ranging applications in chemistry and materials science simulations. However, it can be thousands of times more expensive than a conventional ground-state DFT simulation, and is hence limited to small systems. In this paper, we accelerate hybrid functional rt-TDDFT calculations using the parallel transport gauge formalism and a GPU implementation on Summit. Our implementation can efficiently scale to 786 GPUs for a large system with 1536 silicon atoms, and the wall clock time is only 1.5 hours per femtosecond. This unprecedented speed enables the simulation of large systems with more than 1000 atoms using rt-TDDFT and hybrid functionals.
We present a new method to accelerate real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid exchange-correlation functionals. For a large basis set, the computational bottleneck for large-scale calculations is the application of the Fock exchange operator to the time-dependent orbitals. Our main goal is to reduce the frequency of applying the Fock exchange operator without loss of accuracy. We achieve this by combining the recently developed parallel transport (PT) gauge formalism and the adaptively compressed exchange operator (ACE) formalism. The PT gauge yields the slowest possible dynamics among all choices of gauge. When coupled with implicit time integrators such as the Crank-Nicolson (CN) scheme, the resulting PT-CN scheme can significantly increase the time step from sub-attoseconds to 10-100 attoseconds. At each time step $t_{n}$, PT-CN requires the self-consistent solution of the orbitals at time $t_{n+1}$. We use ACE to delay the update of the Fock exchange operator in this nonlinear system, while maintaining the same self-consistent solution. We verify the performance of the resulting PT-CN-ACE method by computing the absorption spectrum of a benzene molecule and the response of bulk silicon systems to an ultrafast laser pulse, using the planewave basis set and the HSE functional. We report the strong and weak scaling of the PT-CN-ACE method for silicon systems ranging from 32 to 1024 atoms, with up to 2048 computational cores. Compared to standard explicit time integrators such as the 4th order Runge-Kutta method (RK4), PT-CN-ACE can reduce the number of Fock exchange operator applications by nearly 70 times, thus reducing the overall wall clock time by 46 times for the system with 1024 atoms. Hence our work enables hybrid functional rt-TDDFT calculations to be routinely performed with a large basis set for the first time.
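The Crank-Nicolson scheme mentioned above can be illustrated on a linear model problem. The sketch below assumes a fixed, Hermitian 2x2 Hamiltonian `H` (a hypothetical toy, with hbar set to 1) and propagates i dψ/dt = Hψ. It includes neither the parallel transport gauge, nor ACE, nor the self-consistent iteration needed when the Hamiltonian depends on the orbitals. It shows only the implicit update and its norm conservation.

```python
import numpy as np

def crank_nicolson_step(H, psi, dt):
    """One implicit step: (I + i*dt/2*H) psi_new = (I - i*dt/2*H) psi_old."""
    identity = np.eye(H.shape[0])
    lhs = identity + 0.5j * dt * H
    rhs = (identity - 0.5j * dt * H) @ psi
    return np.linalg.solve(lhs, rhs)

# Hypothetical two-level system; H is real symmetric, hence Hermitian.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])
psi = np.array([1.0 + 0.0j, 0.0 + 0.0j])
dt = 0.1

for _ in range(100):
    psi = crank_nicolson_step(H, psi, dt)

norm = np.linalg.norm(psi)  # stays at 1 up to round-off
```

For a Hermitian `H` the update is a Cayley transform and therefore unitary for any `dt`, so stability does not limit the step size. Accuracy still does, which is where a slowly varying gauge would help.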
We present a fast and efficient hybrid algorithm for selecting exoplanetary candidates from wide-field transit surveys. Our method is based on the widely used SysRem and Box Least-Squares (BLS) algorithms. Patterns of systematic error that are common to all stars on the frame are mapped and eliminated using the SysRem algorithm. The remaining systematic errors caused by spatially localised flat-fielding and other errors are quantified using a boxcar-smoothing method. We show that the dimensions of the search-parameter space can be reduced greatly by carrying out an initial BLS search on a coarse grid of reduced dimensions, followed by Newton-Raphson refinement of the transit parameters in the vicinity of the most significant solutions. We illustrate the method's operation by applying it to data from one field of the SuperWASP survey, comprising 2300 observations of 7840 stars brighter than V=13.0. We identify 11 likely transit candidates. We reject stars that exhibit significant ellipsoidal variations indicative of a stellar-mass companion. We use colours and proper motions from the 2MASS and USNO-B1.0 surveys to estimate the stellar parameters and the companion radius. We find that two stars showing unambiguous transit signals pass all these tests, and so qualify for detailed high-resolution spectroscopic follow-up.
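A coarse-grid box least-squares search of the kind described above can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the light curve is synthetic, the function name `bls_power` and all grid parameters are hypothetical, equal weights are assumed for every point, and the Newton-Raphson refinement step is omitted.

```python
import numpy as np

def bls_power(time, flux, period, n_bins=50, max_width=5):
    """Best box-fit signal residue for one trial period (equal weights)."""
    flux = flux - flux.mean()
    phase_bin = ((time % period) / period * n_bins).astype(int) % n_bins
    bin_sum = np.bincount(phase_bin, weights=flux, minlength=n_bins)
    bin_count = np.bincount(phase_bin, minlength=n_bins)
    # Duplicate the bins so a box may wrap around from phase 1 to phase 0.
    bin_sum = np.concatenate([bin_sum, bin_sum])
    bin_count = np.concatenate([bin_count, bin_count])
    n_points = len(time)
    best = 0.0
    for start in range(n_bins):
        for width in range(1, max_width + 1):
            s = bin_sum[start:start + width].sum() / n_points
            r = bin_count[start:start + width].sum() / n_points
            if 0.0 < r < 1.0:
                best = max(best, s * s / (r * (1.0 - r)))
    return best

# Synthetic light curve: 1% deep, 0.15-day transits every 3 days, plus noise.
rng = np.random.default_rng(1)
time = np.arange(0.0, 60.0, 0.02)
flux = rng.normal(0.0, 0.002, size=time.size)
flux[(time % 3.0) < 0.15] -= 0.01

# Coarse period grid; a finer local refinement would follow in practice.
trial_periods = np.linspace(2.0, 4.0, 201)
powers = np.array([bls_power(time, flux, p) for p in trial_periods])
best_period = trial_periods[np.argmax(powers)]
```

Binning the folded light curve keeps the cost per trial period proportional to the number of bins times the number of box widths rather than to the number of observations, which is what makes a coarse first pass cheap.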
Paul Heistracher, Florian Bruckner, Claas Abert. (2019). "Hybrid FFT algorithm for fast demagnetization field calculations on non-equidistant magnetic layers".