The parallel annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The challenge is to redistribute replicas efficiently so that all general-purpose graphics processing unit devices stay as busy as possible. We provide details of testing on Intel Skylake/Nvidia V100 based hardware, running more than two million replicas of the Ising model sample in parallel. The results are encouraging: the acceleration approaches the perfect-scaling line as the complexity of the simulated system grows.
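The replica redistribution described above can be illustrated with a minimal sketch. It assumes that replica counts per device become unequal during the run and that the goal is simply to equalize them; the function names `plan_redistribution` and `apply_transfers` are hypothetical, and this greedy pairing is not the authors' CUDA/MPI algorithm.

```python
def plan_redistribution(counts):
    """Return a list of (source, destination, n) transfers that evens out
    the number of replicas held by each device (illustrative only)."""
    n_dev = len(counts)
    base, extra = divmod(sum(counts), n_dev)
    # Target load: the first `extra` devices hold one replica more.
    targets = [base + 1 if i < extra else base for i in range(n_dev)]
    surplus = [[i, c - t] for i, (c, t) in enumerate(zip(counts, targets)) if c > t]
    deficit = [[i, t - c] for i, (c, t) in enumerate(zip(counts, targets)) if c < t]

    transfers = []
    si = di = 0
    # Greedily pair overloaded devices with underloaded ones.
    while si < len(surplus) and di < len(deficit):
        n = min(surplus[si][1], deficit[di][1])
        transfers.append((surplus[si][0], deficit[di][0], n))
        surplus[si][1] -= n
        deficit[di][1] -= n
        if surplus[si][1] == 0:
            si += 1
        if deficit[di][1] == 0:
            di += 1
    return transfers


def apply_transfers(counts, transfers):
    """Return the per-device replica counts after carrying out the transfers."""
    result = list(counts)
    for src, dst, n in transfers:
        result[src] -= n
        result[dst] += n
    return result


if __name__ == "__main__":
    counts = [10, 2, 6, 2]  # replicas per device (made-up numbers)
    plan = plan_redistribution(counts)
    print(plan)
    print(apply_transfers(counts, plan))
```

In a real hybrid code each transfer would correspond to a device-to-device or node-to-node copy, so a practical scheme would presumably also weigh communication cost, which this sketch ignores.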
Parallel tempering Monte Carlo has proven to be an efficient method in optimization and sampling applications. Having an optimized temperature set enhances the efficiency of the algorithm through more-frequent replica visits to the temperature limits. The approaches for finding an optimal temperature set can be divided into two main categories. The methods of the first category distribute the replicas such that the swapping ratio between neighbouring replicas is constant and independent of the temperature values. The second-category techniques including the feedback-optimized method, on the other hand, aim for a temperature distribution that has higher density at simulation bottlenecks, resulting in temperature-dependent replica-exchange probabilities. In this paper, we compare the performance of various temperature setting methods on both sparse and fully-connected spin-glass problems as well as fully-connected Wishart problems that have planted solutions. These include two classes of problems that have either continuous or discontinuous phase transitions in the order parameter. Our results demonstrate that there is no performance advantage for the methods that promote nonuniform swapping probabilities on spin-glass problems where the order parameter has a smooth transition between phases at the critical temperature. However, on Wishart problems that have a first-order phase transition at low temperatures, the feedback-optimized method exhibits a time-to-solution speedup of at least a factor of two over the other approaches.
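For orientation, the replica-exchange step that any temperature set feeds into can be sketched as follows. The swap rule is the standard Metropolis criterion, min(1, exp[(β_i − β_j)(E_i − E_j)]); the geometric ladder is only a common untuned starting point and is not one of the tuned methods compared in the abstract. The function names are hypothetical.

```python
import math


def geometric_ladder(t_min, t_max, n):
    """Return n temperatures from t_min to t_max with a constant ratio
    between neighbours (a common untuned starting point)."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]


def swap_acceptance(beta_i, beta_j, e_i, e_j):
    """Metropolis probability of exchanging the configurations of two
    replicas at inverse temperatures beta_i, beta_j with energies e_i, e_j."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


if __name__ == "__main__":
    temps = geometric_ladder(0.5, 2.0, 3)
    print(temps)
    betas = [1.0 / t for t in temps]
    # Made-up energies for two neighbouring replicas.
    print(swap_acceptance(betas[0], betas[1], -12.0, -10.0))
```

Methods in the first category would adjust the temperatures until this acceptance, averaged over the run, is the same for every neighbouring pair; the feedback-optimized approach instead concentrates temperatures where replicas diffuse slowly.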
This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of the electron repulsion integrals (ERIs), is perfectly load-balanced with a two-level master-worker dynamic parallel scheme. The density matrix and the HFX matrix are both stored in sparse format, and the network communication time is minimized by communicating only the index of the batched ERIs and the final sparse form of the HFX matrix. The performance of this dynamic, scalable distributed algorithm has been demonstrated with several examples of large-scale hybrid density-functional calculations on Tianhe-2 supercomputers, including both molecular and solid-state systems with multiple dimensions, and it shows good scalability.
We present $\texttt{Maxent}$, a tool for performing analytic continuation of spectral functions using the maximum entropy method. The code operates on discrete imaginary axis datasets (values with uncertainties) and transforms this input to the real axis. The code works for imaginary time and Matsubara frequency data and implements the Legendre representation of finite temperature Green's functions. It implements a variety of kernels, default models, and grids for continuing bosonic, fermionic, anomalous, and other data. Our implementation is licensed under GPLv2 and extensively documented. This paper shows the use of the programs in detail.
We propose a new method for molecular dynamics and Monte Carlo simulations, which is referred to as the replica-permutation method (RPM), to realize more efficient sampling than the replica-exchange method (REM). In RPM, not only exchanges between two replicas but also permutations among more than two replicas are performed. Furthermore, instead of the Metropolis algorithm, the Suwa-Todo algorithm is employed for replica-permutation trials to minimize the rejection ratio. We applied RPM to particles in a double-well potential energy, Met-enkephalin in vacuum, and a C-peptide analog of ribonuclease A in explicit water. For comparison purposes, replica-exchange molecular dynamics simulations were also performed. As a result, RPM sampled not only the temperature space but also the conformational space more efficiently than REM for all systems. From our simulations of C-peptide, we obtained the alpha-helix structure with salt-bridges between Gly2 and Arg10, which is known from experiments. By calculating its free-energy landscape, we revealed the folding pathway from an extended structure to the alpha-helix structure with the salt-bridges. We found that the folding pathway consists of two steps: the first is the salt-bridge formation step, and the second is the alpha-helix formation step.
We introduce a variant of the Hybrid Monte Carlo (HMC) algorithm to address large-deviation statistics in stochastic hydrodynamics. Based on the path-integral approach to stochastic (partial) differential equations, our HMC algorithm samples space-time histories of the dynamical degrees of freedom under the influence of random noise. First, we validate and benchmark the HMC algorithm by reproducing multiscale properties of the one-dimensional Burgers equation driven by Gaussian and white-in-time noise. Second, we show how to implement an importance sampling protocol that enhances, by orders of magnitude, the probability of sampling extreme and rare events, making it possible to estimate moments of field variables of extremely high order (up to 30 and more). By employing reweighting techniques, we map the biased configurations back to the original probability measure in order to probe their statistical importance. Finally, we show that by biasing the system towards very intense negative gradients, the HMC algorithm is able to explore the statistical fluctuations around instanton configurations. Our results will also be interesting and relevant in lattice gauge theory since they provide insight into reweighting techniques.
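The idea of biasing the sampler towards rare events and then reweighting back to the original measure can be shown on a one-dimensional toy problem. The sketch below is not the HMC scheme of the abstract: it assumes a standard normal target, a shifted normal as the biased distribution, and a hypothetical function name `tail_probability_reweighted`.

```python
import math
import random


def tail_probability_reweighted(threshold, shift, n_samples, seed=0):
    """Estimate P(x > threshold) for x ~ N(0, 1) by sampling from the
    biased distribution N(shift, 1) and reweighting each sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # Weight p(x)/q(x) with p = N(0, 1) and q = N(shift, 1).
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n_samples


if __name__ == "__main__":
    estimate = tail_probability_reweighted(3.0, 3.0, 20000)
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
    print(estimate, exact)
```

Sampling directly from N(0, 1) would place only about one sample in a thousand beyond the threshold, whereas the shifted distribution places about half of them there; the weights then restore the original probability measure.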
Alexander Russkov, Roman Chulkevich, Lev N. Shchur (2020). "Algorithm for the replica redistribution in the implementation of parallel annealing method on the hybrid supercomputer architecture".