
SIMD Vectorization for the Lennard-Jones Potential with AVX2 and AVX-512 instructions

Published by Hiroshi Watanabe
Publication date: 2018
Research field: Informatics Engineering
Research language: English





This work describes the SIMD vectorization of the force calculation of the Lennard-Jones potential with the Intel AVX2 and AVX-512 instruction sets. Since the force-calculation kernel of the molecular dynamics method involves indirect access to memory, the data layout is one of the most important factors in vectorization. We find that the Array of Structures (AoS) with padding exhibits better performance than the Structure of Arrays (SoA) with appropriate vectorization and optimizations. In particular, AoS with 512-bit width exhibits the best performance among the architectures. While the difference in performance between AoS and SoA is significant for the vectorization with AVX2, that with AVX-512 is minor. The effect of other optimization techniques, such as software pipelining together with vectorization, is also discussed. We present results for benchmarks on three CPU architectures: Intel Haswell (HSW), Knights Landing (KNL), and Skylake (SKL). The performance gain from vectorization is about 42% on HSW compared with the code optimized without vectorization. On KNL, the hand-vectorized codes exhibit 34% better performance than the codes vectorized automatically by the Intel compiler. On SKL, the code vectorized with AVX2 exhibits slightly better performance than the code vectorized with AVX-512.
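To make the two data layouts concrete, here is a minimal scalar sketch in Python. It is not the paper's AVX2/AVX-512 code: it only shows how the same pairwise Lennard-Jones force loop addresses memory under a padded AoS layout (four doubles per particle, the fourth unused) and under an SoA layout. All function names, the pair list, and the absence of a cutoff are assumptions made for illustration; the force follows from the standard form V(r) = 4ε[(σ/r)^12 - (σ/r)^6].

```python
from array import array


def lj_force_factor(r2, eps=1.0, sigma=1.0):
    """Return c such that the force on particle i is c * (r_i - r_j).

    Derived from V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6);
    r2 is the squared distance between the two particles.
    """
    s2 = sigma * sigma / r2
    s6 = s2 * s2 * s2
    return 24.0 * eps * s6 * (2.0 * s6 - 1.0) / r2


def aos_from_soa(x, y, z):
    """Pack coordinates as [x0, y0, z0, pad, x1, y1, z1, pad, ...]."""
    pos = array("d")
    for xi, yi, zi in zip(x, y, z):
        pos.extend((xi, yi, zi, 0.0))  # 4th slot is padding
    return pos


def forces_aos(pos, pairs):
    """Padded AoS: one particle's coordinates are adjacent in memory."""
    f = array("d", [0.0] * len(pos))
    for i, j in pairs:  # indirect access through the pair list
        bi, bj = 4 * i, 4 * j
        dx = pos[bi] - pos[bj]
        dy = pos[bi + 1] - pos[bj + 1]
        dz = pos[bi + 2] - pos[bj + 2]
        c = lj_force_factor(dx * dx + dy * dy + dz * dz)
        f[bi] += c * dx
        f[bi + 1] += c * dy
        f[bi + 2] += c * dz
        f[bj] -= c * dx  # Newton's third law
        f[bj + 1] -= c * dy
        f[bj + 2] -= c * dz
    return f


def forces_soa(x, y, z, pairs):
    """SoA: each coordinate lives in its own contiguous array."""
    n = len(x)
    fx = array("d", [0.0] * n)
    fy = array("d", [0.0] * n)
    fz = array("d", [0.0] * n)
    for i, j in pairs:
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        dz = z[i] - z[j]
        c = lj_force_factor(dx * dx + dy * dy + dz * dz)
        fx[i] += c * dx
        fy[i] += c * dy
        fz[i] += c * dz
        fx[j] -= c * dx
        fy[j] -= c * dy
        fz[j] -= c * dz
    return fx, fy, fz


# Three particles and every pair between them (hypothetical test data).
x = array("d", [0.0, 1.0, 0.0])
y = array("d", [0.0, 0.0, 1.5])
z = array("d", [0.0, 0.0, 0.0])
pairs = [(0, 1), (0, 2), (1, 2)]

f_aos = forces_aos(aos_from_soa(x, y, z), pairs)
fx, fy, fz = forces_soa(x, y, z, pairs)
```

In the padded AoS form, the three coordinates of a particle reached through the pair list sit in one 32-byte record, so a single 256-bit load can fetch them together. In the SoA form the same lookup touches three separate arrays. This is one plausible reading of why layout matters for a kernel with indirect access, offered as an illustration rather than as the paper's analysis.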


Read also

We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time-consuming part of the numerical simulations of lattice QCD is the solver of a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience on the Oakforest-PACS system, a large-scale cluster composed of Intel Xeon Phi Knights Landing, we discuss the performance tuning exploiting AVX-512 and code design on the SIMD architecture and massively parallel machines. We observe that the same code runs efficiently on an Intel Xeon Skylake-SP machine.
We publish an extension of openQCD-1.6 with AVX-512 vector instructions using Intel intrinsics. Recent Intel processors support extended instruction sets with operations on 512-bit wide vectors, increasing both the capacity for floating point operations and register memory. Optimal use of the new capabilities requires reorganising data and floating point operations into these wider vector units. We report on the implementation and performance of the AVX-512 OpenQCD extension on clusters using Intel Knights Landing and Xeon Scalable (Skylake) CPUs. In complete HMC trajectories with physically relevant parameters we observe a performance increase of 5% to 10%.
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g., the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (clang).
D. Coslovich, G. Pastore (2007)
We numerically investigated the connection between isobaric fragility and the properties of high-order stationary points of the potential energy surface in different supercooled Lennard-Jones mixtures. The increase of effective activation energies upon supercooling appears to be driven by the increase of average potential energy barriers measured by the energy dependence of the fraction of unstable modes. Such an increase is sharper, the more fragile is the mixture. Correlations between fragility and other properties of high-order stationary points, including the vibrational density of states and the localization features of unstable modes, are also discussed.
This paper investigates the relation between the density-scaling exponent $\gamma$ and the virial potential-energy correlation coefficient $R$ at several thermodynamic state points in three dimensions for the generalized $(2n,n)$ Lennard-Jones (LJ) system for $n=4, 9, 12, 18$, as well as for the standard $n=6$ LJ system in two, three, and four dimensions. The state points studied include many low-density states at which the virial potential-energy correlations are not strong. For these state points we find the roughly linear relation $\gamma \cong 3nR/d$ in $d$ dimensions. This result is discussed in light of the approximate extended inverse power law description of generalized LJ potentials [N. P. Bailey et al., J. Chem. Phys. 129, 184508 (2008)]. In the plot of $\gamma$ versus $R$ there is in all cases a transition around $R \approx 0.9$, above which $\gamma$ starts to decrease as $R$ approaches unity. This is consistent with the fact that $\gamma \rightarrow 2n/d$ for $R \rightarrow 1$, a limit that is approached at high densities and/or temperatures at which the repulsive $r^{-2n}$ term dominates the physics.