We study the algorithmic optimization and performance tuning of the lattice QCD clover-fermion solver for the K computer. We implement Lüscher's SAP preconditioner with sub-blocking, in which the lattice block assigned to a node is further divided into several sub-blocks to extract enough parallelism for the 8-core SPARC64$^{\mathrm{TM}}$ VIIIfx CPU of the K computer. To achieve better convergence, we use the symmetric successive over-relaxation (SSOR) iteration with {\it locally-lexicographical} ordering of the sub-blocks to obtain the block inverse. The SAP preconditioner is applied within the single-precision BiCGStab solver that forms the inner part of the nested BiCGStab solver. The single-precision part of the computational kernel is written solely with SIMD-oriented intrinsics to obtain the best performance from the SPARC64 VIIIfx processors of the K computer. We benchmark the single-precision BiCGStab solver on three lattice sizes, $12^3\times 24$, $24^3\times 48$ and $48^3\times 96$, with the local lattice size per node fixed at $6^3\times 12$. We observe ideal weak-scaling performance from 16 nodes to 4096 nodes. The computational kernel exceeds 50% efficiency, and the single-precision BiCGStab solver sustains $\sim 26\%$ efficiency.
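The SAP-with-SSOR structure described above can be illustrated with a small dense NumPy toy. This is a sketch under stated assumptions, not the authors' implementation. The clover Dirac operator is replaced by a hypothetical real tridiagonal (1D nearest-neighbour) matrix from the illustrative helper `toy_operator`. The domain is cut into equal blocks coloured alternately 0 and 1. Each block inverse is approximated by a few SSOR sweeps in plain lexicographic site order. The paper's locally-lexicographical sub-block ordering, SIMD data layout and single-precision arithmetic are not modelled. The names `ssor_solve` and `sap_precondition` are hypothetical.

```python
import numpy as np


def toy_operator(n, diag=4.0):
    """Hypothetical stand-in operator: real, tridiagonal, nearest-neighbour."""
    return diag * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def ssor_solve(A, b, omega=1.0, n_iter=2):
    """Approximate A^{-1} b by n_iter SSOR iterations started from x = 0."""
    x = np.zeros_like(b)
    n = len(b)
    for _ in range(n_iter):
        # A forward sweep in lexicographic order followed by a backward sweep
        # forms one symmetric (SSOR) iteration.
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                off_diag = A[i] @ x - A[i, i] * x[i]
                x[i] = (1.0 - omega) * x[i] + omega * (b[i] - off_diag) / A[i, i]
    return x


def sap_precondition(A, b, block_size, n_cycles=1, omega=1.0, n_ssor=2):
    """Multiplicative Schwarz (SAP-style) approximation to A^{-1} b.

    Blocks alternate between colour 0 and colour 1. Each cycle updates all
    colour-0 blocks, then all colour-1 blocks, using SSOR for the block solves.
    """
    n = len(b)
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    x = np.zeros_like(b)
    for _ in range(n_cycles):
        for colour in (0, 1):
            # A nearest-neighbour operator does not couple same-colour blocks,
            # so one residual per colour serves all blocks of that colour.
            r = b - A @ x
            for k, idx in enumerate(blocks):
                if k % 2 == colour:
                    A_blk = A[np.ix_(idx, idx)]
                    x[idx] += ssor_solve(A_blk, r[idx], omega, n_ssor)
    return x


if __name__ == "__main__":
    n = 32
    A = toy_operator(n)
    b = np.random.default_rng(0).standard_normal(n)
    for cycles in (1, 2, 4):
        x = sap_precondition(A, b, block_size=4, n_cycles=cycles)
        print(cycles, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

Blocks of the same colour are independent, so they can be updated concurrently. In the toy this is only implicit. On a real machine, this independence is what lets SAP expose parallelism across blocks, and across sub-blocks within a node.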
We port the Domain-Decomposed-alpha-AMG solver to the K computer. Each node has 8 cores and 16 GB of memory, with a theoretical peak performance of 128 GFlops per node (82,944 nodes in total). Its features, as many as 256 registers per core and as large as 0.5 byte/F
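The per-node figures quoted above determine the machine-level numbers. The following quick worked check reads the truncated "0.5 byte/F" as 0.5 bytes of memory bandwidth per floating-point operation:

```latex
\begin{aligned}
P_{\mathrm{peak}} &= 82{,}944 \times 128\ \mathrm{GFlops} = 10{,}616{,}832\ \mathrm{GFlops} \approx 10.6\ \mathrm{PFlops},\\
B_{\mathrm{mem}} &= 0.5\ \mathrm{byte/Flop} \times 128\ \mathrm{GFlops} = 64\ \mathrm{GB/s\ per\ node}.
\end{aligned}
```

At this balance, a kernel can approach peak only if it performs roughly two or more flops per byte it streams from main memory. This is why keeping data in the large register file and in cache matters on this machine.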
The JOREK extended magneto-hydrodynamic (MHD) code is a widely used simulation code for studying the non-linear dynamics of large-scale instabilities in divertor tokamak plasmas. Due to the large scale-separation intrinsic to these phenomena both in
We present results for the application of the block BiCGSTAB algorithm, modified by the QR decomposition and with the SAP preconditioner, to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD on a $32^3 \times 64$ lattice at almost physical quar
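For readers unfamiliar with block Krylov solvers, the following sketch shows plain block BiCGSTAB in NumPy. It solves for all right-hand sides at once, with no preconditioner. It assumes a small dense nonsymmetric test matrix in place of the Wilson-Dirac operator. The QR-decomposition modification and the SAP preconditioner from the abstract are deliberately left out, so this is only the baseline method being modified. The function name `block_bicgstab` is illustrative.

```python
import numpy as np


def block_bicgstab(A, B, tol=1e-10, max_iter=500):
    """Plain block BiCGSTAB for A X = B, with the s right-hand sides in B's columns.

    Returns (X, number_of_iterations). No preconditioner, no QR stabilisation.
    """
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    R_shadow = R.copy()          # fixed shadow block; R_0 is a common choice
    b_norm = np.linalg.norm(B)   # Frobenius norm over all columns
    for it in range(max_iter):
        if np.linalg.norm(R) <= tol * b_norm:
            return X, it
        V = A @ P
        G = R_shadow.conj().T @ V                      # small s x s matrix
        alpha = np.linalg.solve(G, R_shadow.conj().T @ R)
        S = R - V @ alpha
        if np.linalg.norm(S) <= tol * b_norm:          # converged half-way
            return X + P @ alpha, it + 1
        T = A @ S
        omega = np.vdot(T, S) / np.vdot(T, T)          # one scalar for the block
        X = X + P @ alpha + omega * S
        R = S - omega * T
        beta = np.linalg.solve(G, -(R_shadow.conj().T @ T))
        P = R + (P - omega * V) @ beta
    return X, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, s = 40, 3
    A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
    B = rng.standard_normal((n, s))
    X, iters = block_bicgstab(A, B)
    print(iters, np.linalg.norm(A @ X - B))
```

With $s = 1$ this reduces to ordinary BiCGSTAB. For $s > 1$, the scalar divisions become small $s \times s$ solves with $\tilde R^\dagger V$. These can become ill-conditioned when the columns of the block grow nearly linearly dependent, which is a general motivation for orthogonalizing the block, for example with a QR decomposition.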
We study the performance of all-mode-averaging (AMA) when used in conjunction with a locally deflated SAP-preconditioned solver, determining how to optimize the local block sizes and number of deflation fields in order to minimize the computational c
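For reference, the AMA estimator has the following usual form; the notation here is ours. A cheap approximate measurement $\mathcal{O}^{(\mathrm{appx})}$ is averaged over $N_G$ symmetry transformations $g \in G$ (typically source translations), and its bias is removed with one exact measurement at a single source. In the setting above, the approximation would presumably come from a loosely converged solve with the deflated SAP-preconditioned solver.

```latex
\mathcal{O}^{(\mathrm{AMA})}
  = \mathcal{O}^{(\mathrm{exact})} - \mathcal{O}^{(\mathrm{appx})}
  + \frac{1}{N_G} \sum_{g \in G} \mathcal{O}^{(\mathrm{appx}),\, g}
```

The estimator is unbiased as long as the approximation is covariant under $G$. The cost optimization therefore roughly balances two factors. The first is how cheap each approximate solve is, which depends on the local block size and the number of deflation fields. The second is how strongly $\mathcal{O}^{(\mathrm{appx})}$ correlates with $\mathcal{O}^{(\mathrm{exact})}$.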
Systems with the quantum numbers of up to twelve charged and neutral pseudoscalar mesons, as well as one-, two-, and three-nucleon systems, are studied using dynamical lattice quantum chromodynamics and quantum electrodynamics (QCD+QED) calculations