
Multi-block/multi-core SSOR preconditioner for the QCD quark solver for K computer

Published by: Ken-Ichi Ishikawa
Publication date: 2012
Research field: Physics
Language: English





We study the algorithmic optimization and performance tuning of the Lattice QCD clover-fermion solver for the K computer. We implement Lüscher's SAP preconditioner with sub-blocking, in which the lattice block in a node is further divided into several sub-blocks to extract enough parallelism for the 8-core CPU SPARC64$^{\mathrm{TM}}$ VIIIfx of the K computer. To achieve a better convergence property, we use the symmetric successive over-relaxation (SSOR) iteration with {\it locally-lexicographical} ordering for the sub-blocks in obtaining the block inverse. The SAP preconditioner is included in the single-precision BiCGStab solver of the nested BiCGStab solver. The single-precision parts of the computational kernel are written solely with SIMD-oriented intrinsics to achieve the best performance of the SPARC on the K computer. We benchmark the single-precision BiCGStab solver on three lattice sizes, $12^3\times 24$, $24^3\times 48$ and $48^3\times 96$, with the local lattice size in a node fixed at $6^3\times 12$. We observe ideal weak-scaling performance from 16 nodes to 4096 nodes. The performance of a computational kernel exceeds 50% efficiency, and the single-precision BiCGStab has $\sim 26\%$ sustained efficiency.
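The node counts quoted in the abstract follow from dividing each global lattice by the fixed local lattice of $6^3\times 12$ sites per node. The short Python sketch below checks that arithmetic. The helper name `nodes_needed` is hypothetical, and the sketch assumes one local block per node with the lattice split evenly in all four directions.

```python
from math import prod


def nodes_needed(global_dims, local_dims):
    """Number of nodes when each node holds exactly one local block.

    Hypothetical helper for illustration: assumes the local lattice
    divides the global lattice evenly in every direction.
    """
    if any(g % l for g, l in zip(global_dims, local_dims)):
        raise ValueError("local lattice must divide the global lattice evenly")
    return prod(g // l for g, l in zip(global_dims, local_dims))


# Local lattice per node and the three benchmark lattices, from the abstract.
LOCAL = (6, 6, 6, 12)
LATTICES = [(12, 12, 12, 24), (24, 24, 24, 48), (48, 48, 48, 96)]

node_counts = [nodes_needed(g, LOCAL) for g in LATTICES]
for lattice, count in zip(LATTICES, node_counts):
    print(lattice, "->", count, "nodes")
```

Because the work per node stays constant while the node count grows by a factor of 16 at each step, this is a weak-scaling test: ideal behaviour means the time per solver iteration does not grow with the number of nodes.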




Read also

We port the Domain-Decomposed-alpha-AMG solver to the K computer. Each node of the system has 8 cores and 16 GB of memory, with a theoretical peak of 128 GFlops (82,944 nodes in total). Its features, as many as 256 registers per core and a byte-per-flop ratio as large as 0.5 byte/Flop, require different tuning from other machines. In order to use more registers, we change some of the data structures and rewrite matrix-vector operations with intrinsics. The performance is improved by more than a factor of two for twelve solves including the setup. The efficiency is still about 5% after the optimization, which is lower than the 22% of a previously tuned mixed-precision solver for the K computer. The throughput is, however, more than two times better for a physical point configuration.
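The hardware figures in this abstract can be turned into a few derived numbers. The Python sketch below is only back-of-the-envelope arithmetic on the quoted values. The variable names are hypothetical, and it assumes that "efficiency" means sustained performance divided by theoretical peak and that the byte-per-flop ratio refers to memory bandwidth relative to peak per node.

```python
# Figures quoted in the abstract.
PEAK_GFLOPS_PER_NODE = 128.0
CORES_PER_NODE = 8
TOTAL_NODES = 82_944
BYTES_PER_FLOP = 0.5

# Derived quantities (assumptions: efficiency = sustained / peak,
# byte-per-flop = memory bandwidth / peak, both per node).
peak_gflops_per_core = PEAK_GFLOPS_PER_NODE / CORES_PER_NODE
system_peak_pflops = PEAK_GFLOPS_PER_NODE * TOTAL_NODES / 1e6
memory_bandwidth_gb_s = BYTES_PER_FLOP * PEAK_GFLOPS_PER_NODE


def sustained_gflops(efficiency):
    """Sustained GFlops per node at a given fraction of peak."""
    return efficiency * PEAK_GFLOPS_PER_NODE


print(f"peak per core:       {peak_gflops_per_core:.1f} GFlops")
print(f"full-system peak:    {system_peak_pflops:.2f} PFlops")
print(f"memory bandwidth:    {memory_bandwidth_gb_s:.1f} GB/s per node")
print(f"at 5% efficiency:    {sustained_gflops(0.05):.2f} GFlops per node")
print(f"at 22% efficiency:   {sustained_gflops(0.22):.2f} GFlops per node")
```

Efficiency and throughput are different measures, which is how a solver running at about 5% of peak can still finish sooner than one running at 22%: it needs far fewer floating-point operations to reach the solution.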
I Holod, M Hoelzl, P S Verma (2021)
The JOREK extended magneto-hydrodynamic (MHD) code is a widely used simulation code for studying the non-linear dynamics of large-scale instabilities in divertor tokamak plasmas. Due to the large scale-separation intrinsic to these phenomena both in space and time, the computational costs for simulations in realistic geometry and with realistic parameters can be very high, motivating the investment of considerable effort for optimization. In this article, a set of developments regarding the JOREK solver and preconditioner is described, which lead to overall significant benefits for large production simulations. These comprise in particular enhanced convergence in highly non-linear scenarios and a general reduction of memory consumption and computational costs. The developments include faster construction of preconditioner matrices, a domain decomposition of preconditioning matrices for solver libraries that can handle distributed matrices, interfaces for additional solver libraries, an option to use matrix compression methods, and the implementation of a complex solver interface for the preconditioner. The most significant development presented is a generalization of the physics-based preconditioner to mode groups, which makes it possible to account for the dominant interactions between toroidal Fourier modes in highly non-linear simulations. At the cost of a moderate increase in memory consumption, the technique can strongly enhance convergence in suitable cases, allowing the use of significantly larger time steps. For all developments, benchmarks based on typical simulation cases demonstrate the resulting improvements.
We present results for the application of the block BiCGSTAB algorithm, modified by the QR decomposition and the SAP preconditioner, to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD on a $32^3 \times 64$ lattice at almost physical quark masses. The QR decomposition improves the convergence behavior of the block BiCGSTAB algorithm by suppressing the deviation between the true residual and the recursive one. The SAP preconditioner applied to the domain-decomposed lattice helps us minimize communication overhead. We find a remarkable cost reduction thanks to cache tuning and a reduction in the number of iterations.
We study the performance of all-mode-averaging (AMA) when used in conjunction with a locally deflated SAP-preconditioned solver, determining how to optimize the local block sizes and number of deflation fields in order to minimize the computational cost for a given level of overall statistical accuracy. We find that AMA enables a reduction of the statistical error on nucleon charges by a factor of around two at the same cost when compared to the standard method. As a demonstration, we compute the axial, scalar and tensor charges of the nucleon in $N_f=2$ lattice QCD with non-perturbatively O(a)-improved Wilson quarks, using O(10,000) measurements to pursue the signal out to source-sink separations of $t_s\sim 1.5$ fm. Our results suggest that the axial charge is suffering from a significant amount (5-10%) of excited-state contamination at source-sink separations of up to $t_s\sim 1.2$ fm, whereas the excited-state contamination in the scalar and tensor charges seems to be small.
Systems with the quantum numbers of up to twelve charged and neutral pseudoscalar mesons, as well as one-, two-, and three-nucleon systems, are studied using dynamical lattice quantum chromodynamics and quantum electrodynamics (QCD+QED) calculations and effective field theory. QED effects on hadronic interactions are determined by comparing systems of charged and neutral hadrons after tuning the quark masses to remove strong isospin breaking effects. A non-relativistic effective field theory, which perturbatively includes finite-volume Coulomb effects, is analyzed for systems of multiple charged hadrons and found to accurately reproduce the lattice QCD+QED results. QED effects on charged multi-hadron systems beyond Coulomb photon exchange are determined by comparing the two- and three-body interaction parameters extracted from the lattice QCD+QED results for charged and neutral multi-hadron systems.