
Multi-block/multi-core SSOR preconditioner for the QCD quark solver for K computer


Added by Ken-Ichi Ishikawa

Publication date 2012

Field: Physics

Language: English

Authors T. Boku - K.-I. Ishikawa - Y. Kuramashi

High Energy Physics - Lattice Computational Physics


We study the algorithmic optimization and performance tuning of the Lattice QCD clover-fermion solver for the K computer. We implement Lüscher's SAP preconditioner with sub-blocking, in which the lattice block in a node is further divided into several sub-blocks to extract enough parallelism for the 8-core SPARC64$^{\mathrm{TM}}$ VIIIfx CPU of the K computer. To achieve better convergence, we use the symmetric successive over-relaxation (SSOR) iteration with locally lexicographical ordering for the sub-blocks when obtaining the block inverse. The SAP preconditioner is used in the single-precision BiCGStab solver that forms part of the nested BiCGStab solver. The single-precision computational kernels are written entirely with SIMD-oriented intrinsics to achieve the best performance of the SPARC processor on the K computer. We benchmark the single-precision BiCGStab solver on three lattice sizes, $12^3 \times 24$, $24^3 \times 48$ and $48^3 \times 96$, with the local lattice size in a node fixed at $6^3 \times 12$. We observe ideal weak-scaling performance from 16 nodes to 4096 nodes. The performance of a computational kernel exceeds 50% efficiency, and the single-precision BiCGStab solver has $\sim 26\%$ sustained efficiency.
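As a rough illustration of the SSOR iteration mentioned in the abstract, the following minimal sketch applies a forward SOR sweep followed by a backward SOR sweep to a small linear system. It is not the authors' implementation: the names `ssor_solve`, `residual_norm` and `toy_matrix`, the tridiagonal test matrix, and the choice `omega=1.0` are all assumptions made for illustration, and the sketch uses plain lexicographical ordering on one dense block rather than the locally lexicographical sub-block ordering and clover-fermion operator of the paper.

```python
import math


def ssor_solve(A, b, omega=1.0, n_iter=5):
    """Approximate x in A x = b with n_iter SSOR iterations, starting from x = 0.

    One SSOR iteration is a forward SOR sweep followed by a backward SOR sweep.
    """
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # SOR update of unknown i using the latest values of all other unknowns.
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - off_diag) / A[i][i]

    for _ in range(n_iter):
        for i in range(n):            # forward sweep
            relax(i)
        for i in reversed(range(n)):  # backward sweep
            relax(i)
    return x


def residual_norm(A, x, b):
    """Euclidean norm of the residual b - A x."""
    n = len(b)
    return math.sqrt(
        sum((b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2 for i in range(n))
    )


def toy_matrix(n):
    """Hypothetical test matrix: tridiagonal, 4 on the diagonal, -1 next to it."""
    return [
        [4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
        for i in range(n)
    ]


A = toy_matrix(8)
b = [1.0] * 8
x = ssor_solve(A, b, omega=1.0, n_iter=5)
# Relative residual after five SSOR iterations.
print(residual_norm(A, x, b) / residual_norm(A, [0.0] * 8, b))
```

In a preconditioner setting, a fixed small number of such iterations would serve as an approximate block inverse rather than being run to convergence; the iteration count and relaxation parameter used here are placeholders, not values from the paper.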


Porting DDalphaAMG solver to K computer

Ken-Ichi Ishikawa, Issaku Kanamori (2018)

We port the Domain-Decomposed-alpha-AMG solver to the K computer. The system has 8 cores and 16 GB of memory per node, with a theoretical peak of 128 GFlops per node (82,944 nodes in total). Its features, as many as 256 registers per core and a byte/Flop ratio as large as 0.5, require tuning different from that on other machines. In order to use more registers, we change some of the data structures and rewrite matrix-vector operations with intrinsics. The performance is improved by more than a factor of two for twelve solves including the setup. The efficiency is still about 5% after the optimization, which is lower than the 22% of a previously tuned mixed-precision solver for the K computer. The throughput is, however, more than two times better for a physical point configuration.

High Energy Physics - Lattice Computational Physics

Enhanced Preconditioner for JOREK MHD Solver

I. Holod, M. Hoelzl, P. S. Verma (2021)

The JOREK extended magneto-hydrodynamic (MHD) code is a widely used simulation code for studying the non-linear dynamics of large-scale instabilities in divertor tokamak plasmas. Due to the large scale separation intrinsic to these phenomena in both space and time, the computational costs of simulations in realistic geometry and with realistic parameters can be very high, motivating considerable optimization effort. In this article, a set of developments regarding the JOREK solver and preconditioner is described, which lead to significant overall benefits for large production simulations. These comprise, in particular, enhanced convergence in highly non-linear scenarios and a general reduction of memory consumption and computational costs. The developments include faster construction of preconditioner matrices, a domain decomposition of preconditioning matrices for solver libraries that can handle distributed matrices, interfaces for additional solver libraries, an option to use matrix compression methods, and the implementation of a complex solver interface for the preconditioner. The most significant development presented is a generalization of the physics-based preconditioner to mode groups, which makes it possible to account for the dominant interactions between toroidal Fourier modes in highly non-linear simulations. At the cost of a moderate increase in memory consumption, the technique can strongly enhance convergence in suitable cases, allowing the use of significantly larger time steps. For all developments, benchmarks based on typical simulation cases demonstrate the resulting improvements.

Computational Physics Plasma Physics

Modified Block BiCGSTAB for Lattice QCD

Y. Nakamura, K.-I. Ishikawa, Y. Kuramashi (2011)

We present results for the application of the block BiCGSTAB algorithm, modified by the QR decomposition and the SAP preconditioner, to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD on a $32^3 \times 64$ lattice at almost physical quark masses. The QR decomposition improves the convergence behavior of the block BiCGSTAB algorithm by suppressing the deviation between the true residual and the recursive one. The SAP preconditioner applied to the domain-decomposed lattice helps us minimize communication overhead. We find a remarkable cost reduction thanks to cache tuning and a reduction in the number of iterations.

High Energy Physics - Lattice

Nucleon matrix elements from lattice QCD with all-mode-averaging and a domain-decomposed solver: an exploratory study

Georg von Hippel, Thomas D. Rae, Eigo Shintani (2016)

We study the performance of all-mode-averaging (AMA) when used in conjunction with a locally deflated SAP-preconditioned solver, determining how to optimize the local block sizes and number of deflation fields in order to minimize the computational cost for a given level of overall statistical accuracy. We find that AMA enables a reduction of the statistical error on nucleon charges by a factor of around two at the same cost when compared to the standard method. As a demonstration, we compute the axial, scalar and tensor charges of the nucleon in $N_f=2$ lattice QCD with non-perturbatively O(a)-improved Wilson quarks, using O(10,000) measurements to pursue the signal out to source-sink separations of $t_s \sim 1.5$ fm. Our results suggest that the axial charge is suffering from a significant amount (5-10%) of excited-state contamination at source-sink separations of up to $t_s \sim 1.2$ fm, whereas the excited-state contamination in the scalar and tensor charges seems to be small.

High Energy Physics - Lattice Computational Physics

Charged multi-hadron systems in lattice QCD+QED

S. R. Beane, W. Detmold, R. Horsley (2020)

Systems with the quantum numbers of up to twelve charged and neutral pseudoscalar mesons, as well as one-, two-, and three-nucleon systems, are studied using dynamical lattice quantum chromodynamics and quantum electrodynamics (QCD+QED) calculations and effective field theory. QED effects on hadronic interactions are determined by comparing systems of charged and neutral hadrons after tuning the quark masses to remove strong isospin breaking effects. A non-relativistic effective field theory, which perturbatively includes finite-volume Coulomb effects, is analyzed for systems of multiple charged hadrons and found to accurately reproduce the lattice QCD+QED results. QED effects on charged multi-hadron systems beyond Coulomb photon exchange are determined by comparing the two- and three-body interaction parameters extracted from the lattice QCD+QED results for charged and neutral multi-hadron systems.

High Energy Physics - Lattice High Energy Physics - Phenomenology Nuclear Theory
