Porting DDalphaAMG solver to K computer

Published by Issaku Kanamori
Publication date: 2018
Research field: Physics
Language: English





We port the Domain-Decomposed-alpha-AMG (DDalphaAMG) solver to the K computer. Each node of the system has 8 cores and 16 GB of memory, with a theoretical peak of 128 GFlops (82,944 nodes in total). Its distinctive features, as many as 256 registers per core and a byte-per-Flop ratio as large as 0.5, require tuning different from that on other machines. To use more registers, we change some of the data structures and rewrite the matrix-vector operations with intrinsics. The performance improves by more than a factor of two for twelve solves including the setup. The efficiency is still about 5% after the optimization, which is lower than the 22% of a previously tuned mixed-precision solver for the K computer. The throughput, however, is more than two times better for a physical point configuration.
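The hardware figures quoted in the abstract can be combined into a few back-of-envelope numbers. The following is a minimal sketch, not taken from the paper: the function names are illustrative, and the last function assumes that both solvers run on the same number of nodes, that efficiency is measured against the same peak, and that "throughput" means solves per unit time.

```python
# Figures quoted in the abstract.
CORES_PER_NODE = 8
PEAK_GFLOPS_PER_NODE = 128.0
TOTAL_NODES = 82_944
BYTES_PER_FLOP = 0.5
EFF_DDALPHAAMG = 0.05        # "about 5%" after optimization
EFF_MIXED_PRECISION = 0.22   # previously tuned mixed-precision solver


def peak_per_core_gflops(peak_node=PEAK_GFLOPS_PER_NODE, cores=CORES_PER_NODE):
    """Theoretical peak per core, in GFlops."""
    return peak_node / cores


def node_bandwidth_gb_per_s(peak_node=PEAK_GFLOPS_PER_NODE, ratio=BYTES_PER_FLOP):
    """Memory bandwidth per node implied by the byte-per-Flop ratio, in GB/s."""
    return peak_node * ratio


def system_peak_pflops(peak_node=PEAK_GFLOPS_PER_NODE, nodes=TOTAL_NODES):
    """Theoretical peak of the full machine, in PFlops."""
    return peak_node * nodes / 1.0e6


def sustained_gflops(efficiency, peak_node=PEAK_GFLOPS_PER_NODE):
    """Sustained per-node performance at a given fraction of peak."""
    return peak_node * efficiency


def max_flop_ratio(eff_new, eff_old, throughput_gain):
    """Upper bound on (flops of new solver) / (flops of old solver).

    Assumption (not from the paper): time = flops / (efficiency * peak) on
    the same hardware, so a throughput gain g requires
    flops_new / flops_old <= eff_new / (eff_old * g).
    """
    return eff_new / (eff_old * throughput_gain)


if __name__ == "__main__":
    print("peak per core   [GFlops]:", peak_per_core_gflops())
    print("node bandwidth  [GB/s]  :", node_bandwidth_gb_per_s())
    print("system peak     [PFlops]:", round(system_peak_pflops(), 2))
    print("DDalphaAMG      [GFlops/node]:", sustained_gflops(EFF_DDALPHAAMG))
    print("mixed precision [GFlops/node]:", sustained_gflops(EFF_MIXED_PRECISION))
    print("max flop ratio for 2x gain   :",
          round(max_flop_ratio(EFF_DDALPHAAMG, EFF_MIXED_PRECISION, 2.0), 3))
```

Under these assumptions, a solver running at about 5% of peak can only deliver twice the throughput of one running at 22% if it needs roughly a ninth or less of the floating-point work, which is consistent with an algorithmic rather than a kernel-level gain.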




Read also

We study the algorithmic optimization and performance tuning of the Lattice QCD clover-fermion solver for the K computer. We implement Lüscher's SAP preconditioner with sub-blocking, in which the lattice block in a node is further divided into several sub-blocks to extract enough parallelism for the 8-core CPU SPARC64$^{\mathrm{TM}}$ VIIIfx of the K computer. To achieve better convergence, we use the symmetric successive over-relaxation (SSOR) iteration with locally lexicographical ordering for the sub-blocks in obtaining the block inverse. The SAP preconditioner is included in the single-precision BiCGStab solver of the nested BiCGStab solver. The single-precision parts of the computational kernels are written solely with SIMD-oriented intrinsics to achieve the best performance of the SPARC on the K computer. We benchmark the single-precision BiCGStab solver on three lattice sizes, $12^3\times 24$, $24^3\times 48$ and $48^3\times 96$, fixing the local lattice size in a node at $6^3\times 12$. We observe ideal weak-scaling performance from 16 nodes to 4096 nodes. The performance of a computational kernel exceeds 50% efficiency, and the single-precision BiCGStab has $\sim 26\%$ sustained efficiency.
We study the performance of all-mode-averaging (AMA) when used in conjunction with a locally deflated SAP-preconditioned solver, determining how to optimize the local block sizes and number of deflation fields in order to minimize the computational cost for a given level of overall statistical accuracy. We find that AMA enables a reduction of the statistical error on nucleon charges by a factor of around two at the same cost when compared to the standard method. As a demonstration, we compute the axial, scalar and tensor charges of the nucleon in $N_f=2$ lattice QCD with non-perturbatively O(a)-improved Wilson quarks, using O(10,000) measurements to pursue the signal out to source-sink separations of $t_s\sim 1.5$ fm. Our results suggest that the axial charge is suffering from a significant amount (5-10%) of excited-state contamination at source-sink separations of up to $t_s\sim 1.2$ fm, whereas the excited-state contamination in the scalar and tensor charges seems to be small.
A Chern-Simons topological quantum computer is a device that can be effectively described by the Chern-Simons topological quantum field theory and used for quantum computations. Quantum qudit gates of this quantum computer are represented by sequences of quantum $\mathcal{R}$-matrices. Their dimension and explicit form depend on the parameters of the Chern-Simons theory: the level $k$, the gauge group $SU(N)$, and the representation, which is chosen to be the symmetric representation $[r]$. In this paper, we examine the universality of such a quantum computer. We prove that for sufficiently large $k$ it is universal, and the minimum allowed value of $k$ depends on the remaining parameters $r$ and $N$.
We have used a simple camera phone to significantly improve an 'exploration system' for astrobiology and geology. This camera phone will make it much easier to develop and test computer-vision algorithms for future planetary exploration. We envision that the 'Astrobiology Phone-cam' exploration system can be fruitfully used in other problem domains as well.
C. Aubin, J. Laiho, S. Li (2008)
We calculate results for K to pi and K to 0 matrix elements to next-to-leading order in 2+1 flavor partially quenched chiral perturbation theory. Results are presented for both the Delta I=1/2 and 3/2 channels, for chiral operators corresponding to current-current, gluonic penguin, and electroweak penguin 4-quark operators. These formulas are useful for studying the chiral behavior of currently available 2+1 flavor lattice QCD results, from which the low energy constants of the chiral effective theory can be determined. The low energy constants of these matrix elements are necessary for an understanding of the Delta I=1/2 rule, and for calculations of epsilon'/epsilon using current lattice QCD simulations.