بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

MILC staggered conjugate gradient performance on Intel KNL

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ruizi Li

تاريخ النشر 2016

مجال البحث فيزياء

والبحث باللغة English

تأليف Carleton DeTar - Douglas Doerfler - Steven Gottlieb

فيزياء الطاقة العالية - شعرية الفيزياء الحسابية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second gener- ation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. We compare performance of an MPI+OpenMP baseline version of the MILC code with a version incorporating the QPhiX staggered CG solver, for both one-node and multi-node runs.

قيم البحث

67 - Piotr Korcyl , Grzegorz Korcyl 2018

Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are presented. A single-node, double precision implementation of the Conjugate Gradient algorithm is used to invert numerically the Dirac-Wilson operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is divided into two software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in programmable logic, and the rest of the algorithm runs on the ARM cores. Optimized data blocks are used to efficiently use data movement infrastructure allowing to reach intervals of 1 clock cycle. We show that the FPGA implementation can offer a comparable performance compared to that obtained using Intel Xeon Phi KNL.

فيزياء الطاقة العالية - شعرية الفيزياء الحسابية

A performance evaluation of CCS QCD Benchmark on the COMA (Intel(R) Xeon Phi$^{TM}$, KNC) system

106 - Taisuke Boku , Ken-Ichi Ishikawa , Yoshinobu Kuramashi 2016

The most computationally demanding part of Lattice QCD simulations is solving quark propagators. Quark propagators are typically obtained with a linear equation solver utilizing HPC machines. The CCS QCD Benchmark is a benchmark program solving the W ilson-Clover quark propagator, and is developed at the Center for Computational Sciences (CCS), University of Tsukuba. We optimized the benchmark program for a Intel XeonPhi (Knights Corner, KNC) system named COMA (PACS-IX) at CCS Tsukuba under the Intel Parallel Computing Center program. A single precision BiCGStab solver with the overlapped Restricted Additive Schwarz (RAS) preconditioner was implemented using SIMD intrinsics, OpenMP and MPI in the offload mode. With the reverse-offloading technique, we could reduce the communication and offloading overheads. We observed a performance of $sim 200$ GFlops sustained for the Wilson-Clover hopping matrix multiplication on the lattice sizes larger than $24^3times 32$ on a sinlge card of the COMA system. A good weak scaling perofmace was observed on the local lattice sizes larger than $24^3times 32$.

فيزياء الطاقة العالية - شعرية الفيزياء الحسابية

Practical Implementation of Lattice QCD Simulation on SIMD Machines with Intel AVX-512

58 - Issaku Kanamori , Hideo Matsufuru 2018

We investigate implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time consuming part of the numerical simulations of lattice QCD is a solver of linear equation for a large sparse matrix that repre sents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience on the Oakforest-PACS system, a large scale cluster composed of Intel Xeon Phi Knights Landing, we discuss the performance tuning exploiting AVX-512 and code design on the SIMD architecture and massively parallel machines. We observe that the same code runs efficiently on an Intel Xeon Skylake-SP machine.

فيزياء الطاقة العالية - شعرية الفيزياء الحسابية

Energy-efficiency evaluation of Intel KNL for HPC workloads

103 - E. Calore , A. Gabbana , S.F. Schifano 2018

Energy consumption is increasingly becoming a limiting factor to the design of faster large-scale parallel systems, and development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In thi s work we focus on energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel into the HPC market. We take into account the 64-core Xeon Phi 7230, and analyze its energy performance using both the on-chip MCDRAM and the regular DDR4 system memory as main storage for the application data-domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice. We assessthen the energy consumption using different memory data-layouts, kind of memory (DDR4 or MCDRAM) and number of threads per core.

النظم الموزعة والتوازية والحوسبة العنقودية

Status of the MILC light pseudoscalar meson project

375 - C. Bernard , C. DeTar , Steven Gottlieb 2007

We discuss the current status of our calculation of the physics of pi and K mesons using three dynamical flavors of improved staggered quarks. This year, we have a new ensemble with a lattice spacing of 0.06 fm and a light sea mass of 0.2 m_s, as wel l as significant increases in statistics at several coarser lattice spacings and/or heavier sea masses. Results for decay constants, quark masses, low energy constants, condensates, and V_{us} are presented.

فيزياء الطاقة العالية - شعرية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حلوان

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

MILC staggered conjugate gradient performance on Intel KNL

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً