AVX-512 extension to OpenQCD 1.6

Published by Jarno Rantaharju

Publication date: 2018

Research field: Informatics Engineering

Language: English

Authors: Ed Bennett, Mark Dawson, Michele Mesiti

High Energy Physics - Lattice; Distributed, Parallel, and Cluster Computing

We publish an extension of openQCD-1.6 with AVX-512 vector instructions using Intel intrinsics. Recent Intel processors support extended instruction sets with operations on 512-bit wide vectors, increasing both the capacity for floating point operations and register memory. Optimal use of the new capabilities requires reorganising data and floating point operations into these wider vector units. We report on the implementation and performance of the AVX-512 OpenQCD extension on clusters using Intel Knights Landing and Xeon Scalable (Skylake) CPUs. In complete HMC trajectories with physically relevant parameters we observe a performance increase of 5% to 10%.
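
The data reorganisation mentioned in the abstract can be illustrated with a minimal, portable sketch. It assumes double precision, so that one 512-bit register holds 8 values, and it repacks interleaved (re, im) pairs into split real and imaginary lanes so that the same operation runs on 8 lattice sites at once. The names `Lane8`, `ComplexLane8`, `pack` and `cmul_add` are hypothetical, and the layout is an assumption for illustration, not the one used by openQCD. Plain loops stand in for the intrinsics so the sketch builds on any CPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: double precision, so a 512-bit vector holds 8 values.
constexpr std::size_t kLanes = 8;

// Stand-in for one 512-bit register (64 bytes, 64-byte aligned).
struct alignas(64) Lane8 {
    std::array<double, kLanes> v{};
};

// Complex values for 8 sites, stored as split real/imaginary lanes.
struct ComplexLane8 {
    Lane8 re;
    Lane8 im;
};

// Repack 8 consecutive sites from interleaved (re, im) storage
// into the split layout above.
ComplexLane8 pack(const std::vector<double>& interleaved, std::size_t first_site) {
    assert(interleaved.size() >= 2 * (first_site + kLanes));
    ComplexLane8 out;
    for (std::size_t i = 0; i < kLanes; ++i) {
        out.re.v[i] = interleaved[2 * (first_site + i)];
        out.im.v[i] = interleaved[2 * (first_site + i) + 1];
    }
    return out;
}

// acc += a * b for 8 complex numbers. Each line of the loop body maps to
// multiply-add operations applied to whole lanes, which is the form a
// vectorising compiler or hand-written intrinsics can exploit.
void cmul_add(ComplexLane8& acc, const ComplexLane8& a, const ComplexLane8& b) {
    for (std::size_t i = 0; i < kLanes; ++i) {
        acc.re.v[i] += a.re.v[i] * b.re.v[i] - a.im.v[i] * b.im.v[i];
        acc.im.v[i] += a.re.v[i] * b.im.v[i] + a.im.v[i] * b.re.v[i];
    }
}
```

With the split layout no shuffles are needed inside the arithmetic; the cost moves into `pack`, which is why such a reorganisation is usually done once, outside the inner loop.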

Issaku Kanamori, Hideo Matsufuru, 2018

We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time-consuming part of numerical simulations of lattice QCD is the solver for a linear equation with a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience with the Oakforest-PACS system, a large-scale cluster composed of Intel Xeon Phi Knights Landing processors, we discuss performance tuning that exploits AVX-512 and code design for the SIMD architecture and massively parallel machines. We observe that the same code runs efficiently on an Intel Xeon Skylake-SP machine.

High Energy Physics - Lattice; Computational Physics

SIMD Vectorization for the Lennard-Jones Potential with AVX2 and AVX-512 instructions

Hiroshi Watanabe, Koh M. Nakagawa, 2018

This work describes the SIMD vectorization of the force calculation of the Lennard-Jones potential with the Intel AVX2 and AVX-512 instruction sets. Since the force-calculation kernel of the molecular dynamics method involves indirect access to memory, the data layout is one of the most important factors in vectorization. We find that the Array of Structures (AoS) with padding exhibits better performance than Structure of Arrays (SoA) with appropriate vectorization and optimizations. In particular, AoS with 512-bit width exhibits the best performance among the architectures. While the difference in performance between AoS and SoA is significant for the vectorization with AVX2, that with AVX-512 is minor. The effect of other optimization techniques, such as software pipelining together with vectorization, is also discussed. We present results for benchmarks on three CPU architectures: Intel Haswell (HSW), Knights Landing (KNL), and Skylake (SKL). The performance gains by vectorization are about 42% on HSW compared with the code optimized without vectorization. On KNL, the hand-vectorized codes exhibit 34% better performance than the codes vectorized automatically by the Intel compiler. On SKL, the code vectorized with AVX2 exhibits slightly better performance than that vectorized with AVX-512.
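
As a rough illustration of the "AoS with padding" layout discussed in this abstract, the scalar sketch below stores each particle as four doubles (x, y, z and one unused slot), so that a particle reached through an index list occupies exactly 32 bytes and could be fetched with a single 256-bit load. The struct `Particle`, the function `lj_forces` and the use of reduced units (epsilon = sigma = 1, no cutoff) are assumptions made for this example; it is not the authors' kernel and contains no intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Array of Structures with padding: the fourth double is unused and only
// rounds the particle up to 32 bytes (the width of one 256-bit load).
struct Particle {
    double x, y, z, pad;
};
static_assert(sizeof(Particle) == 32, "padded particle should span 32 bytes");

// Lennard-Jones forces in reduced units, U(r) = 4 (r^-12 - r^-6).
// The force on particle i due to j is (48 r^-14 - 24 r^-8) * (r_i - r_j).
// The pair list causes the indirect memory access mentioned in the abstract.
std::vector<Particle> lj_forces(
    const std::vector<Particle>& pos,
    const std::vector<std::pair<std::size_t, std::size_t>>& pairs) {
    std::vector<Particle> force(pos.size(), Particle{0.0, 0.0, 0.0, 0.0});
    for (const auto& p : pairs) {
        const std::size_t i = p.first;
        const std::size_t j = p.second;
        assert(i < pos.size() && j < pos.size());
        const double dx = pos[i].x - pos[j].x;
        const double dy = pos[i].y - pos[j].y;
        const double dz = pos[i].z - pos[j].z;
        const double inv2 = 1.0 / (dx * dx + dy * dy + dz * dz);
        const double inv6 = inv2 * inv2 * inv2;
        const double coef = (48.0 * inv6 * inv6 - 24.0 * inv6) * inv2;
        // Newton's third law: equal and opposite contributions.
        force[i].x += coef * dx;  force[j].x -= coef * dx;
        force[i].y += coef * dy;  force[j].y -= coef * dy;
        force[i].z += coef * dz;  force[j].z -= coef * dz;
    }
    return force;
}
```

An SoA variant would keep three separate coordinate arrays instead; gathering one particle then touches three distant memory locations rather than one contiguous 32-byte record.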

Mathematical Software; Computational Engineering, Finance, and Science

An extension to Lüscher's finite volume method above inelastic threshold (formalism)

Noriyoshi Ishii for HAL-QCD Collaboration, 2011

An extension of Lüscher's finite volume method above inelastic thresholds is proposed. It is achieved by extending the procedure recently proposed by the HAL-QCD Collaboration for a single-channel system. Focusing on the asymptotic behavior of the equal-time Nambu-Bethe-Salpeter (NBS) wave functions near spatial infinity, a coupled-channel extension of the effective Schrödinger equation is constructed by introducing an energy-independent interaction kernel. Because the NBS wave functions contain the information of the T-matrix at long distance, the S-matrix can be obtained by solving the coupled-channel effective Schrödinger equation in the infinite volume.

High Energy Physics - Lattice; Nuclear Theory

Extension of a new method for locating critical temperatures

P. Sawicki, 1997

We investigate a recently proposed method for locating critical temperatures and introduce some modifications which allow an exact criterion to be formulated for any self-dual model. We apply the modified method to the Ashkin-Teller model and show that the exact result for the critical temperature is reproduced. We also test a two-layer Ising model for the possible presence of self-duality.

High Energy Physics - Lattice

Accelerating QDP++ using GPUs

Frank Winter, 2011

Graphics Processing Units (GPUs) are becoming increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture for controlling and making use of the compute power of GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited for quantum field theory which provides vector data types and expressions and forms the basis of the lattice QCD software suite Chroma. In this work, offloading QDP++ expression evaluation to a GPU was successfully implemented by leveraging the ET technique and Just-In-Time (JIT) compilation. The Portable Expression Template Engine (PETE) and the C API for CUDA kernel arguments were used to build the bridge between the host and device memory domains. This makes it possible to accelerate on a GPU those Chroma routines which are typically not subject to special optimisation. As an application example, a smearing routine was accelerated to execute on a GPU. A significant speed-up compared with normal CPU execution was measured.
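
The Expression Template technique named in this abstract can be sketched on the host alone. In the minimal example below, `a + b + c` builds a lightweight expression object and no arithmetic happens until the assignment walks it once per element, without temporary fields. The types `Expr`, `Field` and `Sum` are hypothetical names chosen for this illustration; they are not the QDP++ or PETE interfaces, and the GPU and JIT parts of the work are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets operator+ accept any expression type without virtual calls.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// A field of doubles; assigning an expression evaluates it element by element.
struct Field : Expr<Field> {
    std::vector<double> data;
    explicit Field(std::size_t n, double value = 0.0) : data(n, value) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <typename E>
    Field& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Node representing lhs + rhs. It stores references only, so it must be
// consumed within the full expression that created it (do not keep it in
// a named variable past that statement).
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Builds the expression tree; performs no arithmetic.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

A statement such as `d = a + b + c;` then compiles to a single loop over the elements. Because the whole expression is encoded in the type `Sum<Sum<Field, Field>, Field>`, a code generator can in principle inspect that type and emit a matching kernel, which is the property the abstract relies on.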

High Energy Physics - Lattice; Distributed, Parallel, and Cluster Computing; Programming Languages
