Porting WarpX to GPU-accelerated platforms

Published by Andrew Myers

Publication date: 2021

Research fields: Physics; Informatics Engineering

Paper language: English

Authors: A. Myers, A. Almgren, L. D. Amorim

Subjects: Computational Physics; Distributed, Parallel, and Cluster Computing; Accelerator Physics

Abstract

WarpX is a general-purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered and the lessons learned, and give current performance results on a series of relevant benchmark problems.
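The abstract does not spell out WarpX's GPU strategy, so the following is only a generic illustration, not WarpX's code. It shows why the particle push in an electromagnetic particle-in-cell code suits GPUs: each particle is advanced by an independent function call, so a device launch can give each particle its own thread. It assumes the textbook non-relativistic Boris scheme and uniform fields; a production code such as WarpX uses relativistic pushers and interpolates fields from the grid to each particle. `parallel_for` is a hypothetical serial stand-in for a device kernel launch, the role played by constructs such as `amrex::ParallelFor` in AMReX, the framework WarpX is built on.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Vec = List[float]


@dataclass
class Particle:
    x: Vec  # position
    v: Vec  # velocity (non-relativistic)


def cross(a: Vec, b: Vec) -> Vec:
    """Cross product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def boris_push(p: Particle, E: Vec, B: Vec, q_over_m: float, dt: float) -> None:
    """Advance one particle by dt with the non-relativistic Boris scheme.

    The function reads and writes only particle p, so calls for different
    particles are independent and could each run as one GPU thread.
    """
    h = 0.5 * q_over_m * dt
    # First half of the electric-field kick.
    v_minus = [p.v[k] + h * E[k] for k in range(3)]
    # Rotation about B; s = 2t / (1 + |t|^2) keeps |v| unchanged.
    t = [h * B[k] for k in range(3)]
    t2 = sum(c * c for c in t)
    s = [2.0 * c / (1.0 + t2) for c in t]
    c1 = cross(v_minus, t)
    v_prime = [v_minus[k] + c1[k] for k in range(3)]
    c2 = cross(v_prime, s)
    v_plus = [v_minus[k] + c2[k] for k in range(3)]
    # Second half of the electric-field kick, then the position update.
    p.v = [v_plus[k] + h * E[k] for k in range(3)]
    p.x = [p.x[k] + dt * p.v[k] for k in range(3)]


def parallel_for(n: int, kernel: Callable[[int], None]) -> None:
    """Hypothetical serial stand-in for a device launch over indices 0..n-1."""
    for i in range(n):
        kernel(i)


if __name__ == "__main__":
    # Illustrative setup: electrons in a uniform 1 T field along z, no E field.
    q_over_m = -1.758820e11  # electron charge-to-mass ratio, C/kg
    dt = 1.0e-13             # s
    E, B = [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]
    particles = [Particle([0.0, 0.0, 0.0], [1.0e5 * (k + 1), 0.0, 0.0]) for k in range(4)]
    for _ in range(100):
        parallel_for(len(particles),
                     lambda i: boris_push(particles[i], E, B, q_over_m, dt))
    # Speeds are preserved by the magnetic rotation.
    print([math.hypot(*p.v) for p in particles])
```

Keeping the per-particle update free of shared writes is the design point: the same function body works whether the loop runs serially on a CPU core or as many concurrent GPU threads.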

Read also

Paul Heistracher, Claas Abert, Florian Bruckner (2018)

We present GPU-accelerated simulations to calculate the annihilation energy of magnetic skyrmions in an atomistic spin model considering dipole-dipole, exchange, uniaxial-anisotropy and Dzyaloshinskii-Moriya interactions using the simplified string method. The skyrmion annihilation energy is directly related to its thermal stability and is a key measure for the applicability of magnetic skyrmions to storage and logic devices. We investigate annihilations mediated by Bloch points as well as annihilations via boundaries for various interaction energies. Both processes show similar behaviour, with boundary annihilations resulting in slightly smaller energy barriers than Bloch point annihilations.

Computational Physics

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

Benjamin Block, Peter Virnau, Tobias Preis (2010)

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., J. Comp. Phys. 228, 4468 (2009)] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single Central Processing Unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As proof of concept, we reproduce the critical temperature of the 2D Ising model using finite-size scaling techniques.

Computational Physics; Computer Graphics; Mathematical Physics
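The checkerboard algorithm cited in this abstract relies on a simple property: on a square lattice with nearest-neighbour coupling, every site of one colour has only neighbours of the other colour, so all same-colour spins can be updated at once. A minimal serial Metropolis sketch of that idea follows, assuming J = 1, k_B = 1 and an even lattice size L with periodic boundaries. The multi-spin coding and the CUDA/MPI split described by the authors are not shown, and the function names are hypothetical.

```python
import math
import random
from typing import List

Lattice = List[List[int]]  # L x L array of +1 / -1 spins


def checkerboard_sweep(spins: Lattice, beta: float, rng: random.Random,
                       J: float = 1.0) -> None:
    """One Metropolis sweep done as two half-sweeps, one per checkerboard colour."""
    L = len(spins)
    if L % 2:
        # With periodic boundaries an odd L would give same-colour neighbours.
        raise ValueError("checkerboard decomposition with periodic boundaries needs even L")
    for color in (0, 1):
        # Sites with (i + j) % 2 == color read only opposite-colour spins,
        # which this half-sweep never modifies, so the updates below are
        # independent and could all run concurrently (e.g. one GPU thread each).
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                # Energy change of flipping s_ij for H = -J * sum(s_i * s_j).
                dE = 2.0 * J * spins[i][j] * nb
                if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]


def magnetization(spins: Lattice) -> float:
    """Mean spin per site."""
    L = len(spins)
    return sum(map(sum, spins)) / (L * L)


if __name__ == "__main__":
    rng = random.Random(42)
    L = 16
    spins = [[1] * L for _ in range(L)]
    for _ in range(200):
        checkerboard_sweep(spins, beta=0.3, rng=rng)
    print(magnetization(spins))
```

For J = 1 the exact critical coupling is beta_c = ln(1 + sqrt(2))/2 ≈ 0.4407, so the `beta=0.3` run in the `__main__` block sits in the disordered (high-temperature) phase.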

NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver

Paul Fischer, Stefan Kerkemeier, Misun Min (2021)

The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM), is described. For performance portability, the code is based on the Open Concurrent Compute Abstraction (OCCA) and leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps. Critical performance sections of the Navier-Stokes time advancement are addressed. Performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit, for calculations of up to 60B gridpoints.

Performance; Distributed, Parallel, and Cluster Computing

Porting HEP Parameterized Calorimeter Simulation Code to GPUs

Zhihua Dong, Heather Gray, Charles Leggett (2021)

The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As the LHC is upgraded to allow for higher luminosity, resulting in much higher data rates, purely relying on CPUs may not provide enough computing power to support the simulation and data analysis needs. As a proof of concept, we investigate the feasibility of porting a HEP parameterized calorimeter simulation code to GPUs. We have chosen to use FastCaloSim, the ATLAS fast parametrized calorimeter simulation. While FastCaloSim is sufficiently fast such that it does not impose a bottleneck in detector simulations overall, significant speed-ups in the processing of large samples can be achieved from GPU parallelization at both the particle (intra-event) and event levels; this is especially beneficial in conditions expected at the high-luminosity LHC, where extremely high per-event particle multiplicities will result from the many simultaneous proton-proton collisions. We report our experience with porting FastCaloSim to NVIDIA GPUs using CUDA. A preliminary Kokkos implementation of FastCaloSim for portability to other parallel architectures is also described.

High Energy Physics - Experiment; Distributed, Parallel, and Cluster Computing

An initial investigation of the performance of GPU-based swept time-space decomposition

Daniel Magee, Kyle E. Niemeyer (2016)

Simulations of physical phenomena are essential to the expedient design of precision components in aerospace and other high-tech industries. These phenomena are often described by mathematical models involving partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution that is difficult to achieve in a reasonable amount of time, even with effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory accesses. Parallelized PDE solvers are subject to a trade-off in memory management: store the solution for each timestep in abundant global memory with high access costs, or in limited private memory with low access costs that must be passed between nodes. The GPU implementation of swept time-space decomposition presented here mitigates this dilemma by using private (shared) memory, avoiding internode communication, and overwriting unnecessary values. It shows significant improvement in the execution time of the PDE solvers in one dimension, achieving speedups of 6x and 2x for large and small problem sizes, respectively, compared to naive G…

Computational Physics; Distributed, Parallel, and Cluster Computing; Mathematical Software
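The swept decomposition described in this abstract advances each block of the domain through several time steps using only data already held in fast local memory, postponing communication. The sketch below shows only the first, triangle-shaped stage of that idea for an explicit (FTCS) 1D heat-equation stencil with r = alpha*dt/dx^2: each step needs one neighbour on each side, so every level shrinks by one point per edge, yet the surviving values match a whole-domain update exactly. The later diamond stages, the exchange of edge values between blocks, and the GPU shared-memory mapping from the paper are omitted, and the function names are hypothetical.

```python
import random
from typing import List


def ftcs_step(u: List[float], r: float) -> List[float]:
    """One explicit step of u_t = alpha * u_xx on the interior of u.

    Each new value needs u[i-1], u[i], u[i+1], so the result is two points
    shorter than the input. Stable for r <= 0.5.
    """
    return [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
            for i in range(1, len(u) - 1)]


def periodic_step(u: List[float], r: float) -> List[float]:
    """Reference: the same update on the whole periodic domain (naive approach)."""
    n = len(u)
    return [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]


def local_triangle(block: List[float], r: float) -> List[List[float]]:
    """Advance one block as many steps as its own data allows.

    Level n covers the block shrunk by n points on each side. In a swept
    scheme this loop would run in fast on-chip memory, and the outermost
    values of each level are what later stages exchange with neighbouring
    blocks.
    """
    levels = [list(block)]
    while len(levels[-1]) >= 3:
        levels.append(ftcs_step(levels[-1], r))
    return levels


if __name__ == "__main__":
    rng = random.Random(0)
    u0 = [rng.random() for _ in range(32)]
    tri = local_triangle(u0[0:8], r=0.25)
    print([len(level) for level in tri])
```

Because the local loop and the global reference evaluate the same expression on the same inputs, the check below can use exact equality rather than a tolerance.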
