Honing and proofing Astrophysical codes on the road to Exascale. Experiences from code modernization on many-core systems

Published by: Salvatore Cielo

Publication date: 2020

Research field: Informatics Engineering; Physics

Language: English

Authors: Salvatore Cielo - Luigi Iapichino - Fabio Baruffa

Distributed, Parallel, and Cluster Computing; Instrumentation and Methods for Astrophysics; Performance

The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, and forces them to expose the highest possible degree of parallelism in order to make the best use of the available hardware. The second-generation Intel$^{(R)}$ Xeon Phi$^{(TM)}$ (code-named Knights Landing, henceforth KNL) is the latest many-core system; it implements several interesting hardware features, such as a large number of cores per node (up to 72), 512-bit-wide vector registers and high-bandwidth memory. These unique features make KNL a powerful testbed for modern HPC applications, and the performance of codes on KNL is therefore a useful proxy for their readiness for future architectures. In this work we describe the lessons learnt during the optimisation of the widely used computational astrophysics codes P-Gadget-3, Flash and Echo. Moreover, we present results for the visualisation and analysis tools VisIt and yt. These examples show that modern architectures benefit from code optimisation at different levels, even more than traditional multi-core systems. However, the level of modernisation of typical community codes still needs improvement before they can fully utilise the resources of novel architectures.

Read also

David Goz, Sara Bertocco, Luca Tornatore (2018)

The ExaNeSt and EuroExa H2020 EU-funded projects aim to design and develop an exascale-ready computing platform prototype based on low-energy-consumption ARM64 cores and FPGA accelerators. We participate in the application-driven design of the hardware solutions and in prototype validation. To carry out this work we are using, among others, Hy-Nbody, a state-of-the-art direct N-body code. The core algorithms of Hy-Nbody have been improved so as to fit them increasingly well to the exascale target platform. While waiting for the ExaNeSt prototype release, we are performing tests and code tuning operations on an ARM64 SoC facility: a SLURM-managed HPC cluster based on the 64-bit ARMv8 Cortex-A72/Cortex-A53 core design and powered by a Mali-T864 embedded GPU. In parallel, we are porting a kernel of Hy-Nbody to FPGA, aiming to test and compare the performance-per-watt of our algorithms on different platforms. In this paper we describe how we re-engineered the application and we show first results on the ARM SoC.

Instrumentation and Methods for Astrophysics

Cloud to Ground Secured Computing: User Experiences on the Transition from Cloud-Based to Locally-Sited Hardware

Carolyn Ellis (2021)

The application of high-performance computing (HPC) processes, tools, and technologies to Controlled Unclassified Information (CUI) creates both opportunities and challenges. Building on our experiences developing, deploying, and managing the Research Environment for Encumbered Data (REED) hosted by AWS GovCloud, Research Computing at Purdue University has recently deployed Weber, our locally-sited HPC solution for the storage and analysis of CUI data. Weber presents our customer base with advances in data access, portability, and usability at a low, stable cost while reducing administrative overhead for our information technology support team.

Distributed, Parallel, and Cluster Computing

Analytical Process Scheduling Optimization for Heterogeneous Multi-core Systems

Chien-Hao Chen, Ren-Song Tsay (2021)

In this paper, we propose the first optimum process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores with the same instruction-set architecture (ISA). Existing algorithms are all heuristics-based, and the well-known IPC-driven approach essentially tries to schedule processes with a high scaling factor on big cores. Our analysis shows that, for optimum solutions, it is also critical to consider placing long-running processes on big cores. Tests of SPEC 2006 cases on various big-small core combinations show that our proposed optimum approach is up to 34% faster than the IPC-driven heuristic approach in terms of total workload completion time. The complexity of our algorithm is $O(N \log N)$, where $N$ is the number of processes. Therefore, the proposed optimum algorithm is practical for use.

Distributed, Parallel, and Cluster Computing; Operating Systems; Performance

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Fabio Baruffa, Luigi Iapichino, Nicolay J. Hammer (2016)

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel$^{(R)}$ architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution time and improved threading scalability both on Intel Xeon$^{(R)}$ ($2.6\times$ on Ivy Bridge) and Xeon Phi$^{(TM)}$ ($13.7\times$ on Knights Corner) systems. The first few tests of the optimised code result in $19.1\times$ faster execution on second-generation Xeon Phi (Knights Landing), thus demonstrating the portability of the devised optimisation solutions to upcoming architectures.
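The change of data layout to Structure of Arrays (SoA) mentioned in this abstract can be illustrated with a minimal sketch. This is not code from Gadget: the particle fields (`x`, `mass`) and all function names are hypothetical and were chosen only to contrast an Array of Structures (AoS) with the SoA form.

```python
from array import array

# Hypothetical particle fields, for illustration only (not Gadget's data model).

def make_aos(n):
    """Array of Structures: one record per particle, fields interleaved."""
    return [{"x": float(i), "mass": 2.0} for i in range(n)]

def make_soa(n):
    """Structure of Arrays: one contiguous array of doubles per field."""
    return {
        "x": array("d", (float(i) for i in range(n))),
        "mass": array("d", [2.0] * n),
    }

def aos_to_soa(particles):
    """Convert an AoS list into the SoA layout, field by field."""
    return {
        "x": array("d", (p["x"] for p in particles)),
        "mass": array("d", (p["mass"] for p in particles)),
    }

def total_mass_aos(particles):
    # Every iteration visits a whole record to read a single field.
    return sum(p["mass"] for p in particles)

def total_mass_soa(particles):
    # The loop streams over one contiguous array.
    return sum(particles["mass"])
```

Python only shows the layout here. In a compiled code the SoA form gives unit-stride access to each field, which is generally what lets a compiler auto-vectorise loops over particles.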

Distributed, Parallel, and Cluster Computing; Instrumentation and Methods for Astrophysics; Computational Physics

Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems

Ariful Azad, Oguz Selvitopi, Md Taufique Hussain (2021)

Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, bioinformatics, and chemistry are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for rapid development of combinatorial algorithms in distributed-memory systems. During the decade since its first introduction, the Combinatorial BLAS library has evolved and expanded significantly. This paper details many of the key technical features of Combinatorial BLAS version 2.0, such as communication avoidance, hierarchical parallelism via in-node multithreading, accelerator support via GPU kernels, generalized semiring support, implementations of key data structures and functions, and scalable distributed I/O operations for human-readable files. Our paper also presents several rules of thumb for choosing the right data structures and functions in Combinatorial BLAS 2.0 under various common application scenarios.

Distributed, Parallel, and Cluster Computing; Discrete Mathematics; Performance
