The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, forcing them to expose the highest possible degree of parallelism in order to make the best use of the available hardware. The second-generation Intel$^{(R)}$ Xeon Phi$^{(TM)}$ (code-named Knights Landing, henceforth KNL) is the latest many-core system and implements several interesting hardware features, such as a large number of cores per node (up to 72), 512-bit-wide vector registers, and high-bandwidth memory. These features make KNL a powerful testbed for modern HPC applications, and the performance of codes on KNL is therefore a useful proxy for their readiness for future architectures. In this work we describe the lessons learnt during the optimisation of the widely used computational astrophysics codes P-Gadget-3, Flash and Echo. Moreover, we present results for the visualisation and analysis tools VisIt and yt. These examples show that modern architectures benefit from code optimisation at different levels, even more than traditional multi-core systems do. However, typical community codes still need further modernisation before they can fully utilise the resources of novel architectures.
The ExaNeSt and EuroExa H2020 EU-funded projects aim to design and develop an exascale-ready computing platform prototype based on low-energy-consumption ARM64 cores and FPGA accelerators. We participate in the application-driven design of the hardwa
The application of high-performance computing (HPC) processes, tools, and technologies to Controlled Unclassified Information (CUI) creates both opportunities and challenges. Building on our experiences developing, deploying, and managing the Researc
In this paper, we propose the first optimum process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores with the same instruction-s
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel$^{(R)}$ architectures. We identif
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, bioinformatics, and chemistry, are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for rapid developm