ﻻ يوجد ملخص باللغة العربية
The deployment of the next generation computing platform at ExaFlops scale requires to solve new technological challenges mainly related to the impressive number (up to 10^6) of compute elements required. This impacts on system power consumption, in terms of feasibility and costs, and on system scalability and computing efficiency. In this perspective analysis, exploration and evaluation of technologies characterized by low power, high efficiency and high degree of customization is strongly needed. Among the various European initiative targeting the design of ExaFlops system, ExaNeSt and EuroExa are EU-H2020 funded initiatives leveraging on high end MPSoC FPGAs. Last generation MPSoC FPGAs can be seen as non-mainstream but powerful HPC Exascale enabling components thanks to the integration of embedded multi-core, ARM-based low power CPUs and a huge number of hardware resources usable to co-design application oriented accelerators and to develop a low latency high bandwidth network architecture. In this paper we introduce ExaNet the FPGA-based, scalable, direct network architecture of ExaNeSt system. ExaNet allow us to explore different interconnection topologies, to evaluate advanced routing functions for congestion control and fault tolerance and to design specific hardware components for acceleration of collective operations. After a brief introduction of the motivations and goals of ExaNeSt and EuroExa projects, we will report on the status of network architecture design and its hardware/software testbed adding preliminary bandwidth and latency achievements.
A ubiquitous computing environment consists of many resources that need to be identified by users and applications. Users and developers require some way to identify resources by human readable names. In addition, ubiquitous computing environments im
The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, that aggregat
Power efficiency is critical in high performance computing (HPC) systems. To achieve high power efficiency on application level, it is vital importance to efficiently distribute power used by application checkpoints. In this study, we analyze the rel
This paper argues for an accelerator development toolchain that takes into account the whole system containing the accelerator. With whole-system visibility, the toolchain can better assist accelerator scoping and composition in the context of the ex
The standard nature of computing is currently being challenged by a range of problems that start to hinder technological progress. One of the strategies being proposed to address some of these problems is to develop novel brain-inspired processing me