ﻻ يوجد ملخص باللغة العربية
We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on todays and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.
We evaluate the performance of Devito, a domain specific language (DSL) for finite differences on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver competitive performance compared to other Intel Xeon processors.
As the rate of data collection continues to grow rapidly, developing visualization tools that scale to immense data sets is a serious and ever-increasing challenge. Existing approaches generally seek to decouple storage and visualization systems, per
Hardware platforms in high performance computing are constantly getting more complex to handle even when considering multicore CPUs alone. Numerous features and configuration options in the hardware and the software environment that are relevant for
Recently Arm introduced a new instruction set called Scalable Vector Extension (SVE), which supports vector lengths up to 2048 bits. While SVE hardware will not be generally available until about 2021, we believe that future SVE-based architectures w
Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures