ﻻ يوجد ملخص باللغة العربية
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical clusterapproximation) application that targets the new ARM A64FX processor. The goal is to describethe changes required for the new architecture and generate efficient single instruction/multiple data(SIMD) instructions that target the new Scalable Vector Extension instruction set. During manualtuning, the authors used the LLVM tools to improve code parallelization by using OpenMP SIMD,refactored the code and applied transformation that enabled SIMD optimizations, and ensured thatthe correct libraries were used to achieve optimal performance. By applying these code changes, codespeed was increased by 1.98X and 78 GFlops were achieved on the A64FX processor. The authorsaim to automatize parts of the efforts in the OpenMP Advisor tool, which is built on top of existingand newly introduced LLVM tooling.
The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiments framework and is used to serialize the e
We introduce Tuna, a static analysis approach to optimizing deep neural network programs. The optimization of tensor operations such as convolutions and matrix multiplications is the key to improving the performance of deep neural networks. Many deep
Coflow scheduling improves data-intensive application performance by improving their networking performance. State-of-the-art online coflow schedulers in essence approximate the classic Shortest-Job-First (SJF) scheduling by learning the coflow size
Backtracking (i.e., reverse execution) helps the user of a debugger to naturally think backwards along the execution path of a program, and thinking backwards makes it easy to locate the origin of a bug. So far backtracking has been implemented mostl
Software container solutions have revolutionized application development approaches by enabling lightweight platform abstractions within the so-called containers. Several solutions are being actively developed in attempts to bring the benefits of con