Comparing OpenMP Implementations With Applications Across A64FX Platforms

Added by Benjamin Michalowicz

Publication date 2021

Field Informatics Engineering

Language English

Authors Benjamin Michalowicz - Eric Raut - Yan Kang

Mathematical Software

Fujitsu's development of the A64FX processor has been a major innovation in High-Performance Computing and led to the birth of Fugaku, currently the world's fastest supercomputer. A variety of tools are used to analyze the run times and performance of several applications and, in particular, how these applications scale on the A64FX processor. We examine the performance and behavior of applications through OpenMP scaling, and how their performance differs across compilers, on the new Ookami cluster at Stony Brook University and on the Fugaku supercomputer at RIKEN in Japan.
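The abstract does not say which metrics were used to study OpenMP scaling. As a minimal sketch, assuming the usual strong-scaling definitions (speedup as baseline time divided by time at a given thread count, efficiency as speedup divided by the relative thread count), the calculation could look like the following. The function name `scaling_table` and the timings are hypothetical placeholders, not measurements from Ookami, Fugaku, or the paper.

```python
from typing import Dict, List, Tuple


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows for a strong-scaling run.

    Speedup and efficiency are computed relative to the smallest thread
    count present in `timings` (usually 1).
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        # Ideal scaling gives an efficiency of 1.0 at every thread count.
        efficiency = speedup * base_threads / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Placeholder wall-clock times in seconds, invented for illustration only.
# They are NOT results from the paper or from any A64FX system.
example_timings = {1: 80.0, 2: 40.0, 4: 25.0, 8: 20.0}

for threads, speedup, efficiency in scaling_table(example_timings):
    print(f"{threads:>3} threads  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

In practice the timings would come from running the same OpenMP binary at several thread counts (for example by varying the `OMP_NUM_THREADS` environment variable) once per compiler, so that one table per compiler can be compared.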

Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms

Benjamin Michalowicz, Eric Raut, Yan Kang (2021)

The development of the A64FX processor by Fujitsu has been a major innovation in vectorized processors and led to Fugaku, currently the world's fastest supercomputer. We use a variety of tools to analyze the behavior and performance of several OpenMP applications with different compilers, and how these applications scale on the A64FX processors in clusters at Stony Brook University and RIKEN.

Performance

Do Transformer Modifications Transfer Across Implementations and Applications?

Sharan Narang, Hyung Won Chung, Yi Tay (2021)

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.

Machine Learning Computation and Language

Hydra: a C++11 framework for data analysis in massively parallel platforms

A. A. Alves Jr, M. D. Sokoloff (2017)

Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA and TBB enabled devices. This contribution summarizes the main features of Hydra. A basic description of the overall design, functionality and user interface is provided, along with some code examples and measurements of performance.

Mathematical Software High Energy Physics - Experiment Computational Physics

VegasFlow: accelerating Monte Carlo simulation across platforms

Juan M. Cruz-Martinez, Stefano Carrazza (2020)

In this work we demonstrate the use of the VegasFlow library in multi-device settings: multiple GPUs in a single node and multiple nodes in a cluster. VegasFlow is new software for fast evaluation of highly parallelizable integrals based on Monte Carlo integration. It is inspired by the Vegas algorithm, which is very often used as the driver of cross section integrations, and is based on Google's powerful TensorFlow library. In these proceedings we consider a typical multi-GPU configuration to benchmark how different batch sizes can increase (or decrease) the performance on a Leading Order example integration.

Computational Physics High Energy Physics - Phenomenology

ZKCM: a C++ library for multiprecision matrix computation with applications in quantum information

Akira SaiToh (2013)

ZKCM is a C++ library developed for multiprecision matrix computation, built on the GNU MP and MPFR libraries. It provides an easy-to-use syntax and convenient functions for matrix manipulations, including those often used in numerical simulations in quantum physics. Its extension library, ZKCM_QC, is developed for simulating quantum computing using the time-dependent matrix-product-state simulation method. This paper gives an introduction to the libraries with practical sample programs.

Mathematical Software Computational Physics Quantum Physics