On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

Added by Axel Huebl

Publication date: 2017

Fields: Informatics Engineering, Physics

Language: English

Authors: Axel Huebl, Rene Widera, Felix Schmitt

Performance Computational Physics

We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.
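The trade-off named in the abstract, spending otherwise idle host-side CPU threads on a data transformation so that fewer bytes reach the file system, can be illustrated with a minimal sketch. This is not the ADIOS transform API and not PIConGPU's implementation: it assumes Python's standard `zlib` as a stand-in compressor, and the names `compress_chunks`, `decompress_chunks`, `chunk_size`, and `workers` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> List[bytes]:
    """Split data into fixed-size chunks and compress each chunk on a thread pool.

    CPython's zlib releases the global interpreter lock while compressing,
    so the worker threads can run on separate cores.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so chunks can be reassembled later.
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(compressed: List[bytes]) -> bytes:
    """Inverse of compress_chunks: decompress each chunk and concatenate."""
    return b"".join(zlib.decompress(c) for c in compressed)


if __name__ == "__main__":
    # Hypothetical, highly compressible payload standing in for simulation output.
    payload = bytes(range(256)) * 8192
    packed = compress_chunks(payload, chunk_size=1 << 18, workers=2)
    written = sum(len(c) for c in packed)
    print(f"raw bytes: {len(payload)}, bytes to write: {written}")
```

Whether such a transformation pays off depends on how compressible the data is and on how compression throughput per node compares with the file system bandwidth available to that node.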

Performance of Devito on HPC-Optimised ARM Processors

Hermes Senger, Jaime F. de Souza, Edson S. Gomi (2019)

We evaluate the performance of Devito, a domain-specific language (DSL) for finite differences, on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver competitive performance compared to Intel Xeon processors.

Performance

Mr. Plotter: Unifying Data Reduction Techniques in Storage and Visualization Systems

Sam Kumar, Michael P Andersen, David E. Culler (2021)

As the rate of data collection continues to grow rapidly, developing visualization tools that scale to immense data sets is a serious and ever-increasing challenge. Existing approaches generally seek to decouple storage and visualization systems, performing just-in-time data reduction to transparently avoid overloading the visualizer. We present a new architecture in which the visualizer and data store are tightly coupled. Unlike systems that read raw data from storage, the performance of our system scales linearly with the size of the final visualization, essentially independent of the size of the data. Thus, it scales to massive data sets while supporting interactive performance (sub-100 ms query latency). This enables a new class of visualization clients that automatically manage data, quickly and transparently requesting data from the underlying database without requiring the user to explicitly initiate queries. It lays the groundwork for supporting truly interactive exploration of big data and opens new directions for research on scalable information visualization systems.

Databases Human-Computer Interaction

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

Christie L. Alappat, Johannes Hofmann, Georg Hager (2020)

Hardware platforms in high performance computing are constantly getting more complex to handle even when considering multicore CPUs alone. Numerous features and configuration options in the hardware and the software environment that are relevant for performance are not even known to most application users or developers. Microbenchmarks, i.e., simple codes that fathom a particular aspect of the hardware, can help to shed light on such issues, but only if they are well understood and if the results can be reconciled with known facts or performance models. The insight gained from microbenchmarks may then be applied to real applications for performance analysis or optimization. In this paper we investigate two modern Intel x86 server CPU architectures in depth: Broadwell EP and Cascade Lake SP. We highlight relevant hardware configuration settings that can have a decisive impact on code performance and show how to properly measure on-chip and off-chip data transfer bandwidths. The new victim L3 cache of Cascade Lake and its advanced replacement policy receive due attention. Finally we use DGEMM, sparse matrix-vector multiplication, and the HPCG benchmark to make a connection to relevant application scenarios.

Performance Distributed Parallel and Cluster Computing

Lattice QCD on upcoming Arm architectures

Nils Meyer, Dirk Pleiter, Stefan Solbrig (2019)

Recently Arm introduced a new instruction set called Scalable Vector Extension (SVE), which supports vector lengths up to 2048 bits. While SVE hardware will not be generally available until about 2021, we believe that future SVE-based architectures will have great potential for Lattice QCD. In this contribution we discuss key aspects of SVE and describe how we implemented SVE in the Grid Lattice QCD framework.

High Energy Physics - Lattice Computational Physics

Scalability of High-Performance PDE Solvers

Paul Fischer, Misun Min, Thilina Rathnayake (2020)

Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large-scale numerical solvers for PDEs that govern a wide range of physical applications. We consider a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms. We measure peak performance (degrees of freedom per second) on a fixed number of nodes and identify effective code optimization strategies for each architecture. In addition to peak performance, we identify the minimum time to solution at 80% parallel efficiency. The performance analysis is based on spectral and p-type finite elements but is equally applicable to a broad spectrum of numerical PDE discretizations, including finite difference, finite volume, and h-type finite elements.
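The metric "minimum time to solution at 80% parallel efficiency" can be made concrete with a small sketch. It assumes one common strong-scaling definition of efficiency, E(P) = P0 · T(P0) / (P · T(P)) with P0 the smallest node count measured; the paper's exact definition may differ. The function names and the timing values below are hypothetical and are not results from the paper.

```python
from typing import Dict, Optional, Tuple


def parallel_efficiency(timings: Dict[int, float]) -> Dict[int, float]:
    """Strong-scaling efficiency relative to the smallest node count measured.

    timings maps node count P to wall-clock time T(P) for a fixed problem size.
    """
    base_nodes = min(timings)
    base_cost = base_nodes * timings[base_nodes]  # node-seconds at the baseline
    return {p: base_cost / (p * t) for p, t in timings.items()}


def min_time_at_efficiency(
    timings: Dict[int, float], threshold: float = 0.8
) -> Optional[Tuple[int, float]]:
    """Return (nodes, time) with the smallest time whose efficiency >= threshold.

    Returns None if no measured configuration meets the threshold.
    """
    eff = parallel_efficiency(timings)
    eligible = [(t, p) for p, t in timings.items() if eff[p] >= threshold]
    if not eligible:
        return None
    t, p = min(eligible)
    return p, t


if __name__ == "__main__":
    # Made-up timings (seconds) for illustration only.
    timings = {1: 100.0, 2: 52.0, 4: 30.0, 8: 20.0}
    print(parallel_efficiency(timings))
    print(min_time_at_efficiency(timings, threshold=0.8))
```

In this made-up data set the run on 8 nodes is the fastest overall, but it falls below the 80% threshold, so the reported operating point is the 4-node run.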

Performance Distributed Parallel and Cluster Computing
