Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). The out-of-core approach helps run large-scale stencil codes that process data larger than the limited capacity of GPU memory. However, the performance of GPU-based out-of-core stencil computation is limited by the data transfer between the CPU and the GPU. Many optimizations have been explored to reduce such data transfer, but the use of on-the-fly compression techniques remains insufficiently studied. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that resolves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps CPU/GPU data transfer with GPU computation. Experimental results show that the proposed method achieved a speedup of 1.2x compared with the method without compression. Moreover, although the precision loss introduced by compression increased with the number of time steps, it remained trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.
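The pipelining idea mentioned in the abstract, overlapping CPU/GPU data transfer with GPU computation on decomposed blocks, can be sketched with plain CUDA streams. The following is a minimal, hypothetical illustration only; it is not the authors' implementation and is not tied to any particular compression library. The kernels decompress_block and stencil_kernel, the block count, and the buffer sizes are placeholder assumptions: each compressed block is copied to the device asynchronously, decompressed, and fed to a stencil kernel, with two streams acting as a double buffer so that the transfer of one block overlaps the computation of the previous one.

// Minimal double-buffered pipeline sketch (assumptions: hypothetical
// decompress_block()/stencil_kernel() kernels, 8 blocks, 64 MiB buffers).
#include <cuda_runtime.h>
#include <cstdio>

#define NUM_DATA_BLOCKS 8            // number of decomposed data blocks (assumption)
#define BLOCK_BYTES     (64 << 20)   // per-block buffer size in bytes (assumption)

// Placeholder kernels: a real implementation would call a GPU compression
// library for (de)compression and a real stencil kernel for the update.
__global__ void decompress_block(const unsigned char *in, float *out) {}
__global__ void stencil_kernel(const float *in, float *out) {}

int main() {
    unsigned char *h_comp[NUM_DATA_BLOCKS];   // pinned host buffers (compressed blocks)
    unsigned char *d_comp[2];                 // device staging buffers
    float *d_in[2], *d_out[2];                // device working buffers
    cudaStream_t stream[2];

    for (int i = 0; i < NUM_DATA_BLOCKS; ++i)
        cudaMallocHost((void **)&h_comp[i], BLOCK_BYTES);  // pinned memory enables async copies
    for (int s = 0; s < 2; ++s) {
        cudaMalloc((void **)&d_comp[s], BLOCK_BYTES);
        cudaMalloc((void **)&d_in[s],   BLOCK_BYTES);
        cudaMalloc((void **)&d_out[s],  BLOCK_BYTES);
        cudaStreamCreate(&stream[s]);
    }

    // Pipeline: while block i is decompressed and computed in one stream,
    // block i+1 is transferred host-to-device in the other stream.
    for (int i = 0; i < NUM_DATA_BLOCKS; ++i) {
        int s = i % 2;
        cudaMemcpyAsync(d_comp[s], h_comp[i], BLOCK_BYTES,
                        cudaMemcpyHostToDevice, stream[s]);
        decompress_block<<<256, 256, 0, stream[s]>>>(d_comp[s], d_in[s]);
        stencil_kernel<<<256, 256, 0, stream[s]>>>(d_in[s], d_out[s]);
        // A full version would also compress d_out[s], copy it back to the host
        // asynchronously, and handle the halo dependency between adjacent blocks.
    }
    for (int s = 0; s < 2; ++s) cudaStreamSynchronize(stream[s]);
    printf("pipeline finished\n");
    return 0;
}

The double-buffer structure is what makes the overlap possible: operations queued in the same stream serialize, while the two streams alternate, so the host-to-device copy of one block can proceed while the previous block is still being decompressed and updated on the GPU.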
The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphics processing units (GPUs) are an a
Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data locality res
GAMER is a GPU-accelerated Adaptive-MEsh-Refinement code for astrophysical simulations. In this work, two further extensions of the code are reported. First, we have implemented the MUSCL-Hancock method with Roe's Riemann solver for the hydrodynam
Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g.
We propose a distance between continuous-time Markov chains (CTMCs) and study the problem of computing it by comparing three different algorithmic methodologies: iterative, linear program, and on-the-fly. In a work presented at FoSSaCS'12, Chen et al.