ترغب بنشر مسار تعليمي؟ اضغط هنا

Simulating 3-D Stellar Hydrodynamics using PPM and PPB Multifluid Gas Dynamics on CPU and CPU+GPU Nodes

101   0   0.0 ( 0 )
 نشر من قبل Robert Andrassy
 تاريخ النشر 2018
  مجال البحث فيزياء
والبحث باللغة English
 تأليف Paul R. Woodward




اسأل ChatGPT حول البحث

The special computational challenges of simulating 3-D hydrodynamics in deep stellar interiors are discussed, and numerical algorithmic responses described. Results of recent simulations carried out at scale on the NSFs Blue Waters machine at the University of Illinois are presented, with a special focus on the computational challenges they address. Prospects for future work using GPU-accelerated nodes such as those on the DoEs new Summit machine at Oak Ridge National Laboratory are described, with a focus on numerical algorithmic accommodations that we believe will be necessary.

قيم البحث

اقرأ أيضاً

We present a general method for accelerating by more than an order of magnitude the convolution of pixelated functions on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact sphe rical harmonic space component. These components can then be convolved in parallel using an inexpensive commodity GPU and a CPU. We provide models for the computational cost of both real-space and Fourier space convolutions and an estimate for the approximation error. Using these models we can determine the optimum split that minimizes the wall clock time for the convolution while satisfying the desired error bounds. We apply this technique to the problem of simulating a cosmic microwave background (CMB) anisotropy sky map at the resolution typical of the high resolution maps produced by the Planck mission. For the main Planck CMB science channels we achieve a speedup of over a factor of ten, assuming an acceptable fractional rms error of order 1.e-5 in the power spectrum of the output map.
Moores Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing the validity of these laws and reflect on the reasons responsible. In this work, we collect data of more than 4000 publicly-available CPU and GPU products. We find that transistor scaling remains critical in keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future. We observe that GPUs consistently deliver higher performance than CPUs. GPU performance continues to rise because of increases in GPU frequency, improvements in the thermal design power (TDP), and growth in die size. But we also see the ratio of GPU to CPU performance moving closer to parity, thanks to new SIMD extensions on CPUs and increased CPU core counts.
94 - R. Aaij , M. Adinolfi , S. Aiola 2021
The LHCb experiment at CERN is undergoing an upgrade in preparation for the Run 3 data taking period of the LHC. As part of this upgrade the trigger is moving to a fully software implementation operating at the LHC bunch crossing rate. We present an evaluation of a CPU-based and a GPU-based implementation of the first stage of the High Level Trigger. After a detailed comparison both options are found to be viable. This document summarizes the performance and implementation details of these options, the outcome of which has led to the choice of the GPU-based implementation as the baseline.
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, a re less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.
There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems. However, ex isting systems may take hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and flexible in-memory GPM framework targeting shared-memory CPUs and GPUs. Pangolin is the first GPM system that provides high-level abstractions for GPU processing. It provides a simple programming interface based on the extend-reduce-filter model, which enables users to specify application-specific knowledge for search space pruning and isomorphism test elimination. We describe novel optimizations that exploit locality, reduce memory consumption, and mitigate the overheads of dynamic memory allocation and synchronization. Evaluation on a 28-core CPU demonstrates that Pangolin outperforms existing GPM frameworks Arabesque, RStream, and Fractal by 49x, 88x, and 80x on average, respectively. Acceleration on a V100 GPU further improves performance of Pangolin by 15x on average. Compared to state-of-the-art hand-optimized GPM applications, Pangolin provides competitive performance with less programming effort.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا