Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Simulating 3-D Stellar Hydrodynamics using PPM and PPB Multifluid Gas Dynamics on CPU and CPU+GPU Nodes

101 0 0.0 ( 0 )

Download Cite

Added by Robert Andrassy

Publication date 2018

fields Physics

and research's language is English

Authors Paul R. Woodward

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The special computational challenges of simulating 3-D hydrodynamics in deep stellar interiors are discussed, and numerical algorithmic responses described. Results of recent simulations carried out at scale on the NSFs Blue Waters machine at the University of Illinois are presented, with a special focus on the computational challenges they address. Prospects for future work using GPU-accelerated nodes such as those on the DoEs new Summit machine at Oak Ridge National Laboratory are described, with a focus on numerical algorithmic accommodations that we believe will be necessary.

rate research

Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions

391 - P. M. Sutter , Benjamin D. Wandelt , 2014

We present a general method for accelerating by more than an order of magnitude the convolution of pixelated functions on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact spherical harmonic space component. These components can then be convolved in parallel using an inexpensive commodity GPU and a CPU. We provide models for the computational cost of both real-space and Fourier space convolutions and an estimate for the approximation error. Using these models we can determine the optimum split that minimizes the wall clock time for the convolution while satisfying the desired error bounds. We apply this technique to the problem of simulating a cosmic microwave background (CMB) anisotropy sky map at the resolution typical of the high resolution maps produced by the Planck mission. For the main Planck CMB science channels we achieve a speedup of over a factor of ten, assuming an acceptable fractional rms error of order 1.e-5 in the power spectrum of the output map.

Cosmology and Nongalactic Astrophysics Instrumentation and Methods for Astrophysics

Summarizing CPU and GPU Design Trends with Product Data

188 - Yifan Sun , Nicolas Bohm Agostini , Shi Dong 2019

Moores Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing the validity of these laws and reflect on the reasons responsible. In this work, we collect data of more than 4000 publicly-available CPU and GPU products. We find that transistor scaling remains critical in keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future. We observe that GPUs consistently deliver higher performance than CPUs. GPU performance continues to rise because of increases in GPU frequency, improvements in the thermal design power (TDP), and growth in die size. But we also see the ratio of GPU to CPU performance moving closer to parity, thanks to new SIMD extensions on CPUs and increased CPU core counts.

Distributed Parallel and Cluster Computing

A Comparison of CPU and GPU implementations for the LHCb Experiment Run 3 Trigger

94 - R. Aaij , M. Adinolfi , S. Aiola 2021

The LHCb experiment at CERN is undergoing an upgrade in preparation for the Run 3 data taking period of the LHC. As part of this upgrade the trigger is moving to a fully software implementation operating at the LHC bunch crossing rate. We present an evaluation of a CPU-based and a GPU-based implementation of the first stage of the High Level Trigger. After a detailed comparison both options are found to be viable. This document summarizes the performance and implementation details of these options, the outcome of which has led to the choice of the GPU-based implementation as the baseline.

Instrumentation and Detectors

Importance of Explicit Vectorization for CPU and GPU Software Performance

397 - Neil G. Dickson , Kamran Karimi , Firas Hamze 2010

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.

Distributed Parallel and Cluster Computing Performance Computational Physics

Pangolin: An Efficient and Flexible Graph Pattern Mining System on CPU and GPU

105 - Xuhao Chen , Roshan Dathathri , Gurbinder Gill 2019

There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems. However, existing systems may take hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and flexible in-memory GPM framework targeting shared-memory CPUs and GPUs. Pangolin is the first GPM system that provides high-level abstractions for GPU processing. It provides a simple programming interface based on the extend-reduce-filter model, which enables users to specify application-specific knowledge for search space pruning and isomorphism test elimination. We describe novel optimizations that exploit locality, reduce memory consumption, and mitigate the overheads of dynamic memory allocation and synchronization. Evaluation on a 28-core CPU demonstrates that Pangolin outperforms existing GPM frameworks Arabesque, RStream, and Fractal by 49x, 88x, and 80x on average, respectively. Acceleration on a V100 GPU further improves performance of Pangolin by 15x on average. Compared to state-of-the-art hand-optimized GPM applications, Pangolin provides competitive performance with less programming effort.

Distributed Parallel and Cluster Computing

comments

Fetching comments

Wadi International University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Simulating 3-D Stellar Hydrodynamics using PPM and PPB Multifluid Gas Dynamics on CPU and CPU+GPU Nodes

Ask ChatGPT about the research

No Arabic abstract

Read More