بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Simulating 3-D Stellar Hydrodynamics using PPM and PPB Multifluid Gas Dynamics on CPU and CPU+GPU Nodes

101 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Robert Andrassy

تاريخ النشر 2018

مجال البحث فيزياء

والبحث باللغة English

تأليف Paul R. Woodward

الفيزياء الفلكية الشمسية والنجوم الفيزياء الحسابية ديناميات السوائل

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The special computational challenges of simulating 3-D hydrodynamics in deep stellar interiors are discussed, and numerical algorithmic responses described. Results of recent simulations carried out at scale on the NSFs Blue Waters machine at the University of Illinois are presented, with a special focus on the computational challenges they address. Prospects for future work using GPU-accelerated nodes such as those on the DoEs new Summit machine at Oak Ridge National Laboratory are described, with a focus on numerical algorithmic accommodations that we believe will be necessary.

قيم البحث

199 - P. M. Sutter , Benjamin D. Wandelt , 2014

We present a general method for accelerating by more than an order of magnitude the convolution of pixelated functions on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact sphe rical harmonic space component. These components can then be convolved in parallel using an inexpensive commodity GPU and a CPU. We provide models for the computational cost of both real-space and Fourier space convolutions and an estimate for the approximation error. Using these models we can determine the optimum split that minimizes the wall clock time for the convolution while satisfying the desired error bounds. We apply this technique to the problem of simulating a cosmic microwave background (CMB) anisotropy sky map at the resolution typical of the high resolution maps produced by the Planck mission. For the main Planck CMB science channels we achieve a speedup of over a factor of ten, assuming an acceptable fractional rms error of order 1.e-5 in the power spectrum of the output map.

علم الكونيات والفيزياء الفلكية Nongalactic الأجهزة والأساليب للزيئات الفيزياء الفلكية

Summarizing CPU and GPU Design Trends with Product Data

188 - Yifan Sun , Nicolas Bohm Agostini , Shi Dong 2019

Moores Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing the validity of these laws and reflect on the reasons responsible. In this work, we collect data of more than 4000 publicly-available CPU and GPU products. We find that transistor scaling remains critical in keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future. We observe that GPUs consistently deliver higher performance than CPUs. GPU performance continues to rise because of increases in GPU frequency, improvements in the thermal design power (TDP), and growth in die size. But we also see the ratio of GPU to CPU performance moving closer to parity, thanks to new SIMD extensions on CPUs and increased CPU core counts.

النظم الموزعة والتوازية والحوسبة العنقودية

A Comparison of CPU and GPU implementations for the LHCb Experiment Run 3 Trigger

94 - R. Aaij , M. Adinolfi , S. Aiola 2021

The LHCb experiment at CERN is undergoing an upgrade in preparation for the Run 3 data taking period of the LHC. As part of this upgrade the trigger is moving to a fully software implementation operating at the LHC bunch crossing rate. We present an evaluation of a CPU-based and a GPU-based implementation of the first stage of the High Level Trigger. After a detailed comparison both options are found to be viable. This document summarizes the performance and implementation details of these options, the outcome of which has led to the choice of the GPU-based implementation as the baseline.

أجهزة الكشف الفيزيائية

Importance of Explicit Vectorization for CPU and GPU Software Performance

222 - Neil G. Dickson , Kamran Karimi , Firas Hamze 2010

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, a re less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.

النظم الموزعة والتوازية والحوسبة العنقودية الأداء الفيزياء الحسابية

Pangolin: An Efficient and Flexible Graph Pattern Mining System on CPU and GPU

105 - Xuhao Chen , Roshan Dathathri , Gurbinder Gill 2019

There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems. However, ex isting systems may take hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and flexible in-memory GPM framework targeting shared-memory CPUs and GPUs. Pangolin is the first GPM system that provides high-level abstractions for GPU processing. It provides a simple programming interface based on the extend-reduce-filter model, which enables users to specify application-specific knowledge for search space pruning and isomorphism test elimination. We describe novel optimizations that exploit locality, reduce memory consumption, and mitigate the overheads of dynamic memory allocation and synchronization. Evaluation on a 28-core CPU demonstrates that Pangolin outperforms existing GPM frameworks Arabesque, RStream, and Fractal by 49x, 88x, and 80x on average, respectively. Acceleration on a V100 GPU further improves performance of Pangolin by 15x on average. Compared to state-of-the-art hand-optimized GPM applications, Pangolin provides competitive performance with less programming effort.

النظم الموزعة والتوازية والحوسبة العنقودية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني لإدارة الأعمال

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Simulating 3-D Stellar Hydrodynamics using PPM and PPB Multifluid Gas Dynamics on CPU and CPU+GPU Nodes

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً