
Accelerating parameter inference with graphics processing units

Published by: Richard O'Shaughnessy
Publication date: 2019
Research field: Physics
Paper language: English
Author: D. Wysocki

Gravitational wave Bayesian parameter inference involves repeated comparisons of GW data to generic candidate predictions. Even with algorithmically efficient methods like RIFT or reduced-order quadrature, the time needed to perform these calculations and the overall computational cost can be significant compared to the minutes-to-hours timescales targeted by low-latency multimessenger astronomy. By translating some elements of the RIFT algorithm to operate on graphics processing units (GPUs), we demonstrate substantial performance improvements, enabling dramatically reduced overall cost and latency.
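To make the GPU translation concrete, here is a minimal sketch of the kind of batched, noise-weighted inner product that dominates such likelihood evaluations, written with CuPy. The function name, array shapes, and the simplified Gaussian-noise likelihood are illustrative assumptions, not the RIFT code itself.

```python
# Minimal sketch (not the RIFT implementation): batching many
# template-vs-data inner products on the GPU with CuPy.
import cupy as cp

def batched_log_likelihood(templates, data, inv_psd, delta_f):
    """templates: (n_templates, n_freq) complex candidate waveforms (hypothetical layout),
    data: (n_freq,) frequency-domain strain, inv_psd: (n_freq,) inverse noise PSD."""
    h = cp.asarray(templates)   # copy to the GPU once, reuse across evaluations
    d = cp.asarray(data)
    w = cp.asarray(inv_psd)

    # Noise-weighted inner products <d|h> and <h|h>, batched over all templates.
    d_h = 4.0 * delta_f * cp.real(cp.sum(cp.conj(d) * h * w, axis=1))
    h_h = 4.0 * delta_f * cp.real(cp.sum(cp.conj(h) * h * w, axis=1))

    # Gaussian-noise log likelihood, up to a template-independent constant.
    return cp.asnumpy(d_h - 0.5 * h_h)
```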


Read also

We present a new method for numerical hydrodynamics which uses a multidimensional generalisation of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognise multidimensional stationary states that are not hydrostatic. A second novelty is the focus on Graphics Processing Units (GPUs). By tailoring the algorithms specifically to GPUs we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.
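For contrast with the multidimensional scheme described above, the sketch below shows the classical one-dimensional Roe flux that dimensionally split methods use as a building block, vectorised over cell interfaces with CuPy. The scalar Burgers equation and the function name are assumptions for illustration, not the paper's unstructured-mesh solver.

```python
# One-dimensional Roe flux for the scalar Burgers equation f(u) = u^2/2,
# evaluated for all cell interfaces at once on the GPU.
import cupy as cp

def roe_flux_burgers(u_left, u_right):
    f_left, f_right = 0.5 * u_left**2, 0.5 * u_right**2
    a_roe = 0.5 * (u_left + u_right)   # Roe-averaged wave speed for Burgers
    # Central flux plus upwind dissipation proportional to |a_roe|.
    return 0.5 * (f_left + f_right) - 0.5 * cp.abs(a_roe) * (u_right - u_left)
```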
The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in condensed matter physics and chemistry. The algorithm is difficult to parallelize on a cluster computer or a supercomputer because of its fine-grained recursive calculations. This paper proposes an implementation of the KPM on recent graphics processing units (GPUs), where the recursive calculations can be parallelized in a massively parallel environment. The paper also presents performance evaluations for realistic simulation parameters, covering both a computation-intensive case and a case with increased memory usage. It concludes that the GPU implementation delivers very high performance compared to the CPU one and reduces the overall simulation time.
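The heart of the KPM is the three-term Chebyshev recursion, where each step is a matrix-vector product that parallelises well on a GPU. A minimal sketch follows, assuming a dense Hamiltonian already rescaled so its spectrum lies in [-1, 1]; a production code would use a sparse matrix and stochastic trace estimation.

```python
# Chebyshev moments mu_n = <r|T_n(H)|r> via the KPM recursion on the GPU.
import cupy as cp

def kpm_moments(H, r, n_moments):
    """H: (dim, dim) rescaled Hamiltonian, r: (dim,) starting vector (assumed dense)."""
    moments = cp.zeros(n_moments, dtype=H.dtype)
    t_prev = r                 # T_0(H)|r> = |r>
    t_curr = H @ r             # T_1(H)|r> = H|r>
    moments[0] = cp.vdot(r, t_prev)
    moments[1] = cp.vdot(r, t_curr)
    for n in range(2, n_moments):
        # T_{n+1} = 2 H T_n - T_{n-1}: each step is one GPU matrix-vector product.
        t_next = 2.0 * (H @ t_curr) - t_prev
        moments[n] = cp.vdot(r, t_next)
        t_prev, t_curr = t_curr, t_next
    return cp.asnumpy(moments)
```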
The investigation of samples with a spatial resolution in the nanometer range relies on the precise and stable positioning of the sample. Due to inherent mechanical instabilities of typical sample stages in optical microscopes, it is usually required to control and/or monitor the sample position during the acquisition. The tracking of sparsely distributed fiducial markers at high speed allows stabilizing the sample position at millisecond time scales. For this purpose, we present a scalable fitting algorithm with significantly improved performance for two-dimensional Gaussian fits as compared to Gpufit.
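The model such trackers fit to each marker is a two-dimensional Gaussian. The sketch below evaluates that model for a batch of regions of interest on the GPU with CuPy; the parameter layout (amplitude, x0, y0, sigma, offset) and function name are assumptions for illustration and do not reflect the Gpufit interface.

```python
# Batched 2D Gaussian model images on the GPU.
import cupy as cp

def gaussian_2d(params, size):
    """params: (n_rois, 5) = (amplitude, x0, y0, sigma, offset) per ROI;
    returns (n_rois, size, size) model images."""
    amp, x0, y0, sigma, offset = (params[:, i, None, None] for i in range(5))
    y, x = cp.mgrid[0:size, 0:size]
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return amp * cp.exp(-0.5 * r2 / sigma**2) + offset
```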
We present a novel implementation of the modal discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA's Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces element-local approximations. High-performance scientific computing suits GPUs well, as these powerful, massively parallel, cost-effective devices have recently included support for double-precision floating point numbers. Computed examples for the Euler equations over unstructured triangle meshes demonstrate the effectiveness of our implementation on an NVIDIA GTX 580 device. Profiling of our method reveals performance comparable to an existing nodal DG-GPU implementation for linear problems.
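One reason element-local DG discretisations map well to GPUs is that per-element work can be phrased as a single batched matrix product. The tiny sketch below illustrates this for evaluating modal expansions at quadrature points; it uses CuPy rather than hand-written CUDA, and the names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Evaluate the modal expansion of every element at its quadrature points
# with one batched matrix product on the GPU.
import cupy as cp

def evaluate_solution(coeffs, vandermonde):
    """coeffs: (n_elements, n_modes) modal coefficients,
    vandermonde: (n_quad, n_modes) basis values at quadrature points.
    Returns (n_elements, n_quad) element-local solution values."""
    return coeffs @ vandermonde.T
```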
In this paper, we develop a highly efficient molecular dynamics code fully implemented on graphics processing units for thermal conductivity calculations using the Green-Kubo formula. We compare two different schemes for force evaluation: a previously used thread-scheme, where a single thread is used for one particle and each thread calculates the total force for the corresponding particle, and a new block-scheme, where a whole block is used for one particle and each thread in the block calculates one or several pair forces between the particle associated with the given block and its neighbor particle(s) associated with the given thread. For both schemes, two different classical potentials, namely the Lennard-Jones potential and the rigid-ion potential, are implemented. While the thread-scheme performs a little better for relatively large systems, the block-scheme performs much better for relatively small systems. The relative performance of the block-scheme over the thread-scheme also increases with increasing cutoff radius. We validate the implementation by calculating the lattice thermal conductivities of solid argon and lead telluride.
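As a rough illustration of the pair forces these kernels compute, the sketch below evaluates Lennard-Jones forces for all particle pairs at once with CuPy. The dense all-pairs formulation (no cutoff or neighbor list) and the function name are simplifying assumptions and do not correspond to the thread- or block-scheme kernels described in the paper.

```python
# Dense all-pairs Lennard-Jones forces on the GPU.
import cupy as cp

def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """pos: (n, 3) particle positions; returns (n, 3) total force on each particle."""
    rij = pos[:, None, :] - pos[None, :, :]     # pairwise separation vectors
    r2 = cp.sum(rij**2, axis=-1)
    cp.fill_diagonal(r2, cp.inf)                # exclude self-interaction
    inv_r6 = (sigma**2 / r2) ** 3
    # |F_ij| / r = 24 eps (2 (sigma/r)^12 - (sigma/r)^6) / r^2
    coeff = 24.0 * epsilon * (2.0 * inv_r6**2 - inv_r6) / r2
    return cp.sum(coeff[..., None] * rij, axis=1)
```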