VINE -- A numerical code for simulating astrophysical systems using particles II: Implementation and performance characteristics


Published by: Andy Nelson

Publication date: 2009

Research field: Physics

Language: English

Author: Andrew F. Nelson


We continue our presentation of VINE. We begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for SPH neighbor particles. We describe modifications to the code necessary to obtain forces efficiently from special purpose 'GRAPE' hardware. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using OpenMP compiler directives on large scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to the maximum machine sizes available to us (120 processors on an Origin 3000). At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor of two slower than that of VINE, used in host-only mode. Optimizations of the GRAPE/host communications could improve the speed by as much as a factor of three, but have not yet been implemented in VINE.


M. Wetzstein, 2009

We present a Fortran 95 code for simulating the evolution of astrophysical systems using particles to represent the underlying fluid flow. The code is designed to be versatile, flexible and extensible, with modular options that can be selected either at compile time or at run time. We include a number of general purpose modules describing a variety of physical processes commonly required in the astrophysical community. The code can be used as an $N$-body code to evolve a set of particles in two or three dimensions using either a Leapfrog or Runge-Kutta-Fehlberg integrator, with or without individual timesteps for each particle. Particles may interact gravitationally as $N$-body particles, and all or any subset may also interact hydrodynamically, using the Smoothed Particle Hydrodynamics (SPH) method. Massive point particles ('stars') which may accrete nearby SPH or $N$-body particles may also be included. The default free boundary conditions can be replaced by a module to include periodic boundaries. Cosmological expansion may also be included. An interface with special purpose 'GRAPE' hardware may also be selected. If available, forces obtained from the GRAPE coprocessors may be transparently substituted for those obtained from the default tree based calculation. The code may be run without modification on single processors or in parallel using OpenMP compiler directives on large scale, shared memory parallel machines. In comparison to the Gadget-2 code of Springel 2005, the gravitational force calculation is $\approx 3.5 - 4.8$ times faster with VINE when run on 8 Itanium 2 processors in an SGI Altix, while producing nearly identical outcomes in our test problems. We present simulations of several test problems, including a merger simulation of two elliptical galaxies with 800000 particles.
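The abstract above names a Leapfrog integrator as one of the time-stepping options. As a rough illustration of that scheme only, here is a minimal Python sketch of a kick-drift-kick leapfrog step in two dimensions. It is not VINE's Fortran 95 implementation: the direct-sum force loop, the unit system (G = 1), the softening length `soft`, and all function names are assumptions made for this example, and VINE's tree-based force calculation and individual timesteps are not modeled.

```python
import math

def accelerations(pos, mass, soft=1e-3):
    """Direct-sum gravitational accelerations with G = 1 (assumed units).

    Uses Plummer softening with length `soft` (an assumed parameter).
    """
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + soft * soft
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += mass[j] * dx * inv_r3
            acc[i][1] += mass[j] * dy * inv_r3
    return acc

def leapfrog_step(pos, vel, mass, dt):
    """Advance positions and velocities in place by one kick-drift-kick step."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        # Half kick, then full drift.
        vel[i][0] += 0.5 * dt * acc[i][0]
        vel[i][1] += 0.5 * dt * acc[i][1]
        pos[i][0] += dt * vel[i][0]
        pos[i][1] += dt * vel[i][1]
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        # Second half kick with forces at the new positions.
        vel[i][0] += 0.5 * dt * acc[i][0]
        vel[i][1] += 0.5 * dt * acc[i][1]

def total_energy(pos, vel, mass, soft=1e-3):
    """Kinetic plus softened potential energy, consistent with accelerations()."""
    kinetic = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            potential -= mass[i] * mass[j] / math.sqrt(dx * dx + dy * dy + soft * soft)
    return kinetic + potential

if __name__ == "__main__":
    # Two equal masses on a near-circular orbit about their center of mass.
    pos = [[-0.5, 0.0], [0.5, 0.0]]
    vel = [[0.0, -0.5], [0.0, 0.5]]
    mass = [0.5, 0.5]
    e0 = total_energy(pos, vel, mass)
    for _ in range(1000):
        leapfrog_step(pos, vel, mass, 0.01)
    print("relative energy error:", abs((total_energy(pos, vel, mass) - e0) / e0))
```

The kick-drift-kick form is time-reversible for a fixed timestep, which is why the energy error in such a test tends to stay bounded rather than drift.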

Simulating Turbulence Using the Astrophysical Discontinuous Galerkin Code TENET

Andreas Bauer, Kevin Schaal, Volker Springel, 2016

In astrophysics, the two main methods traditionally in use for solving the Euler equations of ideal fluid dynamics are smoothed particle hydrodynamics and finite volume discretization on a stationary mesh. However, the goal to efficiently make use of future exascale machines with their ever higher degree of parallel concurrency motivates the search for more efficient and more accurate techniques for computing hydrodynamics. Discontinuous Galerkin (DG) methods represent a promising class of methods in this regard, as they can be straightforwardly extended to arbitrarily high order while requiring only small stencils. Especially for applications involving comparatively smooth problems, higher-order approaches promise significant gains in computational speed for reaching a desired target accuracy. Here, we introduce our new astrophysical DG code TENET designed for applications in cosmology, and discuss our first results for 3D simulations of subsonic turbulence. We show that our new DG implementation provides accurate results for subsonic turbulence, at considerably reduced computational cost compared with traditional finite volume methods. In particular, we find that DG needs about 1.8 times fewer degrees of freedom to achieve the same accuracy and at the same time is more than 1.5 times faster, confirming its substantial promise for astrophysical applications.

Instrumentation and Methods for Astrophysics; Cosmology and Nongalactic Astrophysics

Morphology and numerical characteristics of epidemic curves for SARS-Cov-II using Moyal distribution

Jose de Jesus Bernal-Alvarado, David Delepine, 2020

In this paper, it is shown that the Moyal distribution is an excellent tool to study the epidemiological curves associated with SARS-Cov-II (Covid-19) and its propagation. The Moyal parameters give all the information needed to describe the form and the impact of the illness outbreak in the different affected countries, as well as its global impact. We checked that the Moyal distribution can accurately fit the daily report of new confirmed cases of infected people (NCC) per country, in those places where the contagion is reaching its final phase, describing the beginning, the most intense phase and the descent of the contagion simultaneously. In order to achieve the purpose of this work, it is important to work with a complete and well-compiled set of data to fit the curves. Data from European countries like France, Spain, Italy, Belgium, Sweden, United Kingdom, Denmark and others like USA and China have been used. Also, the correlation between the parameters of the Moyal distribution fit and the general public health measures imposed in each country has been discussed. A relation between those policies and the features of the Moyal distribution, in terms of their parameters and critical points, is shown; from that, it can be seen that knowledge of the time evolution of the epidemiological curve, its critical points, superposition properties and rates of rising and ending could help to find a way to estimate the efficiency of social distancing measures imposed in each country, and anticipate the different phases of the pandemic.
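For readers unfamiliar with the curve shape referred to above, the following Python sketch evaluates the standard Moyal probability density, f(x) = exp(-(z + e^(-z))/2) / (sqrt(2π) σ) with z = (x - μ)/σ. The abstract does not state how the authors scale or fit the curve to case counts, so the `amplitude` factor and the function names here are assumptions for illustration, not the paper's procedure.

```python
import math

def moyal_pdf(x, mu, sigma):
    """Standard Moyal density with location mu and scale sigma (sigma > 0)."""
    z = (x - mu) / sigma
    if z < -50.0:
        # exp(-z) is astronomically large here, so the density is
        # effectively zero; returning early also avoids overflow in exp.
        return 0.0
    return math.exp(-0.5 * (z + math.exp(-z))) / (math.sqrt(2.0 * math.pi) * sigma)

def daily_cases_model(day, amplitude, mu, sigma):
    """Hypothetical epidemic-curve model: a Moyal density scaled by `amplitude`.

    `amplitude` (an assumed parameter) plays the role of the total case count,
    because the density itself integrates to one.
    """
    return amplitude * moyal_pdf(day, mu, sigma)

if __name__ == "__main__":
    # The curve rises steeply before the peak at day = mu and decays slowly after.
    for day in (10, 20, 30, 40, 60, 90):
        print(day, daily_cases_model(day, amplitude=100000.0, mu=30.0, sigma=8.0))
```

The density peaks at x = μ and has a long right tail, which matches the qualitative picture of a fast rise followed by a slower decline in daily cases.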

السكان والتطور الفيزياء الطبية الفيزياء والمجتمع

A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

Bharath Pattabiraman, Stefan Umbreit, Wei-Keng Liao, 2012

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within less than 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. The runtime reaches a saturation with the addition of more processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
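The speedups quoted above can be turned into parallel efficiencies (speedup divided by processor count) with a short calculation. The sketch below assumes that each approximate maximum speedup corresponds to the processor limit quoted for the same N, which is how the abstract's "respectively" reads; the function and variable names are chosen for this example.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on `processors` processors."""
    return speedup / processors

# (approximate speedup, processor count) as quoted in the abstract.
reported = {
    "N=10^5": (60.0, 64),
    "N=10^6": (100.0, 128),
    "N=10^7": (220.0, 256),
}

if __name__ == "__main__":
    for label, (speedup, procs) in reported.items():
        eff = parallel_efficiency(speedup, procs)
        print(f"{label}: {speedup:.0f}x on {procs} processors, efficiency {eff:.2f}")
```

Under that assumption the efficiencies come out at roughly 0.94, 0.78, and 0.86. Since the speedups are stated only approximately, these figures are indicative rather than exact.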

Instrumentation and Methods for Astrophysics; Astrophysics of Galaxies; Computational Physics

On Error-Correction Performance and Implementation of Polar Code List Decoders for 5G

Furkan Ercan, Carlo Condo, Seyyed Ali Hashemi, 2017

Polar codes are a class of capacity achieving error correcting codes that has been recently selected for the next generation of wireless communication standards (5G). Polar code decoding algorithms have evolved in various directions, striking different balances between error-correction performance, speed and complexity. Successive-cancellation list (SCL) and its incarnations constitute a powerful, well-studied set of algorithms, in constant improvement. At the same time, different implementation approaches provide a wide range of area occupations and latency results. 5G puts a focus on improved error-correction performance, high throughput and low power consumption: a comprehensive study considering all these metrics is currently lacking in literature. In this work, we evaluate SCL-based decoding algorithms in terms of error-correction performance and compare them to low-density parity-check (LDPC) codes. Moreover, we consider various decoder implementations, for both polar and LDPC codes, and compare their area occupation and power and energy consumption when targeting short code lengths and rates. Our work shows that among SCL-based decoders, the partitioned SCL (PSCL) provides the lowest area occupation and power consumption, whereas fast simplified SCL (Fast-SSCL) yields the lowest energy consumption. Compared to LDPC decoder architectures, different SCL implementations occupy up to 17.1x less area, dissipate up to 7.35x less power, and up to 26x less energy.

Information Theory
