CUDA Support in GNA Data Analysis Framework

174 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Maxim Gonchar

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Anna Fatkina - Maxim Gonchar - Liudmila Kolupaeva

النظم الموزعة والتوازية والحوسبة العنقودية الهندسة الحاسوبية، المالية،العلوم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Usage of GPUs as co-processors is a well-established approach to accelerate costly algorithms operating on matrices and vectors. We aim to further improve the performance of the Global Neutrino Analysis framework (GNA) by adding GPU support in a way that is transparent to the end user. To achieve our goal we use CUDA, a state of the art technology providing GPGPU programming methods. In this paper we describe new features of GNA related to CUDA support. Some specific framework features that influence GPGPU integration are also explained. The paper investigates the feasibility of GPU technology application and shows an example of the achieved acceleration of an algorithm implemented within framework. Benchmarks show a significant performance increase when using GPU transformations. The project is currently in the developmental phase. Our plans include implementation of the set of transformations necessary for the data analysis in the GNA framework and tests of the GPU expediency in the complete analysis chain.

قيم البحث

65 - Anna Fatkina , Maxim Gonchar , Anastasia Kalitkina 2019

We report on the status of GNA --- a new framework for fitting large-scale physical models. GNA utilizes the data flow concept within which a model is represented by a directed acyclic graph. Each node is an operation on an array (matrix multiplicati on, derivative or cross section calculation, etc). The framework enables the user to create flexible and efficient large-scale lazily evaluated models, handle large numbers of parameters, propagate parameters uncertainties while taking into account possible correlations between them, fit models, and perform statistical analysis. The main goal of the paper is to give an overview of the main concepts and methods as well as reasons behind their design. Detailed technical information is to be published in further works.

البرمجيات الرياضية

Fast GPGPU Data Rearrangement Kernels using CUDA

496 - Michael Bader , Hans-Joachim Bungartz , Dheevatsa Mudigere 2010

Many high performance-computing algorithms are bandwidth limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m dimensional data into n dimensions, including Permute, Reorder, Interlace/De-interlace, etc. We have also built kernels for generic Stencil computations on a two-dimensional data using templates and functors that allow application developers to rapidly build customized high performance kernels. All the kernels built achieve or surpass best-known performance in terms of bandwidth utilization.

النظم الموزعة والتوازية والحوسبة العنقودية الرسم الحاسوبي الأداء

The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

368 - Randal Burns , William Gray Roncal , Dean Kleissas 2013

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed pri marily for workloads that build connectomes---neural connectivity maps of the brain---using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at http://openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems---reads to parallel disk arrays and writes to solid-state storage---to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

النظم الموزعة والتوازية والحوسبة العنقودية الهندسة الحاسوبية، المالية،العلوم الخلايا العصبية والإدراك

waLBerla: A block-structured high-performance framework for multiphysics simulations

317 - Martin Bauer , Sebastian Eibl , Christian Godenschwager 2019

Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with ac celerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.

النظم الموزعة والتوازية والحوسبة العنقودية الهندسة الحاسوبية، المالية،العلوم الفيزياء الحسابية

Semantic interoperability and characterization of data provenance in computational molecular engineering

296 - M. T. Horsch , C. Niethammer , G. Boccardo 2019

By introducing a common representational system for metadata that describe the employed simulation workflows, diverse sources of data and platforms in computational molecular engineering, such as workflow management systems, can become interoperable at the semantic level. To achieve semantic interoperability, the present work introduces two ontologies that provide a formal specification of the entities occurring in a simulation workflow and the relations between them: The software ontology VISO is developed to represent software packages and their features, and OSMO, an ontology for simulation, modelling, and optimization, is introduced on the basis of MODA, a previously developed semi-intuitive graph notation for workflows in materials modelling. As a proof of concept, OSMO is employed to describe a use case of the TaLPas workflow management system, a scheduler and workflow optimizer for particle-based simulations.

النظم الموزعة والتوازية والحوسبة العنقودية الهندسة الحاسوبية، المالية،العلوم هندسة البرمجيات

سجل دخول لتتمكن من نشر تعليقات