On the improvement of the in-place merge algorithm parallelization

62 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Berenger Bramas

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Berenger Bramas

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides the two arrays into as many pairs of partitions as there are threads available; such that each thread can later merge a pair of partitions independently of the others. We extend the existing method by proposing a new algorithm to find the median of two partitions. Additionally, we provide a new strategy to divide the input arrays where we minimize the data movement, but at the cost of making this stage sequential. Finally, we provide the so-called linear shifting algorithm that swaps two partitions in-place with contiguous data access. We emphasize that our approach is straightforward to implement and that it can also be used for external (out of place) merging. The results demonstrate that it provides a significant speedup compared to sequential executions, when the size of the arrays is greater than a thousand elements.

قيم البحث

134 - Shashank Srikant 2010

The Extended Burrows Wheeler transform (EBWT) helps to find the distance between two sequences. Implementation of an existing algorithm takes considerable amount of time for small size sequences. In this paper, we give a parallel implementation of th is algorithm using NVIDIA Compute Unified Device Architecture (CUDA). We have obtained, on an average, a 2X improvement in the performance.

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

Correction to: A Practical, Provably Linear Time, In-place and Stable Merge Algorithm via the Perfect Shuffle

67 - John Ellis , Ulrike Stege 2020

We correct a paper previously submitted to CoRR. That paper claimed that the algorithm there described was provably of linear time complexity in the average case. The alleged proof of that statement contained an error, being based on an invalid assum ption, and is invalid. In this paper we present both experimental and analytical evidence that the time complexity is of order $N^2$ in the average case, where $N$ is the total length of the merged sequences.

بنى وهياكل البيانات والخوارزميات

Optimizing the hybrid parallelization of BHAC

145 - Salvatore Cielo , Oliver Porth , Luigi Iapichino 2021

We present our experience with the modernization on the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based) architectur es. Our performance characterization and threading analysis provided guidance in improving the concurrency and thus the efficiency of the OpenMP parallel regions. We assess scaling and communication patterns in order to identify and alleviate MPI bottlenecks, with both runtime switches and precise code interventions. The performance of optimized version of BHAC improved by $sim28%$, making it viable for scaling on several hundreds of supercomputer nodes. We finally test whether porting such optimizations to different hardware is likewise beneficial on the new architecture by running on ARM A64FX vector nodes.

النظم الموزعة والتوازية والحوسبة العنقودية الأجهزة والأساليب للزيئات الفيزياء الفلكية

Parallelization of the multi-level hp-adaptive finite cell method

56 - John N. Jomo 2018

The multi-level hp-refinement scheme is a powerful extension of the finite element method that allows local mesh adaptation without the trouble of constraining hanging nodes. This is achieved through hierarchical high-order overlay meshes, a hp-schem e based on spatial refinement by superposition. An efficient parallelization of this method using standard domain decomposition approaches in combination with ghost elements faces the challenge of a large basis function support resulting from the overlay structure and is in many cases not feasible. In this contribution, a parallelization strategy for the multi-level hp-scheme is presented that is adapted to the schemes simple hierarchical structure. By distributing the computational domain among processes on the granularity of the active leaf elements and utilizing shared mesh data structures, good parallel performance is achieved, as redundant computations on ghost elements are avoided. We show the schemes parallel scalability for problems with a few hundred elements per process. Furthermore, the scheme is used in conjunction with the finite cell method to perform numerical simulations on domains of complex shape.

النظم الموزعة والتوازية والحوسبة العنقودية التحليل العددي الفيزياء الحسابية

On the policy improvement algorithm for ergodic risk-sensitive control

82 - Ari Arapostathis , Anup Biswas , 2019

In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.

التحسين والتحكم الاحتمالات

سجل دخول لتتمكن من نشر تعليقات