ﻻ يوجد ملخص باللغة العربية
The Extended Burrows Wheeler transform (EBWT) helps to find the distance between two sequences. Implementation of an existing algorithm takes considerable amount of time for small size sequences. In this paper, we give a parallel implementation of this algorithm using NVIDIA Compute Unified Device Architecture (CUDA). We have obtained, on an average, a 2X improvement in the performance.
In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides the two a
Maximum weight matching is one of the most fundamental combinatorial optimization problems with a wide range of applications in data mining and bioinformatics. Developing distributed weighted matching algorithms is challenging due to the sequential n
The state-of-the-art deep learning algorithms rely on distributed training systems to tackle the increasing sizes of models and training data sets. Minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back propagatio
We present our experience with the modernization on the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based) architectur