ﻻ يوجد ملخص باللغة العربية
Practical aperture synthesis imaging algorithms work by iterating between estimating the sky brightness distribution and a comparison of a prediction based on this estimate with the measured data (visibilities). Accuracy in the latter step is crucial but is made difficult by irregular and non-planar sampling of data by the telescope. In this work we present a GPU implementation of 3d de-gridding which accurately deals with these two difficulties and is designed for distributed operation. We address the load balancing issues caused by large variation in visibilities that need to be computed. Using CUDA and NVidia GPUs we measure performance up to 1.2 billion visibilities per second.
Conventional GPU implementations of Strassens algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for relatively la
While most robotics simulation libraries are built for low-dimensional and intrinsically serial tasks, soft-body and multi-agent robotics have created a demand for simulation environments that can model many interacting bodies in parallel. Despite th
Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) bring unprecedented performance requirements for automotive systems. Graphic Processing Unit (GPU) based platforms have been deployed with the aim of meeting these requirements, be
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effo
We introduce hankl, a lightweight Python implementation of the FFTLog algorithm for Cosmology. The FFTLog algorithm is an extension of the Fast Fourier Transform (FFT) for logarithmically spaced periodic sequences. It can be used to efficiently compu