Interferometric radio telescopes often rely on computationally expensive $O(N^2)$ correlation calculations; fortunately, these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine are supplied to the GPUs as 4-bit, offset-encoded real and imaginary integers. Because of the low bit width of the data, two values can be packed into a 32-bit register, allowing more than one product to be computed and accumulated with a single fused multiply-add instruction. With this data- and calculation-packing scheme, as many as 5.6 effective tera-operations per second (TOPS) can be executed on a 4.3 TOPS GPU. The kernel design allows correlations to scale to large numbers of input elements, limited only by the maximum buffer sizes on the GPU. This code is currently working on-sky with the CHIME Pathfinder Correlator in British Columbia, Canada.
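To make the packing idea concrete, the following minimal sketch in plain C (an illustration under assumed field widths and accumulation depth, not the production OpenCL kernel) packs two 4-bit offset-encoded samples into one 32-bit word so that a single integer multiply-add advances two products at once:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t x0 = 7, x1 = 12;          /* two 4-bit offset-encoded samples (0..15)   */
        uint32_t y  = 9;                   /* a 4-bit sample from the other input        */
        uint32_t packed = x0 | (x1 << 16); /* pack both samples into one 32-bit register */

        uint32_t acc = 0;
        for (int t = 0; t < 256; ++t)      /* 256 * 15 * 15 < 2^16, so fields never mix  */
            acc += packed * y;             /* one multiply-add yields both partial sums  */

        uint32_t sum_lo = acc & 0xFFFFu;   /* accumulated x0*y products                  */
        uint32_t sum_hi = acc >> 16;       /* accumulated x1*y products                  */
        printf("%u %u\n", sum_lo, sum_hi); /* prints 16128 27648                         */
        return 0;
    }

On a GPU the inner statement can compile to a single integer multiply-add instruction, which is the kind of packed arithmetic the effective-TOPS figure above reflects.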
We present an overview of the Graphics Processing Unit (GPU) based spatial processing system created for the Canadian Hydrogen Intensity Mapping Experiment (CHIME). The design employs AMD S9300x2 GPUs and readily available commercial hardware in its processing nodes to provide a cost- and power-efficient processing substrate. These nodes are supported by a liquid-cooling system which allows continuous operation with modest power consumption and in all but the most adverse conditions. Capable of continuously correlating 2048 receiver-polarizations across 400 MHz of bandwidth, the CHIME X-engine constitutes the most powerful radio correlator currently in existence. It receives 6.6 Tb/s of channelized data from CHIME's FPGA-based F-engine, and the primary correlation task requires $8.39\times10^{14}$ complex multiply-and-accumulate operations per second. The same system also provides formed-beam data products to commensal FRB and pulsar experiments; it constitutes a general spatial-processing system of unprecedented scale and capability, with correspondingly great challenges in computation, data transport, heat dissipation, and interference shielding.
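As a back-of-envelope check of the quoted rates (assuming critically sampled channelization, i.e. one complex sample per input per Hz of bandwidth, and 4+4-bit complex samples as in the Pathfinder design described above):

$$\frac{N(N+1)}{2}\,\Delta\nu = \frac{2048\times 2049}{2}\times 4\times 10^{8}\ \mathrm{Hz} \approx 8.39\times 10^{14}\ \text{complex MAC/s},$$

$$N\,\Delta\nu\times 8\ \mathrm{bit} = 2048\times 4\times 10^{8}\ \mathrm{Hz}\times 8\ \mathrm{bit} \approx 6.6\ \mathrm{Tb/s}.$$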
We present the design and optimization of a linear solver on general-purpose GPUs for the efficient and high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to compute the solution to a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor-product linear system on the fly, without storing it in memory, when performing the matrix-vector products in PCG. This on-the-fly computation is accomplished by using the threads in a warp to cooperatively stream the adjacency and edge-label matrices of the individual graphs in small square blocks called tiles, which are staged in registers and shared memory for later reuse. Warps across a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically, storing only non-empty tiles in a coordinate format and the nonzero elements within each tile as bitmaps. In addition, we propose a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles, improving the efficiency of the sparse format. We carry out extensive theoretical analyses of the graph tensor-product primitives for tiles of varying density and evaluate their performance on synthetic and real-world datasets. Our solver delivers a speedup of three to four orders of magnitude over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
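The reason the tensor-product system never has to be materialized is the standard Kronecker-product identity, stated here in its generic dense form (the paper's tiled, sparsity-aware formulation refines it): with the PCG iterate $x$ reshaped into a matrix $X$ such that $\mathrm{vec}(X) = x$,

$$(A_1 \otimes A_2)\,\mathrm{vec}(X) \;=\; \mathrm{vec}\!\left(A_2\, X\, A_1^{\top}\right),$$

so every matrix-vector product in the PCG iteration can be evaluated directly from the adjacency and edge-label matrices of the two individual graphs.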
The objective of the SPHERE Data Center is to optimize the scientific return of SPHERE at the VLT by providing optimized reduction procedures, services to users, and publicly available reduced data. This paper describes our motivation, the implementation of the service (partners, infrastructure, and developments), the services offered, the on-line data, and future developments. The SPHERE Data Center is operational and has already provided reduced data with rapid turnaround to many observers. The first public reduced data were made available in 2017. The SPHERE Data Center has gathered strong expertise on SPHERE data and is well placed to offer new reduced data in the future, as well as improved reduction procedures.
Context. The HIFI instrument on the Herschel Space Observatory performed over 9100 astronomical observations, almost 900 of which were calibration observations, in the course of the nearly four-year Herschel mission. The data from each observation had to be converted from raw telemetry into calibrated products and were included in the Herschel Science Archive. Aims. The HIFI pipeline was designed to provide robust conversion from raw telemetry into calibrated data throughout all phases of the HIFI mission. Pre-launch laboratory testing was supported, as were routine mission operations. Methods. A modular software design allowed components to be easily added, removed, amended, and/or extended as the understanding of the HIFI data developed during and after mission operations. Results. The HIFI pipeline processed data from all HIFI observing modes within the Herschel automated processing environment as well as within an interactive environment. The same software can be used by the general astronomical community to reprocess any standard HIFI observation. The pipeline also recorded the consistency of processing results and provided automated quality reports. Many pipeline modules have been in use since the HIFI pre-launch instrument-level testing. Conclusions. Processing in steps facilitated the data analysis needed to discover and address instrument artefacts and uncertainties. The availability of the same pipeline components from pre-launch throughout the mission made for well-understood, tested, and stable processing. A smooth transition from one phase to the next significantly enhanced processing reliability and robustness.
The Tianlai project is a 21 cm intensity mapping experiment aimed at detecting dark energy by measuring the baryon acoustic oscillation (BAO) features in the large-scale structure power spectrum. The experiment provides an opportunity to test data-processing methods for cosmological 21 cm signal extraction, which remains a great challenge in current radio astronomy research. The 21 cm signal is much weaker than the foregrounds and is easily affected by imperfections in the instrumental responses. Furthermore, processing the large volume of interferometer data poses a practical challenge. We have developed a data-processing pipeline called tlpipe to process the drift-scan survey data from the Tianlai experiment. It performs offline data-processing tasks such as radio frequency interference (RFI) flagging, array calibration, binning, and map-making, and it includes utility functions needed for data analysis, such as data selection, transformation, and visualization. A number of new algorithms are implemented, for example the eigenvector decomposition method for array calibration and Tikhonov regularization for $m$-mode analysis. In this paper we describe the design and implementation of tlpipe and illustrate its functions with some analysis of real data. Finally, we outline directions for future development of this publicly available code.
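For context, the general forms of the two named algorithms are as follows (the specific weighting and regularization choices made in tlpipe are not reproduced here). When a single strong point source dominates the sky signal, the visibility matrix is approximately rank one,

$$V_{ij} \;\approx\; g_i\, g_j^{*}\, S,$$

so the complex gains $g_i$ follow, up to an overall scale and phase, from the leading eigenvector of $V$. Likewise, for each azimuthal mode $m$, the drift-scan measurement equation $v_m = B_m a_m + n_m$ can be inverted for the sky modes with a Tikhonov-regularized estimate

$$\hat{a}_m \;=\; \left(B_m^{\dagger} B_m + \epsilon I\right)^{-1} B_m^{\dagger} v_m .$$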