A Real Time Processing System for Big Data in Astronomy: Applications to HERA


Added by Paul La Plante

Publication date: 2021

Field: Physics

Language: English

Authors: Paul La Plante, Peter K. G. Williams, Matthew Kolopanis

Instrumentation and Methods for Astrophysics


As current- and next-generation astronomical instruments come online, they will generate an unprecedented deluge of data. Analyzing these data in real time presents unique conceptual and computational challenges, and their long-term storage and archiving is scientifically essential for generating reliable, reproducible results. We present here the real-time processing (RTP) system for the Hydrogen Epoch of Reionization Array (HERA), a radio interferometer endeavoring to provide the first detection of the highly redshifted 21 cm signal from Cosmic Dawn and the Epoch of Reionization by an interferometer. The RTP system consists of analysis routines run on raw data shortly after they are acquired, such as calibration and detection of radio-frequency interference (RFI) events. RTP works closely with the Librarian, the HERA data storage and transfer manager, which automatically ingests data and transfers copies to other clusters for post-processing analysis. Both the RTP system and the Librarian are public, open-source software, which allows them to be modified for use in other scientific collaborations. When fully constructed, HERA is projected to generate over 50 terabytes (TB) of data each night, and the RTP system enables the successful scientific analysis of these data.


Real-time stream processing in radio astronomy

Danny C. Price, 2019

A major challenge in modern radio astronomy is dealing with the massive data volumes generated by wide-bandwidth receivers. Such massive data rates are often too great for a single device to cope, and so processing must be split across multiple devices working in parallel. These devices must work in unison to process incoming data in real time, reduce the data volume to a manageable size, and output a science-ready data product. The aim of this chapter is to give a broad overview of how digital systems for radio telescopes are commonly implemented, with a focus on real-time stream processing over multiple compute devices.

Instrumentation and Methods for Astrophysics

Nanosurveyor: a framework for real-time data processing

Benedikt J. Daurer, Hari Krishnan, Talita Perciano, 2016

Scientists are drawn to synchrotrons and accelerator-based light sources because of their brightness, coherence and flux. The rate of improvement in brightness and detector technology has outpaced the Moore's law growth seen for computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high-throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.

Instrumentation and Detectors; Mathematical Software; Data Analysis, Statistics and Probability

A Data-Taking System for Planetary Radar Applications

J.L. Margot, 2021

Most planetary radar applications require recording of complex voltages at sampling rates of up to 20 MHz. I describe the design and implementation of a sampling system that has been installed at the Arecibo Observatory, Goldstone Solar System Radar, and Green Bank Telescope. After many years of operation, these data-taking systems have enabled the acquisition of hundreds of data sets, many of which still await publication.

Instrumentation and Methods for Astrophysics; Earth and Planetary Astrophysics

Stream Processing for Solar Physics: Applications and Implications for Big Solar Data

Karl Battams, 2014

Modern advances in space technology have enabled the capture and recording of unprecedented volumes of data. In the field of solar physics this is most readily apparent with the advent of the Solar Dynamics Observatory (SDO), which returns in excess of 1 terabyte of data daily. While we now have sufficient capability to capture, transmit and store this information, the solar physics community now faces the new challenge of analysis and mining of high-volume and potentially boundless data sets such as this: a task known to the computer science community as stream mining. In this paper, we survey existing and established stream mining methods in the context of solar physics, with a goal of providing an introductory overview of stream mining algorithms employed by the computer science fields. We consider key concepts surrounding stream mining that are applicable to solar physics, outlining existing algorithms developed to address this problem in other fields of study, and discuss their applicability to massive solar data sets. We also discuss the considerations and trade-offs that may need to be made when applying stream mining methods to solar data. We find that while no one single solution is readily available, many of the methods now employed in other data streaming applications could successfully be modified to apply to solar data and prove invaluable for successful analysis and mining of this new source.

Space Physics; Instrumentation and Methods for Astrophysics

CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy

Nicholas M. Ball, 2013

At the Canadian Astronomy Data Centre, we have combined our cloud computing system, CANFAR, with the world's most advanced machine learning software, Skytree, to create the world's first cloud computing system for data mining in astronomy. CANFAR provides a generic environment for the storage and processing of large datasets, removing the requirement to set up and maintain a computing system when implementing an extensive undertaking such as a survey pipeline. 500 processor cores and several hundred terabytes of persistent storage are currently available to users. The storage is implemented via the International Virtual Observatory Alliance's VOSpace protocol, and is accessible both interactively and to all processing jobs. The user interacts with CANFAR by utilizing virtual machines, which appear to them as equivalent to a desktop. Each machine is replicated as desired to perform large-scale parallel processing. Such an arrangement enables the user to immediately install and run the same astronomy code that they already utilize, in the same way as on a desktop. In addition, unlike many cloud systems, batch job scheduling is handled for the user on multiple virtual machines by the Condor job queueing system. Skytree is installed and run just as any other software on the system, and thus acts as a library of command-line data mining functions that can be integrated into one's wider analysis. Thus we have created a generic environment for large-scale analysis by data mining, in the same way that CANFAR itself has done for storage and processing. Because Skytree scales to large data in linear runtime, this allows the full sophistication of the huge fields of data mining and machine learning to be applied to the hundreds of millions of objects that make up current large datasets. We demonstrate the utility of the CANFAR+Skytree system by showing science results obtained. [Abridged]

Instrumentation and Methods for Astrophysics