The increasing volumes of astronomical data require practical methods for data exploration, access and visualisation. The Hierarchical Progressive Survey (HiPS) is a HEALPix-based scheme that enables a multi-resolution approach to astronomy data, from individual pixels up to the whole sky. We highlight the decisions and approaches that have been taken to make this scheme a practical solution for managing large volumes of heterogeneous data. Early implementers of this system have formed a network of HiPS nodes, with some 250 diverse data sets currently available and multiple mirror implementations for important data sets. This hierarchical approach can be adapted to expose Big Data in different ways. We describe how the ease of implementation and local customisation of the Aladin Lite embeddable HiPS visualiser have been key to promoting collaboration on HiPS.
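The multi-resolution idea rests on the HEALPix NESTED ordering: each cell at order k splits into four cells at order k+1, so a viewer can move between resolutions with simple integer arithmetic. The sketch below illustrates that arithmetic and a tile-path convention. It assumes the HiPS `NorderK/DirD/NpixN.ext` directory layout with D = 10000 × (N // 10000); the helper names are hypothetical and this is not the Aladin Lite code.

```python
# Minimal sketch of the HEALPix NESTED hierarchy that HiPS tiles follow.
# Assumption: tiles are stored as NorderK/DirD/NpixN.ext with
# D = 10000 * (N // 10000); the "png" extension is only an example.

def n_tiles(order: int) -> int:
    """Number of HEALPix cells (one tile each) covering the sky at this order."""
    return 12 * 4 ** order


def parent(npix: int) -> int:
    """NESTED index of the enclosing cell one order lower."""
    return npix // 4


def children(npix: int) -> list:
    """NESTED indices of the four sub-cells one order higher."""
    return [4 * npix + i for i in range(4)]


def tile_path(order: int, npix: int, ext: str = "png") -> str:
    """Relative path of one tile inside a HiPS directory tree."""
    if not 0 <= npix < n_tiles(order):
        raise ValueError("pixel index out of range for this order")
    directory = (npix // 10000) * 10000
    return f"Norder{order}/Dir{directory}/Npix{npix}.{ext}"


# Zooming in by one order replaces tile 123 with its four children.
print(tile_path(3, 123))                              # Norder3/Dir0/Npix123.png
print([tile_path(4, c) for c in children(123)][:2])   # first two child tiles
```

Because a parent index is just `npix // 4`, a client can request coarse tiles for the whole sky and refine only the region in view, which is what makes the scheme scale to large data sets.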
The LSST survey was designed to deliver transformative results for four primary objectives: constraining dark energy and dark matter, taking an inventory of the Solar System, exploring the transient optical sky, and mapping the Milky Way. While the LSST Wide-Fast-Deep (WFD) survey and the accompanying Deep Drilling fields and mini-surveys will be ground-breaking for each of these areas, there remain competing demands on survey area, depth, and temporal coverage, amid a desire to maximize all three. In this white paper, we address a principal source of tension between the LSST science collaborations: the survey area and depth that each of them needs in the parts of the sky it cares about. We present simple tools for exploring trades between the area surveyed by LSST and the number of visits available per field, and then use these tools to propose a change to the baseline survey strategy. Specifically, we propose reconfiguring the WFD footprint to consist of low-extinction regions (limited by Galactic latitude), with the number of visits per field in WFD limited by the design goal of the LSST Science Requirements Document (SRD), and we suggest assigning the remaining LSST visits to the full visible LSST sky. This proposal addresses concerns about the WFD footprint raised by the Dark Energy Science Collaboration (DESC), since 25 percent of the current baseline WFD region is not usable for dark energy science due to Milky Way (MW) dust extinction; reduces the time required for the North Ecliptic Spur (NES) and South Celestial Pole (SCP) mini-surveys, since in our proposal they would partially fall within the modified WFD footprint; raises the number of visits assigned to the Galactic Plane (GP) region relative to the baseline; and increases the overlap with DESI and other Northern-hemisphere follow-up facilities. It thereby alleviates many of the current concerns of the Science Collaborations that represent the four scientific pillars of LSST and provides a Big Sky approach to cadence diplomacy.
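The area-versus-visits trade can be made concrete with two pieces of arithmetic: the sky area left after a Galactic-latitude cut, which is a fraction 1 − sin(b_min) of the full sphere, and the mean number of visits per field when a fixed visit budget is spread over a footprint. The sketch below is a simplified illustration, not the white paper's tools: it ignores declination limits, field overlaps and dithering, and the function names are hypothetical.

```python
import math

# Full sphere in square degrees: 4*pi sr * (180/pi)^2, about 41,253 deg^2.
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2


def high_latitude_area(b_min_deg: float) -> float:
    """Sky area (deg^2) with |b| > b_min.

    Simplification: no declination limit is applied, so this is an upper
    bound on the footprint an observatory at a fixed site can reach.
    """
    return FULL_SKY_DEG2 * (1.0 - math.sin(math.radians(b_min_deg)))


def visits_per_field(n_visits: float, footprint_deg2: float, field_deg2: float) -> float:
    """Mean visits per field if n_visits are spread uniformly over the footprint."""
    n_fields = footprint_deg2 / field_deg2
    return n_visits / n_fields


def area_for_visits(n_visits: float, visits_target: float, field_deg2: float) -> float:
    """Largest footprint (deg^2) that still reaches visits_target per field."""
    return n_visits * field_deg2 / visits_target
```

To use it, plug in the survey's actual WFD visit budget, LSST's roughly 9.6 deg² field of view and the SRD visits-per-field design value; the budget and SRD numbers should be taken from the relevant LSST documents rather than assumed.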
Astronomy is entering a new era of Extreme Intensive Data Computation, and we have identified three major issues that the new generation of projects must face: resource optimization, heterogeneous software ecosystems and data transfer. In this article we propose a middleware solution that offers a highly modular and maintainable system for data analysis. Because computations must be designed and described by astronomy specialists, we aim to define a user-friendly domain-specific programming language that lets them code astrophysical problems abstracted from any computer-science-specific issues. We expect this approach to bring substantial gains in computing capability for data analysis. As a first development using our solution, we propose a cross-matching service for the Taiwan Extragalactic Astronomical Data Center.
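At its core, a cross-matching service pairs each source in one catalogue with the nearest source in another within an angular radius. The sketch below shows that basic operation with a brute-force search and the haversine great-circle distance. It is a generic illustration, not the middleware or service described above; production services normally add a spatial index (for example HEALPix cells or a k-d tree) to avoid the O(N·M) loop.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """For each (ra, dec) in cat_a, return the index of the nearest cat_b
    source within radius_arcsec, or None if there is none.

    Brute force: compares every pair, so cost grows as len(cat_a) * len(cat_b).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra_a, dec_a in cat_a:
        best, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches
```

The haversine form is used instead of the spherical law of cosines because it stays numerically accurate at the arcsecond-scale separations typical of cross-matching.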
Photometric redshifts (photo-zs) are fundamental in galaxy surveys for addressing topics ranging from gravitational lensing and the dark matter distribution to galaxy evolution. The Kilo Degree Survey (KiDS), an ESO public survey carried out with the VLT Survey Telescope (VST), provides an unprecedented opportunity to exploit a large galaxy dataset with exceptional image quality and depth in the optical wavebands. Using a KiDS subset of about 25,000 galaxies with measured spectroscopic redshifts, we have derived photo-zs using i) three different empirical methods based on supervised machine learning, ii) the Bayesian Photometric Redshift model (BPZ), and iii) a classical SED template fitting procedure (Le Phare). We confirm that, in the regions of the photometric parameter space properly sampled by the spectroscopic templates, machine learning methods provide better redshift estimates, with lower scatter and a smaller fraction of outliers. SED fitting techniques, however, provide useful information on the galaxy spectral type, which can be used effectively to constrain systematic errors and to better characterize potential catastrophic outliers. This classification is then used to specialize the training of regression machine learning models, demonstrating that a hybrid approach, combining SED fitting and machine learning in a single collaborative framework, can effectively improve the accuracy of photo-z estimates.
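Comparisons of "scatter" and "fraction of outliers" are usually made on the normalised residual Δz = (z_phot − z_spec)/(1 + z_spec). The sketch below computes common versions of these statistics. The outlier cut |Δz| > 0.15 and the NMAD definition are widespread conventions assumed here; the study above may define its metrics differently.

```python
import numpy as np


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Common photo-z quality statistics on dz = (z_phot - z_spec) / (1 + z_spec).

    Assumed conventions: outliers have |dz| > outlier_cut, and NMAD is
    1.4826 * median(|dz - median(dz)|), a robust estimate of the scatter.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
    return {
        "bias": float(np.mean(dz)),                 # mean offset
        "sigma": float(np.std(dz)),                 # plain standard deviation
        "nmad": float(nmad),                        # outlier-resistant scatter
        "outlier_fraction": float(np.mean(np.abs(dz) > outlier_cut)),
    }
```

Reporting both `sigma` and `nmad` is useful because a few catastrophic outliers inflate the standard deviation while barely moving the NMAD.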
As current- and next-generation astronomical instruments come online, they will generate an unprecedented deluge of data. Analyzing these data in real time presents unique conceptual and computational challenges, and their long-term storage and archiving are scientifically essential for generating reliable, reproducible results. We present here the real-time processing (RTP) system for the Hydrogen Epoch of Reionization Array (HERA), a radio interferometer that aims to make the first interferometric detection of the highly redshifted 21 cm signal from Cosmic Dawn and the Epoch of Reionization. The RTP system consists of analysis routines, such as calibration and detection of radio-frequency interference (RFI) events, that are run on raw data shortly after they are acquired. RTP works closely with the Librarian, the HERA data storage and transfer manager, which automatically ingests data and transfers copies to other clusters for post-processing analysis. Both the RTP system and the Librarian are public, open-source software, so they can be modified for use in other scientific collaborations. When fully constructed, HERA is projected to generate over 50 terabytes (TB) of data each night, and the RTP system enables the successful scientific analysis of these data.
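To give a flavour of the kind of routine that runs shortly after acquisition, the sketch below flags RFI in a (time, frequency) waterfall of visibility amplitudes using robust per-channel statistics: samples more than a threshold number of median-absolute-deviation sigmas from the channel median are marked. This is a generic illustration, not HERA's RTP algorithm; the function name and the default threshold of 5 are assumptions.

```python
import numpy as np


def flag_rfi(waterfall, threshold=5.0):
    """Return a boolean mask (True = flagged) for a (time, frequency) array.

    Each frequency channel is normalised by its own median and MAD, scaled
    by 1.4826 so it approximates a Gaussian sigma; samples deviating by more
    than `threshold` robust sigmas are flagged.
    """
    data = np.abs(np.asarray(waterfall))
    med = np.median(data, axis=0, keepdims=True)
    mad = 1.4826 * np.median(np.abs(data - med), axis=0, keepdims=True)
    # Guard against zero MAD (e.g. a constant channel) to avoid division by
    # zero; any deviation in such a channel then gets a very large score.
    mad = np.maximum(mad, np.finfo(float).tiny)
    z = np.abs(data - med) / mad
    return z > threshold
```

Median and MAD are used instead of mean and standard deviation because strong RFI would otherwise inflate the very statistics used to detect it.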
This chapter introduces the state of the art in the emerging area of combining High Performance Computing (HPC) with Big Data analysis. To map out this new area, the chapter first surveys existing approaches to integrating HPC with Big Data. Next, it introduces several optimization solutions that focus on minimizing the data transfer time from computation-intensive applications to analysis-intensive applications, as well as the end-to-end time-to-solution. These solutions use software-defined networking (SDN) to adaptively exploit both the high-speed interconnect network and high-performance parallel file systems to optimize application performance. A computational framework called DataBroker is designed and developed to enable tight integration of HPC with data analysis. Multiple types of experiments have been conducted to expose different performance issues in both message passing and parallel file systems and to verify the effectiveness of the proposed approaches.
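One way to picture "adaptively using" two data paths is a cost model that predicts each path's transfer time as a fixed latency plus size divided by available bandwidth, and then picks the cheaper one. The sketch below is purely illustrative: the `TransferPath` class, its parameters and the numbers in the usage note are hypothetical, and it is not DataBroker's API or the chapter's SDN logic, where bandwidth would be measured at run time rather than fixed.

```python
from dataclasses import dataclass


@dataclass
class TransferPath:
    """A hypothetical data path with a latency + bandwidth cost model."""
    name: str
    latency_s: float       # fixed setup cost per transfer, in seconds
    bandwidth_Bps: float   # currently available throughput, in bytes/s

    def transfer_time(self, n_bytes: int) -> float:
        """Predicted time to move n_bytes along this path."""
        return self.latency_s + n_bytes / self.bandwidth_Bps


def choose_path(paths, n_bytes):
    """Return the path with the lowest predicted transfer time for n_bytes."""
    return min(paths, key=lambda p: p.transfer_time(n_bytes))
```

With made-up figures, for example an interconnect at 10 µs latency and 10 GB/s versus a parallel file system at 50 ms latency and 20 GB/s, small messages go over the interconnect while very large data sets favour the file system; the crossover point shifts as measured bandwidths change.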