CosmoHub: Interactive exploration and distribution of astronomical data on Hadoop

Published by Pau Tallada Crespí

Publication date: 2020

Research field: Physics, Information Engineering

Paper language: English

Authors: Pau Tallada, Jorge Carretero, Jordi Casals

Subjects: Instrumentation and Methods for Astrophysics; Distributed, Parallel, and Cluster Computing; Data Analysis, Statistics and Probability

Abstract

We present CosmoHub (https://cosmohub.pic.es), a web application based on Hadoop for the interactive exploration and distribution of massive cosmological datasets. Recent cosmology seeks to unveil the nature of both dark matter and dark energy by mapping the large-scale structure of the Universe through the analysis of massive amounts of astronomical data. These data have grown steadily over the last decades, and will keep growing in the next ones, with the digitization and automation of experimental techniques. CosmoHub, hosted and developed at the Port d'Informació Científica (PIC), supports a worldwide community of scientists without requiring the end user to know any Structured Query Language (SQL). It serves data from several large international collaborations, such as the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS) and the Marenostrum Institut de Ciències de l'Espai (MICE) numerical simulations. CosmoHub was originally developed as a web frontend to a PostgreSQL relational database. This work describes its current version, built on top of Apache Hive, which facilitates scalable reading, writing and managing of huge datasets. Because CosmoHub's datasets are seldom modified, Hive is a better fit. Over 60 TiB of catalogued information and $50 \times 10^9$ astronomical objects can be explored interactively using an integrated visualization tool that includes 1D histogram and 2D heatmap plots. In the current implementation, online exploration of datasets of $10^9$ objects takes on the order of tens of seconds. Users can also download customized subsets of data in standard formats, which are generated within a few minutes.
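CosmoHub's users build histograms and heatmaps without writing SQL, so the application has to generate the queries itself. The sketch below shows one way to do that. It assumes the binning is pushed down to Hive as a `GROUP BY` aggregation, so only the bin counts travel back to the browser instead of $10^9$ rows. The table and column names (`catalog`, `mag_i`, `ra_deg`, `dec_deg`) and all helper functions are hypothetical. This is not CosmoHub's actual query generator.

```python
import re

# Hypothetical guard: plain identifiers only (optionally db.table); anything else is rejected.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?")


def _ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, so UI input cannot inject SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def _bin_expr(column: str, lo: float, hi: float, nbins: int) -> str:
    """HiveQL expression giving the equal-width bin index of column over [lo, hi).

    Floating-point rounding can rarely yield nbins for values just below hi;
    a client would clamp such an index to nbins - 1.
    """
    if nbins <= 0 or not hi > lo:
        raise ValueError("need nbins > 0 and hi > lo")
    width = (hi - lo) / nbins
    return f"CAST(FLOOR(({_ident(column)} - ({lo})) / {width}) AS INT)"


def _in_range(column: str, lo: float, hi: float) -> str:
    """WHERE clause fragment keeping only rows inside the plotted range."""
    return f"{_ident(column)} >= {lo} AND {_ident(column)} < {hi}"


def histogram_1d_sql(table: str, column: str, lo: float, hi: float, nbins: int) -> str:
    """Query returning one row per non-empty bin: (x_bin, n)."""
    b = _bin_expr(column, lo, hi, nbins)
    # Hive groups by the full expression, so it is repeated rather than aliased.
    return (
        f"SELECT {b} AS x_bin, COUNT(*) AS n FROM {_ident(table)} "
        f"WHERE {_in_range(column, lo, hi)} GROUP BY {b}"
    )


def heatmap_2d_sql(table: str, xcol: str, xlim: tuple[float, float], nx: int,
                   ycol: str, ylim: tuple[float, float], ny: int) -> str:
    """Query returning one row per non-empty cell: (x_bin, y_bin, n)."""
    bx = _bin_expr(xcol, *xlim, nx)
    by = _bin_expr(ycol, *ylim, ny)
    return (
        f"SELECT {bx} AS x_bin, {by} AS y_bin, COUNT(*) AS n FROM {_ident(table)} "
        f"WHERE {_in_range(xcol, *xlim)} AND {_in_range(ycol, *ylim)} "
        f"GROUP BY {bx}, {by}"
    )
```

For example, `histogram_1d_sql("catalog", "mag_i", 15.0, 25.0, 20)` produces a query with 0.5-magnitude bins. The identifiers come from a user interface rather than a trusted SQL author, so validating them against a strict pattern is a cheap safeguard.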

Related papers

Juan B. Cabral, Bruno Sanchez, Martin Beroiz (2017)

Data processing pipelines represent an important slice of the astronomical software library. They consist of chains of processes that transform raw data into valuable information through data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral uses a Model-View-Controller design pattern on top of an SQL relational database capable of handling custom data models, processing stages and communication alerts. It also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller pattern separates the user logic from the data models while delivering multi-processing and distributed computing capabilities. Corral improves on the data processing pipelines commonly found in astronomy because the design pattern frees programmers from dealing with processing-flow and parallelization issues. They can instead focus on the specific algorithms needed for the successive data transformations, and the framework also provides a broad measure of the quality of the resulting pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.

Subjects: Instrumentation and Methods for Astrophysics; Software Engineering; Data Analysis, Statistics and Probability

Pre-feasibility Study of Astronomical Data Archive Systems Powered by Public Cloud Computing and Hadoop Hive

Satoshi Eguchi (2016)

The volume of astronomical observational data grows every year. For example, the Atacama Large Millimeter/submillimeter Array is expected to generate 200 TB of raw data per year, and the Large Synoptic Survey Telescope is estimated to produce 15 TB of raw data per night. Computing capacity is growing much more slowly than astronomical data volumes, so it will become common in the next decade for data providers to offer high-performance computing (HPC) resources together with the scientific data. However, the installation and maintenance costs of an HPC system can be a burden for the provider. I consider public cloud computing as an inexpensive alternative source of sufficient computing resources. I build Hadoop and Hive clusters on a virtual private server (VPS) service and on Amazon Elastic MapReduce (EMR), and measure their performance. The VPS cluster's behavior varies from day to day, while the EMR clusters are relatively stable. Because partitioning is essential for Hive, I evaluate several partitioning algorithms. In this paper, I report the benchmark results and the performance optimizations for a cloud computing environment.
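The abstract does not name the partitioning algorithms it evaluates. To show why partitioning matters for Hive, the hypothetical sketch below assigns each object to a fixed-width declination band. Suppose the table were declared with something like `PARTITIONED BY (dec_band INT)`. A query that also filters on `dec_band` would then read only the matching partitions. The band width, column name and functions are assumptions, not Eguchi's scheme.

```python
import math

BAND_DEG = 1.0  # hypothetical partition width in degrees


def dec_band(dec_deg: float, band_deg: float = BAND_DEG) -> int:
    """Partition key: index of the declination band containing dec_deg."""
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError(f"declination out of range: {dec_deg}")
    nbands = math.ceil(180.0 / band_deg)
    # dec = +90 would otherwise start a new band; clamp it into the last one.
    return min(int((dec_deg + 90.0) // band_deg), nbands - 1)


def bands_for_query(dec_min: float, dec_max: float, band_deg: float = BAND_DEG) -> list[int]:
    """Partitions a query on dec_min <= dec <= dec_max must read; the rest can be pruned
    (assuming the query also filters on the partition column)."""
    if dec_min > dec_max:
        raise ValueError("dec_min must not exceed dec_max")
    first = dec_band(dec_min, band_deg)
    last = dec_band(dec_max, band_deg)
    return list(range(first, last + 1))
```

With 1° bands, a query on $-1.5° \le \delta \le 1.5°$ touches 4 of 180 partitions. Equal-width declination bands cover unequal sky areas, since the area of a narrow band shrinks roughly as $\cos\delta$. As a result, partition sizes are uneven even for a uniform survey, which is one reason to benchmark several schemes.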

Subjects: Instrumentation and Methods for Astrophysics

Tera-scale Astronomical Data Analysis and Visualization

A. H. Hassan, C. J. Fluke, D. G. Barnes (2012)

We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such as the mean intensity and standard deviation in 1.7 s; (3) evaluation of the image histogram in 4 s; and (4) evaluation of the global image median intensity in just 45 s. Our measured results correspond to a raw computational throughput approaching one teravoxel per second, and are 10–100 times faster than the best possible performance with traditional single-node, multi-core CPU implementations. A scalability analysis shows the framework will scale well to images sized 1 TB and beyond. Other parallel data analysis algorithms can be added to the framework with relative ease, and accordingly, we present our framework as a possible solution to the image analysis and visualization requirements of next-generation telescopes, including the forthcoming Square Kilometre Array pathfinder radiotelescopes.
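On a cluster, global statistics such as the mean and standard deviation can be computed by having each node summarize its own chunk and then merging the summaries. The sketch below uses the well-known pairwise merge of (count, mean, sum of squared deviations) from parallel variance algorithms. It illustrates the reduction pattern only and is not the authors' GPU implementation. Plain Python lists stand in for the per-GPU data chunks, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Partial:
    """Mergeable summary of one data chunk."""
    n: int
    mean: float
    m2: float  # sum of squared deviations from the chunk mean


def summarize(values: Sequence[float]) -> Partial:
    """Summary computed locally on one node (e.g. one GPU)."""
    n = len(values)
    if n == 0:
        return Partial(0, 0.0, 0.0)
    mean = sum(values) / n
    return Partial(n, mean, sum((v - mean) ** 2 for v in values))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two summaries exactly, without revisiting the raw values."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def global_mean_std(chunks: Iterable[Sequence[float]]) -> tuple[float, float]:
    """Reduce all chunk summaries into a global mean and population standard deviation."""
    total = reduce(merge, (summarize(c) for c in chunks), Partial(0, 0.0, 0.0))
    if total.n == 0:
        raise ValueError("no data")
    return total.mean, math.sqrt(total.m2 / total.n)
```

The merge is associative, so the summaries can be combined in a tree across nodes. The median, by contrast, does not decompose into small mergeable per-chunk summaries and needs a different strategy, for example iterative histogram refinement.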

Subjects: Instrumentation and Methods for Astrophysics; Distributed, Parallel, and Cluster Computing; Graphics

AstronomicAL: An interactive dashboard for visualisation, integration and classification of data using Active Learning

Grant Stevens, Sotiria Fotopoulou, Malcolm N. Bremer (2021)

AstronomicAL is a human-in-the-loop interactive labelling and training dashboard that allows users to create reliable datasets and robust classifiers using active learning. This technique prioritises data that offer high information gain, which improves performance while using substantially less data. The system allows users to visualise and integrate data from different sources and to deal with incorrect or missing labels and imbalanced class sizes. AstronomicAL lets experts visualise domain-specific plots and key information, drawn from a variety of data sources, about both the broader context and the details of a point of interest, ensuring reliable labels. In addition, AstronomicAL provides functionality to explore all aspects of the training process, including custom models and query strategies. This makes the software a tool for experimenting with both domain-specific classifications and more general-purpose machine learning strategies. We illustrate the system with an astronomical dataset because of the field's immediate need; however, AstronomicAL has been designed for datasets from any discipline. Finally, entire layouts, models and assigned labels can be shared with the community by exporting a simple configuration file. This allows complete transparency and makes reproducing results effortless.
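Active learning tools typically let the user choose a query strategy that decides which unlabelled points to show the expert next. One common strategy is entropy-based uncertainty sampling. It ranks pool items by the entropy of the classifier's predicted class probabilities and queries the most uncertain ones, as sketched below. The function names and the probability input format are assumptions for illustration. This is not AstronomicAL's API.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> list[int]:
    """Indices of the k unlabelled items whose predictions have the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]
```

In a labelling loop, the selected indices would be shown to the expert, their labels added to the training set, and the classifier retrained before the next query.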

Subjects: Instrumentation and Methods for Astrophysics; Human-Computer Interaction; Machine Learning

Historical astronomical data: urgent need for preservation, digitization enabling scientific exploration

Alexei Pevtsov, Elizabeth Griffin, Jonathan Grindlay (2019)

Over the past decades and even centuries, the astronomical community has accumulated a significant heritage of recorded observations of a great many astronomical objects. Those records contain irreplaceable information about long-term evolutionary and non-evolutionary changes in our Universe, and their preservation and digitization is vital. Unfortunately, most of those data risk becoming degraded and thence totally lost. We hereby call upon the astronomical community and US funding agencies to recognize the gravity of the situation. We ask them to commit to international preservation and digitization efforts through comprehensive long-term planning supported by adequate resources. Priorities should be set according to the expected scientific gains, the vulnerability of the originals and the availability of relevant infrastructure. The importance and urgency of this issue were recently recognized by General Assembly XXX of the International Astronomical Union (IAU) in its Resolution B3, on the preservation, digitization and scientific exploration of historical astronomical data. We outline the rationale for this initiative, provide examples of new science enabled by successful recovery efforts, and review the potential losses to science if nothing is done.

Subjects: Instrumentation and Methods for Astrophysics; Solar and Stellar Astrophysics