
Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines

Published by Bruno Sánchez
Publication date: 2017
Language: English





Data processing pipelines are an important part of the astronomical software library: they comprise chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL relational database capable of handling custom data models, processing stages, and communication alerts, and also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller pattern separates the user logic from the data models while delivering multi-processing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in astronomy: the design pattern frees the programmer from dealing with processing flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations, and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.




Read also

In the multi-messenger era, astronomical projects share information about transient phenomena by issuing science alerts to the scientific community through different communication networks. This coordination is mandatory to understand the nature of these physical phenomena. For this reason, astrophysical projects rely on real-time analysis software pipelines to identify transients (e.g. GRBs) as soon as possible and to speed up the reaction time to external alerts. These pipelines can share and receive the science alerts through the Gamma-ray Coordinates Network. This work presents a framework designed to simplify the development of real-time scientific analysis pipelines. The framework provides the architecture and the required automatisms to develop a real-time analysis pipeline, allowing the researchers to focus more on the scientific aspects. The framework has been successfully used to develop real-time pipelines for the scientific analysis of the AGILE space mission data. It is planned to reuse this framework for the Super-GRAWITA and AFISS projects. A possible future use for the Cherenkov Telescope Array (CTA) project is under evaluation.
Modern astronomical data processing requires complex software pipelines to process ever-growing datasets. For radio astronomy, these pipelines have become so large that they need to be distributed across a computational cluster. This makes it difficult to monitor the performance of each pipeline step. To gain insight into the performance of each step, a performance monitoring utility needs to be integrated with the pipeline execution. In this work we have developed such a utility and integrated it with the calibration pipeline of the Low Frequency Array, LOFAR, a leading radio telescope. We tested the tool by running the pipeline on several different compute platforms and collected the performance data. Based on this data, we make well-informed recommendations on future hardware and software upgrades. The aim of these upgrades is to accelerate the slowest processing steps for this LOFAR pipeline. The pipeline collector suite is open source and will be incorporated in future LOFAR pipelines to create a performance database for all LOFAR processing.
We present CosmoHub (https://cosmohub.pic.es), a web application based on Hadoop to perform interactive exploration and distribution of massive cosmological datasets. Recent cosmology seeks to unveil the nature of both dark matter and dark energy by mapping the large-scale structure of the Universe through the analysis of massive amounts of astronomical data, which have been progressively increasing during the last (and future) decades with the digitization and automation of the experimental techniques. CosmoHub, hosted and developed at the Port d'Informació Científica (PIC), provides support to a worldwide community of scientists, without requiring the end user to know any Structured Query Language (SQL). It is serving data of several large international collaborations such as the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS) and the Marenostrum Institut de Ciències de l'Espai (MICE) numerical simulations. While originally developed as a PostgreSQL relational database web frontend, this work describes the current version of CosmoHub, built on top of Apache Hive, which facilitates scalable reading, writing and managing of huge datasets. As CosmoHub's datasets are seldom modified, Hive is a better fit. Over 60 TiB of catalogued information and $50 \times 10^9$ astronomical objects can be interactively explored using an integrated visualization tool which includes 1D histogram and 2D heatmap plots. In our current implementation, online exploration of datasets of $10^9$ objects can be done in a timescale of tens of seconds. Users can also download customized subsets of data in standard formats generated in a few minutes.
Chenzhou Cui, 2011
Although the roles of data centers and computing centers are becoming more and more important, and on-line research is becoming the mainstream for astronomy, individual research based on locally hosted data is still very common. With the increase of personal storage capacity, it is easy to find hundreds to thousands of FITS files on the personal computer of an astrophysicist. Because the Flexible Image Transport System (FITS) is a professional data format initiated by astronomers and used mainly within a small community, data management toolkits for FITS files are very few. Astronomers need a powerful tool to help them manage their local astronomical data. Although the Virtual Observatory (VO) is a network-oriented astronomical research environment, its applications and related technologies provide useful solutions to enhance the management and utilization of astronomical data hosted on an astronomer's personal computer. FITSManager is such a tool: it provides astronomers with efficient management and utilization of their local data, bringing the VO to astronomers in a seamless and transparent way. FITSManager provides a rich set of functions for FITS file management, such as thumbnails, previews, type-dependent icons, header keyword indexing and search, and collaboration with other tools and online services. The development of FITSManager is an effort to fill the gap between management and analysis of astronomical data.
PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written Python functions (nodes) connected by pipes (edges) into a directed acyclic graph. These functions are arbitrarily definable, and can make use of any Python modules or external binaries. Given a user-defined topology and collection of input data, functions are composed into nested higher-order maps, which are transparently and robustly evaluated in parallel on a single computer or on remote hosts. Local and remote computational resources can be flexibly pooled and assigned to functional nodes, thereby allowing facile load-balancing and pipeline optimization to maximize computational throughput. Input items are processed by nodes in parallel, and traverse the graph in batches of adjustable size -- a trade-off between lazy-evaluation, parallelism, and memory consumption. The processing of a single item can be parallelized in a scatter/gather scheme. The simplicity and flexibility of distributed workflows using PaPy bridge the gap between desktop and grid, enabling this new computing paradigm to be leveraged in the processing of large scientific datasets.
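The batching and parallel-map idea in the PaPy abstract can be sketched with the standard library alone. This is a simplified illustration, not PaPy's API. It covers only a linear chain of functions (the simplest directed acyclic graph) and runs on local threads, and the names `batched`, `compose`, and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def compose(*funcs):
    """Chain node functions so the output of one feeds the next."""
    def pipeline(x):
        for f in funcs:
            x = f(x)
        return x
    return pipeline

def run(items, funcs, batch_size=2, workers=2):
    """Lazily process items batch by batch; each batch is mapped in parallel."""
    pipeline = compose(*funcs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(items, batch_size):
            # Executor.map returns results in input order.
            yield from pool.map(pipeline, batch)
```

A call such as `list(run(range(5), [lambda x: x + 1, lambda x: x * 2]))` consumes the input two items at a time. A smaller `batch_size` keeps fewer items in memory at once, while a larger one exposes more parallelism per batch, which mirrors the trade-off the abstract describes.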