Gaia is an ambitious space astrometry mission of ESA whose main objective is to map the sky in astrometry and photometry down to magnitude 20 by the end of the next decade. While the mission is built and operated by ESA and an industrial consortium, the data processing is entrusted to a consortium drawn from the scientific community, formed in 2006 and formally selected by ESA one year later. The satellite will downlink around 100 TB of raw telemetry data over a mission duration of 5 years, from which a very complex iterative processing will lead to the final science output: astrometry with a final accuracy of a few tens of microarcseconds, epoch photometry in wide and narrow bands, and radial velocities and spectra for stars brighter than 17 mag. We discuss the general principles and main difficulties of this very large data processing effort and present the organisation of the European consortium responsible for its design and implementation.
Euclid is a Europe-led cosmology space mission dedicated to a visible and near-infrared survey of the entire extragalactic sky. Its purpose is to deepen our knowledge of the dark content of our Universe. After an overview of the Euclid mission and science, this contribution describes how the community is getting organized to face the data analysis challenges, both in software development and in operational data processing. It ends with a more specific account of some of the main contributions of the Swiss Science Data Center (SDC-CH).
Gaia is ESA's ambitious space astrometry mission, the main objective of which is to map one billion celestial objects (mostly in our Galaxy) astrometrically and spectro-photometrically with unprecedented accuracy. The announcement of opportunity for the data processing will be issued by ESA late in 2006. The Gaia Data Processing and Analysis Consortium (DPAC) has recently been formed and is preparing a response. The satellite will downlink close to 100 TB of raw telemetry data over 5 years. To achieve the required astrometric accuracy of a few tens of microarcseconds, a highly involved processing of these data is required. In addition to the main astrometric instrument, Gaia will host a radial velocity instrument, two low-resolution dispersers for multi-colour photometry, and two star mappers. Gaia is, in effect, a flying gigapixel camera. The various instruments each require relatively complex processing while at the same time being interdependent. We describe the overall composition of the DPAC and the envisaged overall architecture of the Gaia data processing system, and delve further into the core processing, one of the nine so-called coordination units comprising the Gaia processing system.
There is growing interest in the use of Knowledge Graphs (KGs) for the representation, exchange, and reuse of scientific data. While KGs offer the prospect of improving the infrastructure for working with scalable and reusable scholarly data consistent with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles, the state-of-the-art Data Management Systems (DMSs) for processing large KGs leave something to be desired. In this paper, we study the performance of some of the major DMSs in the context of querying KGs, with the goal of providing a fine-grained comparative analysis of DMSs representing each of the four major DMS types. We benchmarked four well-known scientific KGs, namely Allie, Cellcycle, DrugBank, and LinkedSPL, against four representative DMSs: Virtuoso, Blazegraph, RDF-3X, and MongoDB. Our results suggest that the DMSs display limitations in processing complex queries on the KG datasets. Depending on the query type, the performance differentials can be several orders of magnitude. Moreover, no single DMS appears to offer consistently superior performance. We present an analysis of the underlying issues and outline two integrated approaches and proposals for resolving the problem.
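The query-type-dependent performance differentials mentioned above can be illustrated with a toy in-memory triple store (a deliberately simplified sketch, not any of the benchmarked DMSs): a triple pattern with a bound subject can be answered from an index touching only a few triples, while an unbound pattern forces a scan of the whole store.

```python
# Toy illustration of why query shape matters to a triple store: an indexed
# lookup on a bound subject is cheap, an unbound pattern scans everything.
# All class and data names here are hypothetical.
from collections import defaultdict


class TinyTripleStore:
    """Minimal in-memory triple store with a subject index."""

    def __init__(self):
        self.triples = []                    # full list of (s, p, o) triples
        self.by_subject = defaultdict(list)  # index: subject -> its triples

    def add(self, s, p, o):
        t = (s, p, o)
        self.triples.append(t)
        self.by_subject[s].append(t)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        # Bound subject: consult the index. Unbound subject: full scan.
        candidates = self.by_subject[s] if s is not None else self.triples
        return [t for t in candidates
                if (p is None or t[1] == p) and (o is None or t[2] == o)]


store = TinyTripleStore()
store.add("DrugBank:DB01050", "type", "Drug")
store.add("DrugBank:DB01050", "name", "Ibuprofen")
store.add("Allie:A1", "type", "Abbreviation")

# Bound-subject query: the index narrows the search to 2 candidate triples.
print(store.match(s="DrugBank:DB01050", p="name"))
# Unbound-subject query: every triple in the store must be examined.
print(store.match(p="type"))
```

Real DMSs maintain several such indexes (e.g. over all orderings of subject, predicate, and object), which is one reason the same logical query can cost vastly different amounts of work depending on which positions are bound.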
Gaia Data Release 2 (DR2) contains the first release of radial velocities, complementing the kinematic data of a sample of about 7 million relatively bright, late-type stars. Aims: This paper provides a detailed description of the Gaia spectroscopic data processing pipeline and of the approach adopted to derive the radial velocities presented in DR2. Methods: The pipeline must perform four main tasks: (i) clean and reduce the spectra observed with the Radial Velocity Spectrometer (RVS); (ii) calibrate the RVS instrument, including wavelength, straylight, line-spread function, bias non-uniformity, and photometric zeropoint; (iii) extract the radial velocities; and (iv) verify the accuracy and precision of the results. The radial velocity of a star is obtained through a fit of the RVS spectrum relative to an appropriate synthetic template spectrum. An additional task of the spectroscopic pipeline was to provide first-order estimates of the stellar atmospheric parameters required to select such template spectra. We describe the pipeline features and present the detailed calibration algorithms and software solutions we used to produce the radial velocities published in DR2. Results: The spectroscopic processing pipeline produced median radial velocities for Gaia stars with narrow-band near-IR magnitude Grvs < 12 (i.e. brighter than V~13). Stars identified as double-lined spectroscopic binaries were removed from the pipeline, while variable stars, single-lined, and non-detected double-lined spectroscopic binaries were treated as single stars. The scatter in radial velocity among different observations of the same star, also published in DR2, provides information about radial velocity variability. For the hottest (Teff > 7000 K) and coolest (Teff < 3500 K) stars, the accuracy and precision of the stellar parameter estimates are not sufficient to allow selection of appropriate templates. [Abridged]
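The template-fit idea behind task (iii) can be sketched in a few lines (a hedged toy, not the DPAC pipeline): Doppler-shift a synthetic template over a grid of trial velocities and keep the velocity that minimises the chi-square against the observed spectrum. The single-Gaussian-line "template" and all numerical values below are illustrative assumptions.

```python
# Toy sketch of radial velocity extraction by template fitting: shift a
# synthetic template over trial velocities, keep the best chi-square fit.
# The one-line Gaussian template and all parameter values are hypothetical.
import math

C_KMS = 299_792.458  # speed of light in km/s


def template(wl, center=8600.0, depth=0.5, width=0.5):
    """Toy synthetic spectrum: flat continuum with one Gaussian absorption line."""
    return 1.0 - depth * math.exp(-0.5 * ((wl - center) / width) ** 2)


def fit_radial_velocity(wavelengths, observed, trial_velocities):
    """Return the trial velocity whose Doppler-shifted template fits best."""
    best_v, best_chi2 = None, float("inf")
    for v in trial_velocities:
        shift = 1.0 + v / C_KMS  # non-relativistic Doppler factor
        # Chi-square between the observed spectrum and the shifted template.
        chi2 = sum((obs - template(wl / shift)) ** 2
                   for wl, obs in zip(wavelengths, observed))
        if chi2 < best_chi2:
            best_v, best_chi2 = v, chi2
    return best_v


# Simulate a star receding at +30 km/s: its absorption line appears redshifted.
true_v = 30.0
wavelengths = [8590.0 + 0.1 * i for i in range(200)]
observed = [template(wl / (1.0 + true_v / C_KMS)) for wl in wavelengths]
trial = [float(v) for v in range(-100, 101)]  # -100..100 km/s in 1 km/s steps
print(fit_radial_velocity(wavelengths, observed, trial))  # → 30.0
```

The real pipeline fits against physically computed synthetic spectra selected using the stellar atmospheric parameters mentioned above, which is why poor parameter estimates for the hottest and coolest stars degrade the template selection.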
The second Gaia data release is based on 22 months of mission data, with an average of 0.9 billion individual CCD observations per day. A data volume of this size and granularity requires a robust and reliable but still flexible system to achieve the demanding accuracy and precision constraints that Gaia is capable of delivering. The internal Gaia photometric system was initialised using an iterative process based solely on Gaia data. A set of calibrations was derived for the entire Gaia DR2 baseline and then used to produce the final mean source photometry. The photometric catalogue contains 2.5 billion sources, comprising three different grades depending on the availability of colour information and the procedure used to calibrate them: 1.5 billion gold, 144 million silver, and 0.9 billion bronze. These figures reflect the results of the photometric processing; the content of the data release will differ owing to the validation and data quality filters applied during the catalogue preparation. The photometric processing pipeline, PhotPipe, implements all the processing and calibration workflows in terms of Map/Reduce jobs based on the Hadoop platform. This is the first example of a processing system for a large astrophysical survey project to make use of these technologies. The improvements in the generation of the integrated G-band fluxes, in the attitude modelling, in the cross-matching, and in the identification of spurious detections led to a much cleaner input stream for the photometric processing. This, combined with the improvements in the definition of the internal photometric system and calibration flow, produced high-quality photometry. Hadoop proved to be an excellent platform choice for the implementation of PhotPipe in terms of overall performance, scalability, downtime, and manpower required for operations and maintenance.
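The Map/Reduce pattern used by PhotPipe can be sketched as follows (a minimal single-process toy, not the Hadoop implementation): the map phase keys each calibrated epoch flux by source identifier, and the reduce phase aggregates per key into mean source photometry. The calibration step and all identifiers and numbers are illustrative assumptions.

```python
# Minimal map/reduce sketch of producing mean source photometry from
# per-observation fluxes. This is a toy, not PhotPipe: the multiplicative
# "calibration" and all source ids and flux values are hypothetical.
from collections import defaultdict


def map_phase(observations):
    """Emit (source_id, calibrated_flux) pairs, one per CCD observation."""
    for source_id, raw_flux, calibration in observations:
        yield source_id, raw_flux * calibration  # hypothetical calibration


def reduce_phase(pairs):
    """Group calibrated fluxes by source id and compute the mean per source."""
    grouped = defaultdict(list)
    for source_id, flux in pairs:
        grouped[source_id].append(flux)
    return {sid: sum(fluxes) / len(fluxes) for sid, fluxes in grouped.items()}


observations = [
    # (source id, raw flux, calibration factor)
    ("gaia_src_1", 100.0, 1.02),
    ("gaia_src_1", 98.0, 1.00),
    ("gaia_src_2", 250.0, 0.98),
]
print(reduce_phase(map_phase(observations)))
# → {'gaia_src_1': 100.0, 'gaia_src_2': 245.0}
```

In the Hadoop setting the same two phases run distributed: mappers process disjoint chunks of the observation stream in parallel, the framework shuffles pairs so that all fluxes for a given source reach the same reducer, and reducers emit the per-source aggregates, which is what makes the workflow scale to billions of observations.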