We construct a supervised classifier based on Gaussian Mixture Models to probabilistically classify objects in Gaia Data Release 2 (GDR2) using only photometric and astrometric data in that release. The model is trained empirically to classify objects into three classes -- star, quasar, galaxy -- for G>=14.5 mag down to the Gaia magnitude limit of G=21.0 mag. Galaxies and quasars are identified for the training set by a cross-match to objects with spectroscopic classifications from the Sloan Digital Sky Survey. Stars are defined directly from GDR2. When allowing for the expectation that quasars are 500 times rarer than stars, and galaxies 7500 times rarer than stars (the class imbalance problem), samples classified with a threshold probability of 0.5 are predicted to have purities of 0.43 for quasars and 0.28 for galaxies, and completenesses of 0.58 and 0.72 respectively. The purities can be increased up to 0.60 by adopting a higher threshold. Not accounting for this expected low frequency of extragalactic objects (the class prior) would give both erroneously optimistic performance predictions and severely impure samples. Applying our model to all 1.20 billion objects in GDR2 with the required features, we classify 2.3 million objects as quasars and 0.37 million objects as galaxies (with individual probabilities above 0.5). The small number of galaxies is due to the strong bias of the satellite detection algorithm and on-ground data selection against extended objects. We infer the true number of quasars and galaxies -- as these classes are defined by our training set -- to be 690,000 and 110,000 respectively (+/- 50%). The aim of this work is to see how well extragalactic objects can be classified using only GDR2 data. Better classifications should be possible with the low resolution spectroscopy (BP/RP) planned for GDR3.
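The effect of the class prior on sample purity can be illustrated with a short calculation. The sketch below is not the authors' code: the function name `purity_with_priors` and the per-class classification rates are hypothetical, chosen only to show how a classifier that looks nearly perfect on a balanced test set yields much less pure samples once quasars and galaxies are taken to be 500 and 7500 times rarer than stars.

```python
def purity_with_priors(rates, priors):
    """Expected purity of each classified sample.

    rates[i][j] is P(classified as j | true class i), e.g. measured on a test set.
    priors[i] is the expected fraction of class i in the target population.
    """
    n = len(priors)
    purities = []
    for j in range(n):
        # Fraction of all objects that are truly class i AND classified as j.
        joint = [priors[i] * rates[i][j] for i in range(n)]
        total = sum(joint)
        purities.append(joint[j] / total if total > 0 else float("nan"))
    return purities


# Hypothetical classification rates (NOT from the paper); class order: star, quasar, galaxy.
rates = [
    [0.998, 0.001, 0.001],
    [0.100, 0.900, 0.000],
    [0.100, 0.000, 0.900],
]

# Relative class frequencies quoted in the abstract: quasars 500x, galaxies 7500x rarer than stars.
weights = [1.0, 1.0 / 500, 1.0 / 7500]
priors = [w / sum(weights) for w in weights]
balanced = [1.0 / 3] * 3

purity_balanced = purity_with_priors(rates, balanced)
purity_realistic = purity_with_priors(rates, priors)

print("balanced classes:", [round(p, 3) for p in purity_balanced])
print("with class prior:", [round(p, 3) for p in purity_realistic])
```

With these made-up rates, only one star in a thousand is misclassified as a quasar, yet the quasar sample purity falls well below unity under the realistic prior, because stars outnumber quasars so heavily. Completeness (the diagonal of `rates`) is unchanged in this simplified picture, since the classifier's decisions are held fixed.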
The second Gaia data release (DR2) contains very precise astrometric and photometric properties for more than one billion sources, astrophysical parameters for tens of millions, radial velocities for millions, variability information for half a million stellar sources, and orbits for thousands of solar system objects. Before the Catalogue publication, these data have undergone dedicated validation processes. The goal of this paper is to describe the validation results in terms of completeness, accuracy and precision of the various Gaia DR2 data. The validation processes include a systematic analysis of the Catalogue content to detect anomalies, either individual errors or statistical properties, using statistical analysis, and comparisons to external data or to models. Although the astrometric, photometric and spectroscopic data are of unprecedented quality and quantity, it is shown that the data cannot be used without dedicated attention to the limitations described here, in the Catalogue documentation and in accompanying papers. Particular emphasis is put on the caveats for the statistical use of the data in scientific exploitation.
Gaia Data Release 2 contains the first release of radial velocities complementing the kinematic data of a sample of about 7 million relatively bright, late-type stars. Aims: This paper provides a detailed description of the Gaia spectroscopic data processing pipeline, and of the approach adopted to derive the radial velocities presented in DR2. Methods: The pipeline must perform four main tasks: (i) clean and reduce the spectra observed with the Radial Velocity Spectrometer (RVS); (ii) calibrate the RVS instrument, including wavelength, straylight, line-spread function, bias non-uniformity, and photometric zeropoint; (iii) extract the radial velocities; and (iv) verify the accuracy and precision of the results. The radial velocity of a star is obtained through a fit of the RVS spectrum relative to an appropriate synthetic template spectrum. An additional task of the spectroscopic pipeline was to provide first-order estimates of the stellar atmospheric parameters required to select such template spectra. We describe the pipeline features and present the detailed calibration algorithms and software solutions we used to produce the radial velocities published in DR2. Results: The spectroscopic processing pipeline produced median radial velocities for Gaia stars with narrow-band near-IR magnitude Grvs < 12 (i.e. brighter than V~13). Stars identified as double-lined spectroscopic binaries were removed from the pipeline, while variable stars, single-lined, and non-detected double-lined spectroscopic binaries were treated as single stars. The scatter in radial velocity among different observations of the same star, also published in DR2, provides information about radial velocity variability. For the hottest (Teff > 7000 K) and coolest (Teff < 3500 K) stars, the accuracy and precision of the stellar parameter estimates are not sufficient to allow selection of appropriate templates. [Abridged]
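The idea of fitting a spectrum against a Doppler-shifted template can be shown with a toy example. This is only an illustrative sketch and not the RVS pipeline's algorithm: the template is a single Gaussian absorption line with arbitrarily chosen centre, width and depth, the "observed" spectrum is noise-free, and the fit is a brute-force least-squares search over a grid of trial velocities. All function names are hypothetical.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def template(wavelength_nm, center_nm=854.2, sigma_nm=0.05, depth=0.5):
    """Toy rest-frame template: flat continuum with one Gaussian absorption line.

    The line centre, width and depth are illustrative values only.
    """
    return 1.0 - depth * math.exp(-0.5 * ((wavelength_nm - center_nm) / sigma_nm) ** 2)


def shifted_template(wavelength_nm, v_kms):
    """Template seen from a source with radial velocity v_kms (non-relativistic shift)."""
    return template(wavelength_nm / (1.0 + v_kms / C_KMS))


def fit_radial_velocity(wavelengths, fluxes, v_grid):
    """Return the trial velocity that minimises the sum of squared residuals."""
    def cost(v):
        return sum((f - shifted_template(w, v)) ** 2 for w, f in zip(wavelengths, fluxes))
    return min(v_grid, key=cost)


# Synthetic noise-free "observation" of a star receding at 37 km/s.
wavelengths = [853.5 + 0.01 * i for i in range(141)]
true_v = 37.0
observed = [shifted_template(w, true_v) for w in wavelengths]

# Trial velocities from -200 to +200 km/s in 1 km/s steps.
v_grid = [float(v) for v in range(-200, 201)]
best_v = fit_radial_velocity(wavelengths, observed, v_grid)
print("recovered radial velocity (km/s):", best_v)
```

A real pipeline would also have to deal with noise, instrumental line broadening, and the choice of template, which is why the abstract stresses that the atmospheric parameters must be known well enough to select an appropriate template.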
The second release of Gaia data (Gaia DR2) contains the astrometric parameters for more than half a million quasars. This set defines a kinematically non-rotating reference frame in the optical domain referred to as the Gaia-CRF2. The Gaia-CRF2 is the first realisation of a non-rotating global optical reference frame that meets the ICRS prescriptions, meaning that it is built only on extragalactic sources. It consists of the positions of a sample of 556 869 sources in Gaia DR2, obtained from a positional cross-match with the ICRF3-prototype and AllWISE AGN catalogues. The sample constitutes a clean, dense, and homogeneous set of extragalactic point sources in the magnitude range G from 16 to 21 mag with accurately known optical positions. The median positional uncertainty is 0.12 mas for G < 18 mag and 0.5 mas at G = 20 mag. Large-scale systematics are estimated to be in the range 20 to 30 μas. The accuracy claims are supported by the parallaxes and proper motions of the quasars in Gaia DR2. The optical positions for a subset of 2820 sources in common with the ICRF3-prototype show very good overall agreement with the radio positions, but several tens of sources have significantly discrepant positions.
More than half a million of the 1.69 billion sources in Gaia Data Release 2 (DR2) are published with photometric time series that exhibit light variations during the 22 months of observation. An all-sky classification of common high-amplitude pulsators (Cepheids, long-period variables, Delta Scuti / SX Phoenicis, and RR Lyrae stars) is provided for stars with brightness variations greater than 0.1 mag in G band. A semi-supervised classification approach was employed: multi-stage random forest classifiers were first trained with sources of known types in the literature, followed by a preliminary classification of the Gaia data and a second training phase that included a selection of the first classification results to improve the representation of some classes, before the improved classifiers were applied to the Gaia data. Dedicated validation classifiers were used to reduce the level of contamination in the published results. A relevant fraction of objects were not yet sufficiently sampled for reliable Fourier series decomposition; consequently, classifiers were based on features derived from statistics of photometric time series in the G, BP, and RP bands, as well as from some astrometric parameters. The published classification results include 195,780 RR Lyrae stars, 150,757 long-period variables, 8550 Cepheids, and 8882 Delta Scuti / SX Phoenicis stars. All of these results represent candidates whose completeness and contamination are described as a function of variability type and classification reliability. Results are expressed in terms of class labels and classification scores, which are available in the vari_classifier_result table of the Gaia archive.
We present the second Gaia data release, Gaia DR2, consisting of astrometry, photometry, radial velocities, and information on astrophysical parameters and variability, for sources brighter than magnitude 21. In addition, epoch astrometry and photometry are provided for a modest sample of minor planets in the solar system. A summary of the contents of Gaia DR2 is presented, accompanied by a discussion on the differences with respect to Gaia DR1 and an overview of the main limitations which are still present in the survey. Recommendations are made on the responsible use of Gaia DR2 results. Gaia DR2 contains celestial positions and the apparent brightness in G for approximately 1.7 billion sources. For 1.3 billion of those sources, parallaxes and proper motions are also available. The sample of sources for which variability information is provided is expanded to 0.5 million stars. This data release contains four new elements: broad-band colour information in the form of the apparent brightness in the $G_\mathrm{BP}$ (330--680 nm) and $G_\mathrm{RP}$ (630--1050 nm) bands is available for 1.4 billion sources; median radial velocities for some 7 million sources are presented; for between 77 and 161 million sources estimates are provided of the stellar effective temperature, extinction, reddening, and radius and luminosity; and for a pre-selected list of 14000 minor planets in the solar system epoch astrometry and photometry are presented. Finally, Gaia DR2 also represents a new materialisation of the celestial reference frame in the optical, the Gaia-CRF2, which is the first optical reference frame based solely on extragalactic sources. There are notable changes in the photometric system and the catalogue source list with respect to Gaia DR1, and we stress the need to consider the two data releases as independent.