The number of known bright ($i<18$), high-redshift ($z>2.5$) QSOs in the Southern Hemisphere is considerably lower than the corresponding number in the Northern Hemisphere, owing to the lack of multi-wavelength surveys at $\delta<0$. Recent works, such as the QUBRICS survey, successfully identified new high-redshift QSOs in the South by means of a machine learning approach applied to a large photometric dataset. Building on the success of QUBRICS, we present a new QSO selection method based on the Probabilistic Random Forest (PRF), an improvement of the classic Random Forest algorithm. The PRF takes measurement errors into account by treating input data as probability distribution functions, which yields better accuracy and a more robust predictive model. We applied the PRF to the same photometric dataset used in QUBRICS, based on the SkyMapper DR1, Gaia DR2, 2MASS, WISE and GALEX databases. The resulting candidate list includes $626$ sources with $i<18$. We estimate for our proposed algorithm a completeness of $\sim84\%$ and a purity of $\sim78\%$ on the test datasets. Preliminary spectroscopic campaigns allowed us to observe 41 candidates, of which 29 turned out to be $z>2.5$ QSOs. The performance of the PRF, currently comparable to that of the Canonical Correlation Analysis (CCA) used in QUBRICS, is expected to improve as the number of high-z QSOs available for the training sample grows; the results are nevertheless already promising, despite this being one of the first applications of this method in an astrophysical context.
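The idea of treating a measured feature as a probability distribution can be illustrated with a single decision node. The following is a minimal sketch and not the PRF implementation used in this work: it assumes Gaussian measurement errors, and the function names, threshold, and input values are hypothetical.

```python
import math


def prob_below(value, sigma, threshold):
    """Probability that a Gaussian-distributed measurement lies below threshold."""
    if sigma <= 0:
        # No uncertainty: fall back to the hard split of a classic decision tree.
        return 1.0 if value < threshold else 0.0
    z = (threshold - value) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def soft_stump(value, sigma, threshold, p_qso_left, p_qso_right):
    """QSO probability from one node, weighting both branches by their reach probability."""
    p_left = prob_below(value, sigma, threshold)
    return p_left * p_qso_left + (1.0 - p_left) * p_qso_right


# Hypothetical colour measurement lying close to the split threshold.
print(soft_stump(value=0.48, sigma=0.10, threshold=0.50,
                 p_qso_left=0.9, p_qso_right=0.2))
```

As the assumed error shrinks, `prob_below` approaches 0 or 1 and the node behaves like an ordinary hard split; a noisy measurement near the threshold instead contributes to both branches.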
We present the results of the spectroscopic follow-up of the QUBRICS survey. The selection method is based on a machine learning approach applied to photometric catalogs, covering an area of $\sim$ 12,400 deg$^2$ in the Southern Hemisphere. The spectroscopic observations started in 2018 and identified 55 new, high-redshift (z>=2.5), bright (i<=18) QSOs, with the catalog published in late 2019. Here we report the current status of the survey, bringing the total number of bright QSOs at z>=2.5 identified by QUBRICS to 224. The success rate of the QUBRICS selection method, in its most recent training, is estimated to be 68%. The predominant contaminant turns out to be lower-z QSOs at z<2.5. This survey provides a unique sample of bright QSOs at high-z available for a number of cosmological investigations. In particular, carrying out the redshift drift measurements (Sandage Test) in the Southern Hemisphere, using the HIRES spectrograph at the 39m ELT, appears to be possible with less than 2500 hours of observations spread over 30 targets in 25 years.
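To give a sense of the signal size targeted by the Sandage Test, the standard redshift drift relation, dz/dt = (1+z) H0 - H(z), can be evaluated numerically. The sketch below is only illustrative: it assumes a flat LambdaCDM cosmology with H0 = 70 km/s/Mpc and Omega_m = 0.3, neither of which is stated in the abstract, and the function names are hypothetical.

```python
import math

C_CM_S = 2.99792458e10          # speed of light in cm/s
MPC_KM = 3.0856775814913673e19  # kilometres in one megaparsec
YEAR_S = 365.25 * 86400.0       # seconds in a Julian year


def hubble_rate(z, h0_km_s_mpc=70.0, omega_m=0.3):
    """H(z) in 1/s for a flat LambdaCDM model (assumed parameters)."""
    h0 = h0_km_s_mpc / MPC_KM
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def velocity_drift_cm_s(z, years, h0_km_s_mpc=70.0, omega_m=0.3):
    """Velocity shift in cm/s accumulated over `years` for a source at redshift z."""
    h0 = hubble_rate(0.0, h0_km_s_mpc, omega_m)
    dz_dt = (1.0 + z) * h0 - hubble_rate(z, h0_km_s_mpc, omega_m)
    return C_CM_S * dz_dt / (1.0 + z) * years * YEAR_S


for z in (2.5, 3.0, 4.0):
    print(z, velocity_drift_cm_s(z, years=25.0))

# Mean observing time per target implied by the quoted upper budget.
print(2500.0 / 30.0, "hours per target")
```

Under these assumed parameters the drift is negative at the redshifts of interest and amounts to a few cm/s over 25 years, which is why a large telescope, a stable spectrograph, and bright targets are needed.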
Near-infrared, high-angular-resolution imaging observations of the Milky Way's nuclear star cluster have revealed all luminous members of the existing stellar population within the central parsec. Generally, these stars are either evolved late-type giants or massive, young, early-type stars. We revisit the problem of stellar classification based on intermediate-band photometry in the K-band, with the primary aim of identifying faint early-type candidate stars in the extended vicinity of the central massive black hole. A random forest classifier, trained on a subsample of spectroscopically identified stars, performs about as well as competing methods (F1=0.85), without involving any model of stellar spectral energy distributions. Advantages of using such a machine-trained classifier are minimal calibration effort, a predictive accuracy expected to improve as more training data become available, and the ease of application to future, larger data sets. By applying this classifier to archive data, we are also able to reproduce the results of previous studies of the spatial distribution and the K-band luminosity function of both the early- and late-type stars.
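The F1 score quoted above is the harmonic mean of precision and recall. As a brief reminder of how it is computed, here is a minimal sketch; the counts in the example are hypothetical and are not taken from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for the positive (early-type) class."""
    if true_pos == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical confusion counts for early-type candidates.
print(f1_score(true_pos=85, false_pos=15, false_neg=15))
```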
Observed only one billion years after the Big Bang, z ~ 7 quasars offer a unique opportunity for exploring the early Universe. However, only two z ~ 7 quasars have been discovered in near-infrared surveys: ULAS J1120+0641 and ULAS J1342+0928, at z = 7.09 and z = 7.54, respectively. The Canada-France High-z Quasar Survey in the Near Infrared (CFHQSIR) has been carried out to search for z ~ 7 quasars using near-infrared and optical imaging from the Canada-France-Hawaii Telescope (CFHT). Our data consist of $\rm{\sim 130\,deg^{2}}$ of Wide-field Infrared Camera (WIRCam) Y-band images up to a 5$\sigma$ limit of $\rm{Y_{AB}}$ ~ 22.4 distributed over the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) Wide fields. After follow-up observations in J band, a first photometric selection based on simple colour criteria led us to identify 36 sources with measured high-redshift quasar colours. However, we expect to detect only ~ 2 quasars in the redshift range 6.8 < z < 7.5 down to a rest-frame absolute magnitude of $\rm{M_{1450}}$ = -24.6. With the motivation of ranking our high-redshift quasar candidates in the best possible way, we developed an advanced classification method based on Bayesian formalism in which we model the high-redshift quasar and low-mass star populations. The model includes the colour diversity of the two populations and the variation in space density of the low-mass stars with Galactic latitude, and it is combined with our observational data. For each candidate, we compute the probability of being a high-redshift quasar rather than a low-mass star. This results in a refined list of the most promising candidates. Our Bayesian selection procedure has proven to be a powerful technique for identifying the best candidates of any photometrically selected sample of objects, and it is easily extendable to other surveys.
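The ranking described above weighs, for each candidate, how well its colours match each population against how common that population is on the sky. The toy sketch below illustrates the structure of such a calculation only; it is not the model used in the survey. It assumes a single colour, Gaussian population models, and Gaussian photometric errors, and every number and name in it is hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def quasar_probability(colour, colour_err, rho_qso, rho_star, qso_model, star_model):
    """Posterior probability of 'quasar' versus 'low-mass star' for one candidate.

    rho_qso, rho_star: assumed surface densities acting as prior weights.
    qso_model, star_model: (mean, intrinsic scatter) of each population's colour.
    """
    q_mean, q_sigma = qso_model
    s_mean, s_sigma = star_model
    # Convolving a Gaussian population with Gaussian noise adds the widths in quadrature.
    w_qso = rho_qso * gaussian_pdf(colour, q_mean, math.hypot(q_sigma, colour_err))
    w_star = rho_star * gaussian_pdf(colour, s_mean, math.hypot(s_sigma, colour_err))
    return w_qso / (w_qso + w_star)


# Hypothetical candidate: quasar-like colour, but stars are far more numerous.
print(quasar_probability(colour=0.2, colour_err=0.15, rho_qso=0.02, rho_star=5.0,
                         qso_model=(0.0, 0.3), star_model=(1.5, 0.4)))
```

In a sketch of this kind, a dependence on Galactic latitude would enter through `rho_star`, which would be evaluated separately for each candidate's position.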
The survey of the COSMOS field by the VLT Survey Telescope is an appealing testing ground for variability studies of active galactic nuclei (AGN). With 54 r-band visits over 3.3 yr and a single-visit depth of 24.6 r-band mag, the dataset is also particularly interesting in the context of performance forecasting for the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). This work is the fifth in a series dedicated to the development of an automated, robust, and efficient methodology to identify optically variable AGN, aimed at deploying it on future LSST data. We test the performance of a random forest (RF) algorithm in selecting optically variable AGN candidates, investigating how the use of different AGN labeled sets (LSs) and feature sets affects this performance. We define a heterogeneous AGN LS and choose a set of variability features and optical and near-infrared colors based on what can be extracted from LSST data. We find that an AGN LS that includes only Type I sources allows for the selection of a highly pure (91%) sample of AGN candidates, obtaining a completeness with respect to spectroscopically confirmed AGN of 69% (vs. 59% in our previous work). The addition of colors to variability features mildly improves the performance of the RF classifier, while colors alone prove less effective than variability in selecting AGN, as they return contaminated samples of candidates and fail to identify most host-dominated AGN. We observe that a bright (r < 21 mag) AGN LS is able to retrieve candidate samples not affected by the magnitude cut, which is of great importance as faint AGN LSs for LSST-related studies will be hard to find and likely imbalanced. We estimate a sky density of 6.2 million AGN for the LSST main survey down to our current magnitude limit.
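Purity and completeness, as quoted above, compare the list of selected candidates with a list of confirmed AGN. The following minimal sketch shows the bookkeeping under a simplifying assumption: every selected source has a known true class. The identifiers are hypothetical.

```python
def purity_and_completeness(selected_ids, confirmed_agn_ids):
    """Return (purity, completeness) of a candidate sample.

    purity: fraction of selected sources that are confirmed AGN.
    completeness: fraction of confirmed AGN that were selected.
    """
    selected = set(selected_ids)
    confirmed = set(confirmed_agn_ids)
    recovered = selected & confirmed
    purity = len(recovered) / len(selected) if selected else 0.0
    completeness = len(recovered) / len(confirmed) if confirmed else 0.0
    return purity, completeness


# Hypothetical source identifiers.
print(purity_and_completeness(selected_ids=[1, 2, 3, 4],
                              confirmed_agn_ids=[1, 2, 3, 5, 6]))
```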
We aim to select quasar candidates based on two large survey databases, Pan-STARRS and AllWISE. Exploring the distribution of quasars and stars in the color spaces, we find that the combination of infrared and optical photometry is more conducive to selecting quasar candidates. Two new color criteria (yW1W2 and izW1W2) are constructed to distinguish quasars from stars efficiently. With izW1W2, 98.30% of star contamination is eliminated, while 99.50% of quasars are retained, at least to the magnitude limit of our training set of stars. Based on the optical and infrared color features, we put forward an efficient scheme to select quasar candidates and high-redshift quasar candidates, in which two machine learning algorithms (XGBoost and SVM) are implemented. The XGBoost and SVM classifiers have proven to be very effective, with an accuracy of 99.46% when 8Color is used as the input pattern with default model parameters. Applying the two optimal classifiers to the unknown Pan-STARRS and AllWISE cross-matched data set, a total of 2,006,632 intersected sources are predicted to be quasar candidates given a quasar probability larger than 0.5 (i.e. P_QSO>0.5). Among them, 1,201,211 have high probability (P_QSO>0.95). For these newly predicted quasar candidates, a regressor is constructed to estimate their redshifts. Finally, 7,402 z>3.5 quasar candidates are obtained. Given the magnitude limitation and site of the LAMOST telescope, part of these candidates will be used as the input catalogue of the LAMOST telescope for follow-up observation, and the rest may be observed by other telescopes.
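The candidate counts above depend on combining the outputs of the two classifiers and applying a probability threshold. The sketch below shows one way this could be done, under the assumption that "intersected" means a source must exceed the threshold in both classifiers; the function name, source identifiers, and probabilities are hypothetical.

```python
def select_candidates(p_qso_xgb, p_qso_svm, threshold=0.5):
    """Sources whose quasar probability exceeds the threshold in both classifiers.

    p_qso_xgb, p_qso_svm: dicts mapping a source identifier to P_QSO.
    Only sources scored by both classifiers can be selected.
    """
    common = p_qso_xgb.keys() & p_qso_svm.keys()
    return sorted(
        src for src in common
        if p_qso_xgb[src] > threshold and p_qso_svm[src] > threshold
    )


# Hypothetical classifier outputs.
xgb = {"a": 0.99, "b": 0.70, "c": 0.40, "d": 0.97}
svm = {"a": 0.98, "b": 0.60, "c": 0.90, "e": 0.99}

print(select_candidates(xgb, svm, threshold=0.5))
print(select_candidates(xgb, svm, threshold=0.95))
```

Raising the threshold trades sample size for reliability, which is the distinction the abstract draws between the P_QSO>0.5 and P_QSO>0.95 samples.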