The Probabilistic Random Forest applied to the selection of quasar candidates in the QUBRICS Survey


Abstract in English

The number of known bright ($i<18$), high-redshift ($z>2.5$) QSOs in the Southern Hemisphere is considerably lower than in the Northern Hemisphere, owing to the lack of multi-wavelength surveys at $\delta<0$. Recent works, such as the QUBRICS survey, have successfully identified new high-redshift QSOs in the South by applying a machine learning approach to a large photometric dataset. Building on the success of QUBRICS, we present a new QSO selection method based on the Probabilistic Random Forest (PRF), an improvement on the classic Random Forest algorithm. The PRF takes measurement errors into account by treating input data as probability distribution functions, which allows us to obtain better accuracy and a robust predictive model. We applied the PRF to the same photometric dataset used in QUBRICS, based on the SkyMapper DR1, Gaia DR2, 2MASS, WISE and GALEX databases. The resulting candidate list includes $626$ sources with $i<18$. We estimate a completeness of $\sim84\%$ and a purity of $\sim78\%$ for the proposed algorithm on the test datasets. Preliminary spectroscopic campaigns allowed us to observe 41 candidates, of which 29 turned out to be $z>2.5$ QSOs. The performance of the PRF, currently comparable to that of the Canonical Correlation Analysis (CCA) used in QUBRICS, is expected to improve as the number of high-z QSOs available for the training sample grows. The results are nevertheless already promising, even though this is one of the first applications of the method in an astrophysical context.
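
To make the idea of treating input data as probability distribution functions more concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes each measured feature is Gaussian with a known uncertainty, and it uses a hypothetical one-split toy tree (`toy_tree`) with made-up leaf probabilities. The names `prob_left` and `predict_proba` are illustrative only. At each split, the object is sent down both branches, weighted by the probability that its true feature value lies on either side of the threshold.

```python
import math


def prob_left(value, sigma, threshold):
    """P(true feature <= threshold), assuming a Gaussian measurement N(value, sigma^2)."""
    if sigma <= 0:
        # No uncertainty: fall back to the hard split of a classic decision tree.
        return 1.0 if value <= threshold else 0.0
    z = (threshold - value) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def predict_proba(node, values, sigmas):
    """QSO probability from a toy tree, weighting both branches at every split."""
    if "p_qso" in node:  # leaf node
        return node["p_qso"]
    i = node["feature"]
    p = prob_left(values[i], sigmas[i], node["threshold"])
    return (p * predict_proba(node["left"], values, sigmas)
            + (1.0 - p) * predict_proba(node["right"], values, sigmas))


# Hypothetical tree: one split on feature 0, with illustrative leaf probabilities.
toy_tree = {
    "feature": 0,
    "threshold": 1.0,
    "left": {"p_qso": 0.1},
    "right": {"p_qso": 0.9},
}

# The same measured value gives different answers depending on its uncertainty.
p_noisy = predict_proba(toy_tree, [1.2], [0.5])
p_exact = predict_proba(toy_tree, [1.2], [0.0])
```

With zero uncertainty the sketch reduces to an ordinary decision tree, while a noisy measurement close to the threshold is pulled toward an intermediate probability. A forest would average such predictions over many trees.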
