We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time-series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events, using a training set drawn from the MAssive Compact Halo Object (MACHO) database: 58 known QSOs, 1,629 variable stars, and 4,288 non-variables. To estimate the efficiency and accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection, none of the 33,242 known MACHO variables were misclassified as QSO candidates. To estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs, including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
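As an illustration of how cross-validation figures like those quoted above can be computed, here is a minimal sketch; it is not the authors' code. It assumes binary labels (1 = QSO, 0 = other), and the function name `selection_metrics` and the toy predictions are invented for this example. "False positive rate" can mean either the fraction of selected candidates that are not QSOs or the fraction of non-QSOs that are selected. The abstract does not say which is meant, so the sketch reports both.

```python
def selection_metrics(y_true, y_pred):
    """Return (recall, false_discovery_fraction, false_positive_rate).

    Labels are 1 for QSO and 0 for anything else (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    nan = float("nan")
    recall = tp / (tp + fn) if (tp + fn) else nan   # fraction of true QSOs recovered
    fdf = fp / (tp + fp) if (tp + fp) else nan      # fraction of candidates that are not QSOs
    fpr = fp / (fp + tn) if (fp + tn) else nan      # fraction of non-QSOs selected
    return recall, fdf, fpr


# Toy held-out fold: 5 true QSOs, 15 other objects (made-up numbers).
y_true = [1] * 5 + [0] * 15
y_pred = [1, 1, 1, 1, 0] + [1] + [0] * 14
recall, fdf, fpr = selection_metrics(y_true, y_pred)
print(recall, fdf, fpr)
```

In a k-fold test the same function would be applied to each held-out fold and the results averaged.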
We aim to select quasar candidates based on two large survey databases, Pan-STARRS and AllWISE. Exploring the distribution of quasars and stars in the color spaces, we find that the combination of infrared and optical photometry is more conducive to selecting quasar candidates. Two new color criteria (yW1W2 and izW1W2) are constructed to distinguish quasars from stars efficiently. With izW1W2, 98.30% of star contamination is eliminated, while 99.50% of quasars are retained, at least to the magnitude limit of our training set of stars. Based on the optical and infrared color features, we put forward an efficient scheme to select quasar candidates and high-redshift quasar candidates, in which two machine learning algorithms (XGBoost and SVM) are implemented. The XGBoost and SVM classifiers prove to be very effective, reaching an accuracy of 99.46% with 8Color as the input pattern and default model parameters. Applying the two optimal classifiers to the unknown Pan-STARRS and AllWISE cross-matched data set, a total of 2,006,632 intersected sources are predicted to be quasar candidates given a quasar probability larger than 0.5 (i.e. P_QSO>0.5). Among them, 1,201,211 have high probability (P_QSO>0.95). For these newly predicted quasar candidates, a regressor is constructed to estimate their redshifts. Finally, 7,402 z>3.5 quasars are obtained. Given the magnitude limitation and site of the LAMOST telescope, part of these candidates will be used as the input catalogue of the LAMOST telescope for follow-up observation, and the rest may be observed by other telescopes.
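The probability-threshold step can be sketched as follows. This is an illustrative assumption about how the intersection of two classifiers' outputs might be taken, not the authors' pipeline. The function name, the source identifiers, and the probabilities are all made up.

```python
def intersect_candidates(p_first, p_second, threshold=0.5):
    """Return sorted source IDs whose quasar probability exceeds
    `threshold` in BOTH classifiers (dicts map source ID -> probability)."""
    common = p_first.keys() & p_second.keys()
    return sorted(
        s for s in common
        if p_first[s] > threshold and p_second[s] > threshold
    )


# Hypothetical per-source quasar probabilities from two classifiers.
p_xgb = {"src1": 0.99, "src2": 0.70, "src3": 0.40, "src4": 0.97}
p_svm = {"src1": 0.98, "src2": 0.55, "src3": 0.80, "src4": 0.60}

candidates = intersect_candidates(p_xgb, p_svm, threshold=0.5)
high_confidence = intersect_candidates(p_xgb, p_svm, threshold=0.95)
print(candidates, high_confidence)
```

Requiring agreement between both classifiers trades some completeness for purity; raising the threshold to 0.95 tightens the selection further.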
The scientific value of the next generation of large continuum surveys would be greatly increased if the redshifts of the newly detected sources could be rapidly and reliably estimated. Given the observational expense of obtaining spectroscopic redshifts for the large number of new detections expected, there has been substantial recent work on using machine learning techniques to obtain photometric redshifts. Here we compare the accuracy of the predicted photometric redshifts obtained from Deep Learning (DL) with the k-Nearest Neighbour (kNN) and the Decision Tree Regression (DTR) algorithms. Using a combination of near-infrared, visible and ultraviolet magnitudes, trained upon a sample of SDSS QSOs, we find that the kNN and DL algorithms produce the best self-validation result, with a standard deviation of $\sigma = 0.24$. Testing on various sub-samples, we find that the DL algorithm generally has lower values of $\sigma$, in addition to exhibiting better performance in other measures. Our DL method, which uses an easy-to-implement off-the-shelf algorithm with no filtering or removal of outliers, performs similarly to other, more complex, algorithms, resulting in an accuracy of $\Delta z < 0.1$ up to z ~ 2.5. Applying the DL algorithm trained on our 70,000-strong sample to other independent (radio-selected) datasets, we find $\sigma < 0.36$ over a wide range of radio flux densities. This indicates much potential in using this method to determine photometric redshifts of quasars detected with the Square Kilometre Array.
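To make the kNN comparison and the quoted scatter concrete, here is a minimal pure-Python sketch of k-nearest-neighbour regression and of the standard deviation of the redshift residuals. It is not the authors' implementation. The feature vectors and redshifts are toy values, the function names are invented, and the residual is assumed to be the unnormalised difference between photometric and spectroscopic redshift, which the abstract does not define.

```python
import math
import statistics


def knn_predict(train_x, train_z, query, k=3):
    """Predict a redshift as the mean redshift of the k training
    sources closest to `query` in (Euclidean) feature space."""
    by_distance = sorted(
        (math.dist(x, query), z) for x, z in zip(train_x, train_z)
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / len(nearest)


def residual_sigma(z_phot, z_spec):
    """Population standard deviation of z_phot - z_spec
    (unnormalised residuals are an assumption of this sketch)."""
    residuals = [p - s for p, s in zip(z_phot, z_spec)]
    return statistics.pstdev(residuals)


# Toy one-colour training set in which redshift tracks the colour exactly.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_z = [0.0, 1.0, 2.0, 3.0]

z_query = knn_predict(train_x, train_z, (1.0,), k=3)
scatter = residual_sigma([1.1, 1.9], [1.0, 2.0])
print(z_query, scatter)
```

A real application would use many magnitudes or colours per source and hold out part of the sample for validation.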
We explored the AllWISE catalogue of the Wide-field Infrared Survey Explorer mission and identified Young Stellar Object (YSO) candidates. Reliable 2MASS and WISE photometric data combined with Planck dust opacity values were used to build our dataset and to find the best classification scheme. A sophisticated statistical method, the Support Vector Machine (SVM), is used to analyse the multi-dimensional data space and to remove source types identified as contaminants (extragalactic sources, main sequence stars, evolved stars and sources related to the interstellar medium). Objects listed in the SIMBAD database are used to identify the already known sources and to train our method. A new all-sky selection of 133,980 Class I/II YSO candidates is presented. The estimated contamination was found to be well below 1% based on comparison with our SIMBAD training set. We also compare our results to those of existing methods and catalogues. The SVM selection process successfully identified >90% of the Class I/II YSOs based on comparison with photometric and spectroscopic YSO catalogues. Our conclusion is that, by using the SVM, our classification is able to identify more known YSOs of the training sample than other methods based on colour-colour and magnitude-colour selection. The distribution of the YSO candidates correlates well with that of the Planck Galactic Cold Clumps in the Taurus--Auriga--Perseus--California region.
The quasar target selection for the upcoming survey of the Dark Energy Spectroscopic Instrument (DESI) will be fixed for the next five years. The aim of this work is to validate the quasar selection by studying the impact of imaging systematics as well as stellar and galactic contaminants, and to develop a procedure to mitigate them. Density fluctuations of quasar targets are found to be related to photometric properties such as seeing and depth of the Data Release 9 of the DESI Legacy Imaging Surveys. To model this complex relation, we explore machine learning algorithms (Random Forest and Multi-Layer Perceptron) as an alternative to the standard linear regression. Splitting the footprint of the Legacy Imaging Surveys into three regions according to photometric properties, we perform an independent analysis in each region, validating our method using eBOSS EZ-mocks. The mitigation procedure is tested by comparing the angular correlation function of the corrected target selection in each photometric region to the angular correlation function obtained using quasars from the Sloan Digital Sky Survey (SDSS) Data Release 16. With our procedure, we recover a similar level of correlation between DESI quasar targets and SDSS quasars in two thirds of the total footprint, and we show that the excess of correlation in the remaining area is due to stellar contamination, which should be removed with DESI spectroscopic data. We derive the Limber parameters in our three imaging regions and compare them to previous measurements from SDSS and the 2dF QSO Redshift Survey.
The DESI survey will measure large-scale structure using quasars as direct tracers of dark matter in the redshift range $0.9<z<2.1$ and using quasar Ly-$\alpha$ forests at $z>2.1$. We present two methods to select candidate quasars for DESI based on imaging in three optical ($g, r, z$) and two infrared ($W1, W2$) bands. The first method uses traditional color cuts and the second utilizes a machine-learning algorithm.
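A color-cut selection of the first kind can be sketched as below. This is a toy example, not the DESI selection. It assumes fluxes are given in nanomaggies, so that the AB magnitude is $m = 22.5 - 2.5\log_{10} f$. The function names are invented, and the two thresholds are placeholders chosen only for illustration. The sketch uses only the $g-r$ and $W1-W2$ colors.

```python
import math


def nanomaggies_to_mag(flux):
    """AB magnitude from a positive flux in nanomaggies."""
    return 22.5 - 2.5 * math.log10(flux)


def passes_color_cut(flux, max_g_r=1.0, min_w1_w2=-0.4):
    """Toy box cut on g-r and W1-W2.

    `flux` maps band name -> flux in nanomaggies. The default
    thresholds are placeholders, NOT the values used by DESI.
    """
    mag = {band: nanomaggies_to_mag(f) for band, f in flux.items()}
    blue_in_optical = (mag["g"] - mag["r"]) < max_g_r
    red_in_infrared = (mag["W1"] - mag["W2"]) > min_w1_w2
    return blue_in_optical and red_in_infrared


# Hypothetical sources: one with flat colors, one much brighter in r than g.
flat_source = {"g": 1.0, "r": 1.0, "z": 1.0, "W1": 1.0, "W2": 1.0}
red_source = {"g": 1.0, "r": 10.0, "z": 10.0, "W1": 1.0, "W2": 1.0}
print(passes_color_cut(flat_source), passes_color_cut(red_source))
```

A machine-learning selection replaces these hand-set boxes with a decision boundary learned from labelled quasars and stars in the same color space.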