We explored the AllWISE catalogue of the Wide-field Infrared Survey Explorer mission and identified Young Stellar Object (YSO) candidates. Reliable 2MASS and WISE photometric data combined with Planck dust opacity values were used to build our dataset and to find the best classification scheme. A sophisticated statistical method, the Support Vector Machine (SVM), is used to analyse the multi-dimensional data space and to remove source types identified as contaminants (extragalactic sources, main sequence stars, evolved stars and sources related to the interstellar medium). Objects listed in the SIMBAD database are used to identify the already known sources and to train our method. A new all-sky selection of 133,980 Class I/II YSO candidates is presented. The estimated contamination was found to be well below 1% based on comparison with our SIMBAD training set. We also compare our results to those of existing methods and catalogues. The SVM selection process successfully identified >90% of the Class I/II YSOs based on comparison with photometric and spectroscopic YSO catalogues. We conclude that, by using the SVM, our classification identifies more known YSOs of the training sample than other methods based on colour-colour and magnitude-colour selection. The distribution of the YSO candidates correlates well with that of the Planck Galactic Cold Clumps in the Taurus--Auriga--Perseus--California region.
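As a rough illustration of the SVM idea of separating a source class from contaminants in a feature space, the sketch below trains a linear SVM by batch sub-gradient descent on the hinge loss. This is a swapped-in, simplified technique: the abstract does not state which kernel, features or software were used, and the function names, hyperparameters and standardised toy features here are all assumptions.

```python
# Minimal linear SVM trained by batch sub-gradient descent on the hinge loss.
# Illustrative only: the feature values are invented, standardised toy numbers,
# not AllWISE/2MASS photometry, and the abstract does not specify the kernel used.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return (w, b) reducing lam/2*|w|^2 + mean hinge loss; labels are +1/-1."""
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularisation term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this point
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify one feature vector as +1 or -1 by the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardised two-feature space:
# +1 = "YSO-like" training sources, -1 = "contaminant-like" training sources.
X_train = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0],
           [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_train]
```

A real application would use many more features (colours, magnitudes, dust opacity), several contaminant classes, and most likely a non-linear kernel from an established library.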
The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks to achieve both of these aims. We present and discuss a Support Vector Machine (SVM) algorithm that uses a Gabor filterbank to provide learning criteria for the separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100,000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey.
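To make the notion of a Gabor filterbank concrete, the following sketch builds the real part of a 2-D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) at several wavelengths and orientations, and computes one response per filter for an image patch. The kernel size, wavelengths, number of orientations and sigma are placeholder assumptions; the abstract does not give the parameters used, and how the responses are turned into SVM features is not specified there.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate pixel coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_filterbank(size, wavelengths, n_orientations, sigma):
    """One kernel per (wavelength, orientation) pair; orientations span [0, pi)."""
    return [gabor_kernel(size, lam, k * math.pi / n_orientations, sigma)
            for lam in wavelengths
            for k in range(n_orientations)]

def filter_response(patch, kernel):
    """Inner product of an image patch with a kernel of the same shape."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))

# Placeholder parameters, chosen only for illustration.
bank = gabor_filterbank(size=7, wavelengths=[3.0, 6.0], n_orientations=4, sigma=2.0)

# A flat toy patch; real inputs would be survey image cutouts.
flat_patch = [[1.0] * 7 for _ in range(7)]
features = [filter_response(flat_patch, k) for k in bank]
```

The resulting list of responses is the kind of fixed-length feature vector that could then be passed to a classifier.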
SPIDERS (SPectroscopic IDentification of eROSITA Sources) is an SDSS-IV survey running in parallel to the eBOSS cosmology project. SPIDERS will obtain optical spectroscopy for large numbers of X-ray-selected AGN and galaxy cluster members detected in wide-area eROSITA, XMM-Newton and ROSAT surveys. We describe the methods used to choose spectroscopic targets for two sub-programmes of SPIDERS: X-ray selected AGN candidates detected in the ROSAT All Sky and the XMM-Newton Slew surveys. We have exploited a Bayesian cross-matching algorithm, guided by priors based on mid-IR colour-magnitude information from the WISE survey, to select the most probable optical counterpart to each X-ray detection. We empirically demonstrate the high fidelity of our counterpart selection method using a reference sample of bright well-localised X-ray sources collated from XMM-Newton, Chandra and Swift-XRT serendipitous catalogues, and also by examining blank-sky locations. We describe the down-selection steps which resulted in the final set of SPIDERS-AGN targets put forward for spectroscopy within the eBOSS/TDSS/SPIDERS survey, and present catalogues of these targets. We also present catalogues of ~12000 ROSAT and ~1500 XMM-Newton Slew survey sources which have existing optical spectroscopy from SDSS-DR12, including the results of our visual inspections. On completion of the SPIDERS programme, we expect to have collected homogeneous spectroscopic redshift information over a footprint of ~7500 deg$^2$ for >85 percent of the ROSAT and XMM-Newton Slew survey sources having optical counterparts in the magnitude range 17<r<22.5, producing a large and highly complete sample of bright X-ray-selected AGN suitable for statistical studies of AGN evolution and clustering.
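The principle of a prior-guided Bayesian cross-match can be sketched as follows: each candidate counterpart receives a weight equal to a positional likelihood times a prior, and the weights are normalised into posterior probabilities. This is a deliberately simplified sketch, not the algorithm used for SPIDERS. The Gaussian positional model, the absence of a "no counterpart" hypothesis, the function name and every number below are assumptions.

```python
import math

def counterpart_posteriors(candidates, sigma_arcsec):
    """Posterior probability for each candidate counterpart.

    candidates: list of (separation_arcsec, prior_weight) tuples, where the
    prior weight stands in for a colour-magnitude based prior (hypothetical).
    sigma_arcsec: 1-sigma X-ray positional uncertainty (assumed Gaussian).
    """
    weights = []
    for separation, prior in candidates:
        likelihood = math.exp(-separation ** 2 / (2 * sigma_arcsec ** 2))
        weights.append(likelihood * prior)
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)  # no candidate is plausible
    return [wt / total for wt in weights]

# Invented example: the nearest candidate has a weak prior, while a slightly
# more distant one has colours typical of the target class (strong prior).
candidates = [(5.0, 0.2), (12.0, 0.7), (30.0, 0.1)]
posteriors = counterpart_posteriors(candidates, sigma_arcsec=10.0)
best_index = max(range(len(posteriors)), key=lambda i: posteriors[i])
```

In this toy case the second candidate wins even though it is not the closest, which is the behaviour a prior is meant to allow.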
We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events, using 58 known QSOs, 1,629 variable stars and 4,288 non-variables from the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
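Two of the listed time series features, amplitude and autocorrelation, can be sketched in a few lines. The definitions below (half the peak-to-peak range, and the standard sample autocorrelation at a fixed lag) are assumptions, since the abstract does not define the features, and the sketch treats the light curve as evenly sampled, which real survey light curves generally are not.

```python
def amplitude(mags):
    """Half of the peak-to-peak magnitude range (one common definition)."""
    return (max(mags) - min(mags)) / 2.0

def autocorrelation(mags, lag=1):
    """Sample autocorrelation at the given lag, assuming even time sampling."""
    n = len(mags)
    mean = sum(mags) / n
    variance_sum = sum((m - mean) ** 2 for m in mags)
    if variance_sum == 0:
        return 0.0  # constant light curve: autocorrelation is undefined, use 0
    covariance_sum = sum((mags[i] - mean) * (mags[i + lag] - mean)
                         for i in range(n - lag))
    return covariance_sum / variance_sum

# Invented toy light curves (magnitudes), for illustration only.
smooth_curve = [1.0, 2.0, 3.0, 4.0, 5.0]    # slowly varying
noisy_curve = [1.0, -1.0, 1.0, -1.0]        # alternating, noise-like

smooth_features = (amplitude(smooth_curve), autocorrelation(smooth_curve))
noisy_features = (amplitude(noisy_curve), autocorrelation(noisy_curve))
```

Slowly varying sources give a positive lag-1 autocorrelation while rapidly alternating ones give a negative value, which is why such a feature can help separate classes of variability.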
We present criteria for the photometric selection of M-dwarfs using all-sky photometry, with a view to identifying M-dwarf candidates for inclusion in the input catalogues of upcoming all-sky surveys, including TESS and FunnelWeb. The criteria are based on Gaia, WISE and 2MASS all-sky photometry, and deliberately do not rely on astrometric information. In the lead-up to the availability of truly distance-limited samples following the release of Gaia DR2, this approach has the significant benefit of delivering a sample unbiased with regard to space velocity. Our criteria were developed by using Galaxia synthetic galaxy model predictions to evaluate both M-dwarf completeness and false-positive detections (i.e. non-M-dwarf contamination rates). In addition to the previously known sensitivity of J-H colour for giant-dwarf discrimination at cool temperatures, we find the WISE W1-W2 colour is also effective at discriminating M-dwarfs from cool giants. We have derived two sets of Gaia G > 14.5 criteria: a high-completeness set that contains 78,340 stars, of which 30.7-44.4% are expected to be M-dwarfs, and which contains 99.3% of the total number of expected M-dwarfs; and a low-contamination set that prioritises the stars most likely to be M-dwarfs at the cost of a reduction in completeness. This subset contains 40,505 stars and is expected to comprise 58.7-64.1% M-dwarfs, with a completeness of 98%. Comparison of the high-completeness set with the TESS Input Catalogue has identified 234 stars not currently in that catalogue, which preliminary analysis suggests could be useful M-dwarf targets for TESS. We also compared the criteria to selection via absolute magnitude and to a combination of both methods. We found that colour selection in combination with an absolute magnitude limit provides the most effective way of selecting M-dwarfs en masse.
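The trade-off between completeness and contamination can be evaluated on a labelled synthetic sample, in the spirit of the model-based evaluation described above. In the sketch below the colour thresholds, the direction of each cut, the dictionary keys and the six synthetic stars are all hypothetical placeholders; the actual criteria are not given in the abstract.

```python
def passes_cut(star, jh_max, w1w2_min):
    """Hypothetical colour cut: keep stars with small J-H and large W1-W2."""
    return star["J_H"] <= jh_max and star["W1_W2"] >= w1w2_min

def completeness_and_contamination(stars, jh_max, w1w2_min):
    """Completeness = selected true M-dwarfs / all true M-dwarfs;
    contamination = selected non-M-dwarfs / all selected stars."""
    selected = [s for s in stars if passes_cut(s, jh_max, w1w2_min)]
    n_true = sum(1 for s in stars if s["is_mdwarf"])
    n_selected_true = sum(1 for s in selected if s["is_mdwarf"])
    completeness = n_selected_true / n_true if n_true else 0.0
    contamination = ((len(selected) - n_selected_true) / len(selected)
                     if selected else 0.0)
    return completeness, contamination

# Invented synthetic sample with known truth labels (as a model would supply).
synthetic_stars = [
    {"J_H": 0.60, "W1_W2": 0.15, "is_mdwarf": True},    # selected
    {"J_H": 0.65, "W1_W2": 0.20, "is_mdwarf": True},    # selected
    {"J_H": 0.58, "W1_W2": 0.02, "is_mdwarf": True},    # lost by the W1-W2 cut
    {"J_H": 0.90, "W1_W2": -0.05, "is_mdwarf": False},  # rejected
    {"J_H": 0.68, "W1_W2": 0.12, "is_mdwarf": False},   # contaminant, selected
    {"J_H": 0.85, "W1_W2": 0.10, "is_mdwarf": False},   # rejected by J-H cut
]

completeness, contamination = completeness_and_contamination(
    synthetic_stars, jh_max=0.75, w1w2_min=0.05)
```

Tightening either threshold lowers contamination but also removes genuine M-dwarfs, which is the trade-off behind offering both a high-completeness and a low-contamination set.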
We aim to select quasar candidates from two large survey databases, Pan-STARRS and AllWISE. Exploring the distribution of quasars and stars in the color spaces, we find that the combination of infrared and optical photometry is more conducive to selecting quasar candidates. Two new color criteria (yW1W2 and izW1W2) are constructed to distinguish quasars from stars efficiently. With izW1W2, 98.30% of star contamination is eliminated, while 99.50% of quasars are retained, at least to the magnitude limit of our training set of stars. Based on the optical and infrared color features, we put forward an efficient scheme to select quasar candidates and high-redshift quasar candidates, in which two machine learning algorithms (XGBoost and SVM) are implemented. The XGBoost and SVM classifiers prove very effective, with an accuracy of 99.46% when 8Color is used as the input pattern with default model parameters. When the two optimal classifiers are applied to the unknown Pan-STARRS and AllWISE cross-matched data set, a total of 2,006,632 intersected sources are predicted to be quasar candidates given a quasar probability larger than 0.5 (i.e. P_QSO>0.5). Among them, 1,201,211 have high probability (P_QSO>0.95). For these newly predicted quasar candidates, a regressor is constructed to estimate their redshifts. Finally, 7,402 z>3.5 quasars are obtained. Given the magnitude limit and site of the LAMOST telescope, some of these candidates will be used as the input catalogue of the LAMOST telescope for follow-up observation, and the rest may be observed by other telescopes.
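The probability-threshold step can be sketched as below, reading "intersected sources" as sources that both classifiers place above the threshold. That reading is an assumption, as are the function name, the source identifiers and the probability values, which are invented and do not come from any classifier.

```python
def select_candidates(p_first, p_second, threshold):
    """Return sorted IDs of sources that both classifiers rate above threshold.

    p_first, p_second: dicts mapping source ID -> quasar probability from two
    classifiers (for example one per algorithm); values here are invented.
    """
    return sorted(source_id for source_id in p_first
                  if source_id in p_second
                  and p_first[source_id] > threshold
                  and p_second[source_id] > threshold)

# Hypothetical per-source quasar probabilities from two classifiers.
p_classifier_a = {"src1": 0.99, "src2": 0.70, "src3": 0.40, "src4": 0.97}
p_classifier_b = {"src1": 0.98, "src2": 0.60, "src3": 0.80, "src4": 0.90}

candidates = select_candidates(p_classifier_a, p_classifier_b, threshold=0.5)
high_confidence = select_candidates(p_classifier_a, p_classifier_b, threshold=0.95)
```

Raising the threshold from 0.5 to 0.95 shrinks the sample to the sources on which both classifiers agree strongly.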