No Arabic abstract
Future astrophysical surveys such as J-PAS will produce very large datasets, which will require the deployment of accurate and efficient Machine Learning (ML) methods. In this work, we analyze the miniJPAS survey, which observed about 1 deg2 of the AEGIS field with 56 narrow-band filters and 4 ugri broad-band filters. We discuss the classification of miniJPAS sources into extended (galaxies) and point-like (e.g. stars) objects, a necessary step for the subsequent scientific analyses. We aim at developing an ML classifier that is complementary to traditional tools based on explicit modeling. In order to train and test our classifiers, we crossmatched the miniJPAS dataset with SDSS and HSC-SSP data. We trained and tested 6 different ML algorithms on the two crossmatched catalogs. As input for the ML algorithms we use the magnitudes from the 60 filters together with their errors, with and without the morphological parameters. We also use the mean PSF in the r detection band for each pointing. We find that the RF and ERT algorithms perform best in all scenarios. When analyzing the full magnitude range of 15<r<23.5 we find AUC=0.957 with RF when using only photometric information, and AUC=0.986 with ERT when using photometric and morphological information. Regarding feature importance, when using morphological parameters, FWHM is the most important feature. When using photometric information only, we observe that broad bands are not necessarily more important than narrow bands, and errors are as important as the measurements. ML algorithms can compete with traditional star/galaxy classifiers, outperforming the latter at fainter magnitudes (r>21). We use our best classifiers, with and without morphology, in order to produce a value added catalog available at https://j-pas.org/datareleases .
We show that multiple machine learning algorithms can match human performance in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS) supernova survey into real objects and artefacts. This is a first step in any transient science pipeline and is currently still done by humans, but future surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate fully machine-enabled solutions. Using features trained from eigenimage analysis (principal component analysis, PCA) of single-epoch g, r and i-difference images, we can reach a completeness (recall) of 96 per cent, while only incorrectly classifying at most 18 per cent of artefacts as real objects, corresponding to a precision (purity) of 84 per cent. In general, random forests performed best, followed by the k-nearest neighbour and the SkyNet artificial neural net algorithms, compared to other methods such as naive Bayes and kernel support vector machine. Our results show that PCA-based machine learning can match human success levels and can naturally be extended by including multiple epochs of data, transient colours and host galaxy information which should allow for significant further improvements, especially at low signal-to-noise.
We perform a comparison of different approaches to star-galaxy classification using the broad-band photometric data from Year 1 of the Dark Energy Survey. This is done by performing a wide range of tests with and without external `truth information, which can be ported to other similar datasets. We make a broad evaluation of the performance of the classifiers in two science cases with DES data that are most affected by this systematic effect: large-scale structure and Milky Way studies. In general, even though the default morphological classifiers used for DES Y1 cosmology studies are sufficient to maintain a low level of systematic contamination from stellar mis-classification, contamination can be reduced to the O(1%) level by using multi-epoch and infrared information from external datasets. For Milky Way studies the stellar sample can be augmented by ~20% for a given flux limit. Reference catalogs used in this work will be made available upon publication.
In modern astronomy, machine learning as an raising realm for data analysis, has proved to be efficient and effective to mine the big data from the newest telescopes. By using support vector machine (SVM), we construct a supervised machine learning algorithm, to classify the objects in the Javalambre-Photometric Local Universe Survey (J-Plus). The sample is featured with 12-waveband, and magnitudes is labeled with spectrum-based catalogs, including Sloan Digital Sky Survey spectroscopic data, Large Sky Area Multi-Object Fiber Spectroscopic Telescope, and VERONCAT - Veron Catalog of Quasars & AGN. The performance of the classifier is presented with the applications of blind test validations based on RAdial Velocity Extension, Kepler Input Catalog, 2 MASS Redshift Survey, and UV-bright Quasar Survey. The accuracies of the classifier are 96.5% in blind test and 97.0% in training cross validation. The F_1-scores are 95.0% for STAR, 92.9% for GALAXY and 87.0% for QSO. In the classification for J-Plus catalog, we develop a new method to constrain the potential extrapolation.
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques fitting parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieves an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
We investigate star-galaxy classification for astronomical surveys in the context of four methods enabling the interpretation of black-box machine learning systems. The first is outputting and exploring the decision boundaries as given by decision tree based methods, which enables the visualization of the classification categories. Secondly, we investigate how the Mutual Information based Transductive Feature Selection (MINT) algorithm can be used to perform feature pre-selection. If one would like to provide only a small number of input features to a machine learning classification algorithm, feature pre-selection provides a method to determine which of the many possible input properties should be selected. Third is the use of the tree-interpreter package to enable popular decision tree based ensemble methods to be opened, visualized, and understood. This is done by additional analysis of the tree based model, determining not only which features are important to the model, but how important a feature is for a particular classification given its value. Lastly, we use decision boundaries from the model to revise an already existing method of classification, essentially asking the tree based method where decision boundaries are best placed and defining a new classification method. We showcase these techniques by applying them to the problem of star-galaxy separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We use the output of MINT and the ensemble methods to demonstrate how more complex decision boundaries improve star-galaxy classification accuracy over the standard SDSS frames approach (reducing misclassifications by up to $approx33%$). We then show how tree-interpreter can be used to explore how relevant each photometric feature is when making a classification on an object by object basis.