We present a star/galaxy classification for the Southern Photometric Local Universe Survey (S-PLUS), based on a Machine Learning approach: the Random Forest (RF) algorithm. We train the algorithm using the S-PLUS optical photometry up to $r$=21, matched to SDSS/DR13, and morphological parameters. The importance metric is defined as the relative decrease of the initial accuracy when all correlations related to a given feature are removed. In general, the broad photometric bands show higher importance than the narrow ones. The influence of the morphological parameters has been evaluated by training the RF with and without them, yielding accuracies of 95.0% and 88.1%, respectively. In particular, the morphological parameter ${\rm FWHM/PSF}$ has the highest importance of all features, indicating that it is crucial for separating stars from galaxies. We investigate the misclassification of stars and galaxies in the broad-band colour-colour diagram $(g-r)$ versus $(r-i)$. Morphology notably improves the classification in regions of the diagram where the misclassification rate was relatively high, and consequently provides cleaner samples for statistical studies. We estimate the expected contamination rate of red galaxies as a function of redshift, providing corrections for red-galaxy samples. The classification of QSOs as extragalactic objects is slightly better in the photometric-only case. We provide an extragalactic point-source catalogue using the classification without any morphological feature (only the SED information), with additional constraints on photometric redshifts and ${\rm FWHM/PSF}$ values.
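The importance metric above can be sketched as a permutation test: shuffling one feature's column keeps its distribution but destroys its correlations with the class label and with the other features, and the relative drop in accuracy measures how much the classifier relied on that feature. This is a minimal sketch under our own assumptions, not the S-PLUS pipeline: the data are synthetic, and the fixed cut on a FWHM/PSF-like column (`predict`) is a hypothetical stand-in for a trained Random Forest.

```python
import numpy as np

def relative_importance(predict, X, y, n_repeats=10, seed=0):
    """Relative drop of accuracy when each feature column is shuffled.

    Shuffling a column keeps its marginal distribution but destroys its
    correlations with the labels and with the other features.
    """
    rng = np.random.default_rng(seed)
    base_acc = np.mean(predict(X) == y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            acc = np.mean(predict(X_perm) == y)
            drops.append((base_acc - acc) / base_acc)  # relative decrease
        scores[j] = np.mean(drops)
    return base_acc, scores

# Synthetic data: column 0 mimics a FWHM/PSF-like size ratio (stars ~1,
# galaxies larger); column 1 is a colour carrying no class information.
rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n)                      # 1 = galaxy, 0 = star
fwhm_ratio = 1.0 + 0.5 * y + rng.normal(0.0, 0.1, n)
colour = rng.normal(0.5, 0.3, n)
X = np.column_stack([fwhm_ratio, colour])

def predict(X):
    # Hypothetical fixed cut standing in for a trained Random Forest
    return (X[:, 0] > 1.25).astype(int)

base_acc, scores = relative_importance(predict, X, y)
print(f"baseline accuracy = {base_acc:.3f}")
print("relative importance per feature:", np.round(scores, 3))
```

Because the toy classifier ignores the colour column, shuffling it leaves the accuracy unchanged and its importance is exactly zero, while shuffling the size ratio drives the accuracy towards chance.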
The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and their environment is still not well understood. Here we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS (large-scale structure) machine learning classifier based on galaxy properties sensitive to the environment, and then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict whether a galaxy belongs to a void/wall or to a filament/cluster with an accuracy of $93\%$. Our study unveils environmental information encoded in properties of haloes not normally considered to depend directly on the cosmic environment, such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. Our analysis proceeds by extracting features from galaxy properties and merger trees, computing a score for each feature, and then applying a support vector machine (SVM) to different feature sets. In this way, we find that the shape and depth of the merger tree, the formation time and the density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement to the original classification algorithm, obtained by performing an LU decomposition of the distance matrix computed from the feature vectors and then using the output of the decomposition as input vectors for the SVM.
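The LU step described above can be sketched as follows: build the pairwise Euclidean distance matrix of the feature vectors and factorise it with partial pivoting, so that $PD = LU$. Pivoting is needed because a distance matrix has a zero diagonal. This is a minimal sketch under stated assumptions: the features are random placeholders, the distance metric is assumed Euclidean, and which output of the decomposition feeds the SVM is not specified in the abstract, so taking the rows of `U` (`svm_inputs`, a hypothetical name) is purely illustrative. In practice a library routine such as `scipy.linalg.lu` would replace the hand-written loop.

```python
import numpy as np

def pairwise_distances(F):
    """Euclidean distance matrix between the rows of F (one row per halo)."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def lu_partial_pivot(A):
    """Doolittle LU with partial pivoting: returns P, L, U with P @ A = L @ U."""
    n = A.shape[0]
    U = np.array(A, dtype=float)
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))   # row with the largest pivot
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]     # swap multipliers already computed
        if U[k, k] == 0.0:
            continue                          # column already zero below the pivot
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))   # hypothetical: 6 haloes x 4 properties
D = pairwise_distances(features)
P, L, U = lu_partial_pivot(D)
print(np.allclose(P @ D, L @ U))

# Illustrative assumption only: rows of U used as new SVM input vectors
svm_inputs = U
```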
In this work we explore the possibility of applying machine learning methods designed for one-dimensional problems to the task of galaxy image classification. Image classification algorithms typically rely on multiple costly steps, such as Point Spread Function (PSF) deconvolution and the training and application of complex Convolutional Neural Networks (CNNs) with thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes of their light distribution and collect this information in a sequence. The sequences obtained in this way have distinctive features that allow a direct distinction between galaxy types, as opposed to smooth Sérsic profiles. We then train and classify the sequences with machine learning algorithms designed through the platform Modulos AutoML, and study how they optimize the classification task. As a demonstration of this method, we use the second public data release of the Dark Energy Survey (DES DR2). We show that, applied to this sample, the method successfully distinguishes between early-type and late-type galaxies for images with a signal-to-noise ratio greater than 300. This yields an accuracy of $86\%$ for the early-type galaxies and $93\%$ for the late-type galaxies, on par with most contemporary automated image classification approaches. Our novel method classifies galaxy images accurately and is faster than other approaches, and the reduction in data dimensionality also significantly lowers the computational cost. In view of future data sets from, e.g., Euclid and the Vera Rubin Observatory (VRO), this work represents a path towards using a well-tested and widely used industry platform to efficiently tackle galaxy classification problems at the petabyte scale.
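The idea of turning an image into a one-dimensional sequence can be sketched by sampling the light along ellipses of increasing semi-major axis and recording the mean intensity on each. This is a simplified sketch, not the authors' feature extractor: we assume a fixed centre, ellipticity and position angle for all ellipses (a full isophote fit would let them vary and record them too), use nearest-pixel sampling, and test on a synthetic exponential disc rather than DES data.

```python
import numpy as np

def isophote_sequence(image, x0, y0, ellipticity, pa, radii, n_angles=64):
    """Mean intensity along fixed-shape ellipses of increasing semi-major axis."""
    ny, nx = image.shape
    t = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    cos_pa, sin_pa = np.cos(pa), np.sin(pa)
    seq = []
    for a in radii:
        b = a * (1.0 - ellipticity)           # semi-minor axis
        # Parametric ellipse rotated by the position angle pa
        x = x0 + a * np.cos(t) * cos_pa - b * np.sin(t) * sin_pa
        y = y0 + a * np.cos(t) * sin_pa + b * np.sin(t) * cos_pa
        # Nearest-pixel sampling, clipped to the image borders
        ix = np.clip(np.rint(x).astype(int), 0, nx - 1)
        iy = np.clip(np.rint(y).astype(int), 0, ny - 1)
        seq.append(image[iy, ix].mean())
    return np.array(seq)

# Synthetic exponential disc (hypothetical test image, axis ratio q)
yy, xx = np.mgrid[0:101, 0:101]
q, pa = 0.6, np.deg2rad(30.0)
dx, dy = xx - 50.0, yy - 50.0
xr = dx * np.cos(pa) + dy * np.sin(pa)
yr = -dx * np.sin(pa) + dy * np.cos(pa)
image = np.exp(-np.sqrt(xr**2 + (yr / q) ** 2) / 8.0)

seq = isophote_sequence(image, 50.0, 50.0, 1.0 - q, pa, radii=np.arange(2, 40, 2))
print(np.round(seq, 3))
```

For this smooth disc the sequence simply falls off exponentially; the premise of the method is that real galaxies leave type-dependent structure (bars, arms, bulges) in such sequences, which a one-dimensional classifier can then exploit.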
The hot intra-cluster medium (ICM) surrounding the heart of galaxy clusters is a complex medium comprising various emitting components. Although previous studies of nearby galaxy clusters, such as the Perseus, Coma and Virgo clusters, have demonstrated the need for multiple thermal components when spectroscopically fitting the X-ray emission of the ICM, no systematic methodology for determining the number of underlying components currently exists. In turn, underestimating or overestimating the number of components can cause systematic errors in the estimated emission parameters. In this paper, we present a novel approach to determining the number of components using an amalgam of machine learning techniques. Synthetic spectra containing varying numbers of underlying thermal components were created using well-established tools available from the \textit{Chandra} X-ray Observatory. The dimensionality of the training set was first reduced using Principal Component Analysis (PCA), and the spectra were then categorized by the number of underlying components using a Random Forest Classifier. Our trained and tested algorithm was subsequently applied to \textit{Chandra} X-ray observations of the Perseus cluster. Our results demonstrate that machine learning techniques can efficiently and reliably estimate the number of underlying thermal components in the spectra of galaxy clusters, regardless of the thermal model (MEKAL versus APEC). We also confirm that the core of the Perseus cluster contains a mix of differing underlying thermal components. We emphasize that although this methodology was trained and applied on \textit{Chandra} X-ray observations, it is readily portable to other current (e.g. XMM-Newton, eROSITA) and upcoming (e.g. Athena, Lynx, XRISM) X-ray telescopes. The code is publicly available at \url{https://github.com/XtraAstronomy/Pumpkin}.
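The dimensionality-reduction step can be sketched with PCA computed from a singular value decomposition of the mean-centred spectra. This is a minimal sketch under our own assumptions, not the Pumpkin code: the toy spectra are sums of one to three exponential continua $\exp(-E/kT)$ with made-up temperatures and normalisations, not MEKAL or APEC models folded through a \textit{Chandra} response, and the choice of 10 retained components is arbitrary.

```python
import numpy as np

def pca_reduce(spectra, n_components):
    """Project spectra (n_samples x n_channels) onto their leading principal components."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    scores = centred @ vt[:n_components].T   # coordinates in the reduced space
    return scores, explained[:n_components]

# Hypothetical toy spectra: sums of 1-3 exponential "thermal" continua exp(-E/kT).
# These are NOT MEKAL/APEC models; they only mimic a multi-temperature mixture.
rng = np.random.default_rng(0)
energy = np.linspace(0.5, 7.0, 300)          # keV grid, illustrative only
spectra, labels = [], []
for n_comp in (1, 2, 3):
    for _ in range(200):
        kT = rng.uniform(0.5, 8.0, n_comp)
        norm = rng.uniform(0.5, 1.5, n_comp)
        flux = (norm[:, None] * np.exp(-energy[None, :] / kT[:, None])).sum(axis=0)
        spectra.append(flux + rng.normal(0.0, 0.01, energy.size))
        labels.append(n_comp)
spectra, labels = np.array(spectra), np.array(labels)

scores, explained = pca_reduce(spectra, n_components=10)
print(scores.shape, np.round(explained, 4))
```

The reduced `scores`, together with `labels` (the number of components in each spectrum), could then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`.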
Future astrophysical surveys such as J-PAS will produce very large datasets, which will require the deployment of accurate and efficient Machine Learning (ML) methods. In this work, we analyze the miniJPAS survey, which observed about 1 deg$^2$ of the AEGIS field with 56 narrow-band filters and 4 $ugri$ broad-band filters. We discuss the classification of miniJPAS sources into extended (galaxies) and point-like (e.g. stars) objects, a necessary step for subsequent scientific analyses. We aim at developing an ML classifier that is complementary to traditional tools based on explicit modeling. To train and test our classifiers, we cross-matched the miniJPAS dataset with SDSS and HSC-SSP data. We trained and tested six different ML algorithms on the two cross-matched catalogs. As input for the ML algorithms we use the magnitudes from the 60 filters together with their errors, with and without the morphological parameters. We also use the mean PSF in the $r$ detection band for each pointing. We find that the Random Forest (RF) and Extremely Randomized Trees (ERT) algorithms perform best in all scenarios. When analyzing the full magnitude range $15<r<23.5$, we find an area under the ROC curve of AUC $=0.957$ with RF when using only photometric information, and AUC $=0.986$ with ERT when using photometric and morphological information. Regarding feature importance, when morphological parameters are used, the FWHM is the most important feature. When using photometric information only, we observe that broad bands are not necessarily more important than narrow bands, and that errors are as important as the measurements. ML algorithms can compete with traditional star/galaxy classifiers, outperforming them at fainter magnitudes ($r>21$). We use our best classifiers, with and without morphology, to produce a value-added catalog available at https://j-pas.org/datareleases.
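The AUC values quoted above measure how well a classifier's scores rank extended sources above point-like ones. A minimal sketch of how such a number can be computed, using the equivalence between the area under the ROC curve and the normalised Mann-Whitney $U$ statistic (ties receive average ranks); the label convention (1 = extended, 0 = point-like) is our assumption, and this is not the miniJPAS evaluation code.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for extended (galaxy), 0 for point-like; scores: classifier
    output, higher meaning 'more likely extended'. Ties get average ranks.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0   # average 1-based rank
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this example three of the four (extended, point-like) pairs are ranked correctly, so the AUC is 0.75; an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.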
Galaxy morphology is a fundamental quantity that is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled in some cases with the short cadences of forthcoming Big-Data surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations in order to group image patches with similar visual properties, and objects constructed from those patches, such as galaxies. We implement the algorithm on the Hyper Suprime-Cam Subaru Strategic Program Ultra-Deep survey to autonomously reduce the galaxy population to a small number (160) of morphological clusters, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity and reproduce known trends in key galaxy properties as a function of morphological type at $z<1$ (e.g. stellar-mass functions, rest-frame colours and the position of galaxies on the star-formation main sequence). Our study demonstrates the power of unsupervised machine learning in performing accurate morphological analysis, which will become indispensable in this new era of deep-wide surveys.
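The idea of grouping image patches through a graph, without any training set, can be illustrated with a deliberately simplified stand-in: describe each patch by a few statistics, connect patches whose descriptors lie within a distance `eps`, and take the connected components of that graph as clusters. This is not the algorithm used in the study; the patch statistics, the $\epsilon$-neighbourhood graph, the value of `eps` and the synthetic image (half flat background, half noise-like texture) are all hypothetical choices for illustration.

```python
import numpy as np

def patch_features(image, size):
    """Cut the image into non-overlapping size x size patches; describe each by simple statistics."""
    ny, nx = image.shape
    feats, coords = [], []
    for y in range(0, ny - size + 1, size):
        for x in range(0, nx - size + 1, size):
            p = image[y:y + size, x:x + size]
            gy, gx = np.gradient(p)
            # Mean level, contrast and mean gradient magnitude of the patch
            feats.append([p.mean(), p.std(), np.mean(np.hypot(gx, gy))])
            coords.append((y, x))
    return np.array(feats), coords

def graph_clusters(feats, eps):
    """Link patches whose feature vectors are closer than eps; clusters are connected components."""
    n = len(feats)
    parent = list(range(n))                  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < eps:
                parent[find(i)] = find(j)    # merge the two components
    return np.array([find(i) for i in range(n)])

rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[:, 32:] = rng.normal(0.0, 1.0, (64, 32))   # textured right half

feats, coords = patch_features(image, size=16)
labels = graph_clusters(feats, eps=0.5)
print(labels.reshape(4, 4))
```

Here the flat patches and the textured patches end up in two separate components; real applications replace these hand-made statistics with richer learned patch representations and a more careful graph construction.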