No Arabic abstract
Galaxy morphology is a fundamental quantity, that is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming Big-Data surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively-distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine-learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of morphological clusters, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity, and reproduce known trends in key galaxy properties as a function of morphological type at z<1 (e.g. stellar-mass functions, rest-frame colours and the position of galaxies on the star-formation main sequence). Our study demonstrates the power of unsupervised machine learning in performing accurate morphological analysis, which will become indispensable in this new era of deep-wide surveys.
There are several supervised machine learning methods used for the application of automated morphological classification of galaxies; however, there has not yet been a clear comparison of these different methods using imaging data, or a investigation for maximising their effectiveness. We carry out a comparison between several common machine learning methods for galaxy classification (Convolutional Neural Network (CNN), K-nearest neighbour, Logistic Regression, Support Vector Machine, Random Forest, and Neural Networks) by using Dark Energy Survey (DES) data combined with visual classifications from the Galaxy Zoo 1 project (GZ1). Our goal is to determine the optimal machine learning methods when using imaging data for galaxy classification. We show that CNN is the most successful method of these ten methods in our study. Using a sample of $sim$2,800 galaxies with visual classification from GZ1, we reach an accuracy of $sim$0.99 for the morphological classification of Ellipticals and Spirals. The further investigation of the galaxies that have a different ML and visual classification but with high predicted probabilities in our CNN usually reveals an the incorrect classification provided by GZ1. We further find the galaxies having a low probability of being either spirals or ellipticals are visually Lenticulars (S0), demonstrating that supervised learning is able to rediscover that this class of galaxy is distinct from both Es and Spirals. We confirm that $sim$2.5% galaxies are misclassified by GZ1 in our study. After correcting these galaxies labels, we improve our CNN performance to an average accuracy of over 0.99 (accuracy of 0.994 is our best result).
One of the principal bottlenecks to atmosphere characterisation in the era of all-sky surveys is the availability of fast, autonomous and robust atmospheric retrieval methods. We present a new approach using unsupervised machine learning to generate informed priors for retrieval of exoplanetary atmosphere parameters from transmission spectra. We use principal component analysis (PCA) to efficiently compress the information content of a library of transmission spectra forward models generated using the PLATON package. We then apply a $k$-means clustering algorithm in PCA space to segregate the library into discrete classes. We show that our classifier is almost always able to instantaneously place a previously unseen spectrum into the correct class, for low-to-moderate spectral resolutions, $R$, in the range $R~=~30-300$ and noise levels up to $10$~per~cent of the peak-to-trough spectrum amplitude. The distribution of physical parameters for all members of the class therefore provides an informed prior for standard retrieval methods such as nested sampling. We benchmark our informed-prior approach against a standard uniform-prior nested sampler, finding that our approach is up to a factor two faster, with negligible reduction in accuracy. We demonstrate the application of this method to existing and near-future observatories, and show that it is suitable for real-world application. Our general approach is not specific to transmission spectroscopy and should be more widely applicable to cases that involve repetitive fitting of trusted high-dimensional models to large data catalogues, including beyond exoplanetary science.
Classifying the morphologies of galaxies is an important step in understanding their physical properties and evolutionary histories. The advent of large-scale surveys has hastened the need to develop techniques for automated morphological classification. We train and test several convolutional neural network architectures to classify the morphologies of galaxies in both a 3-class (elliptical, lenticular, spiral) and 4-class (+irregular/miscellaneous) schema with a dataset of 14034 visually-classified SDSS images. We develop a new CNN architecture that outperforms existing models in both 3 and 4-way classification, with overall classification accuracies of 83% and 81% respectively. We also compare the accuracies of 2-way / binary classifications between all four classes, showing that ellipticals and spirals are most easily distinguished (>98% accuracy), while spirals and irregulars are hardest to differentiate (78% accuracy). Through an analysis of all classified samples, we find tentative evidence that misclassifications are physically meaningful, with lenticulars misclassified as ellipticals tending to be more massive, among other trends. We further combine our binary CNN classifiers to perform a hierarchical classification of samples, obtaining comparable accuracies (81%) to the direct 3-class CNN, but considerably worse accuracies in the 4-way case (65%). As an additional verification, we apply our networks to a small sample of Galaxy Zoo images, obtaining accuracies of 92%, 82% and 77% for the binary, 3-way and 4-way classifications respectively.
We present a star/galaxy classification for the Southern Photometric Local Universe Survey (S-PLUS), based on a Machine Learning approach: the Random Forest algorithm. We train the algorithm using the S-PLUS optical photometry up to $r$=21, matched to SDSS/DR13, and morphological parameters. The metric of importance is defined as the relative decrease of the initial accuracy when all correlations related to a certain feature is vanished. In general, the broad photometric bands presented higher importance when compared to narrow ones. The influence of the morphological parameters has been evaluated training the RF with and without the inclusion of morphological parameters, presenting accuracy values of 95.0% and 88.1%, respectively. Particularly, the morphological parameter {rm FWHM/PSF} performed the highest importance over all features to distinguish between stars and galaxies, indicating that it is crucial to classify objects into stars and galaxies. We investigate the misclassification of stars and galaxies in the broad-band colour-colour diagram $(g-r)$ versus $(r-i)$. The morphology can notably improve the classification of objects at regions in the diagram where the misclassification was relatively high. Consequently, it provides cleaner samples for statistical studies. The expected contamination rate of red galaxies as a function of the redshift is estimated, providing corrections for red galaxy samples. The classification of QSOs as extragalactic objects is slightly better using photometric-only case. An extragalactic point-source catalogue is provided using the classification without any morphology feature (only the SED information) with additional constraints on photometric redshifts and {rm FWHM/PSF} values.
The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of $93%$. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for support vector machine.