We present AstroVaDEr, a variational autoencoder designed to perform unsupervised clustering and synthetic image generation using astronomical imaging catalogues. The model is a convolutional neural network that learns to embed images into a low dimensional latent space, and simultaneously optimises a Gaussian Mixture Model (GMM) on the embedded vectors to cluster the training data. By utilising variational inference, we are able to use the learned GMM as a statistical prior on the latent space to facilitate random sampling and generation of synthetic images. We demonstrate AstroVaDEr's capabilities by training it on gray-scaled gri images from the Sloan Digital Sky Survey, using a sample of galaxies that are classified by Galaxy Zoo 2. An unsupervised clustering model is found which separates galaxies based on learned morphological features such as axis ratio, surface brightness profile, orientation and the presence of companions. We use the learned mixture model to generate synthetic images of galaxies based on the morphological profiles of the Gaussian components. AstroVaDEr succeeds in producing a morphological classification scheme from unlabelled data, but unexpectedly places high importance on the presence of companion objects---demonstrating the importance of human interpretation. The network is scalable and flexible, allowing for larger datasets to be classified, or different kinds of imaging data. We also demonstrate the generative properties of the model, which allow for realistic synthetic images of galaxies to be sampled from the learned classification scheme. These can be used to create synthetic image catalogues or to perform image processing tasks such as deblending.
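The core generative idea above is that a GMM fitted in latent space acts as a sampling prior: draw a mixture component, draw a latent vector from it, then decode to an image. A minimal sketch of that sampling step, with made-up mixture parameters and a 2-D latent space standing in for AstroVaDEr's learned prior (the real means and covariances are optimised jointly with the encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned GMM prior over a 2-D latent space
# (illustrative values only; not the paper's fitted parameters).
weights = np.array([0.6, 0.4])                # mixture weights, sum to 1
means = np.array([[0.0, 0.0], [3.0, 3.0]])    # component means
stds = np.array([[1.0, 1.0], [0.5, 0.5]])     # diagonal std devs per component

def sample_latents(n):
    """Draw n latent vectors z ~ GMM(weights, means, stds)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # pick a component
    z = means[comps] + stds[comps] * rng.standard_normal((n, 2))
    return z, comps

z, comps = sample_latents(1000)
# A trained decoder network would then map each z to a synthetic galaxy
# image; sampling per component yields images with that component's
# morphological profile.
print(z.shape)
```

Because each component corresponds to a morphological cluster, fixing `comps` to one value generates galaxies of a single learned type.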
The task of morphological classification is too complex for simple parameterization, but important for research in the galaxy evolution field. Future galaxy surveys (e.g. EUCLID) will collect data on more than $10^9$ galaxies. To obtain morphological information one needs to involve people to mark up galaxy images, which requires either a considerable amount of money or a huge number of volunteers. We propose an effective semi-supervised approach for the galaxy morphology classification task, based on active learning with an adversarial autoencoder (AAE) model. For a binary classification problem (the top-level question of the Galaxy Zoo 2 decision tree) we achieved an accuracy of 93.1% on the test set with only 0.86 million markup actions; this model can easily scale up to any number of images. Our best model with additional markup achieves an accuracy of 95.5%. To the best of our knowledge, this is the first time an AAE semi-supervised learning model has been used in astronomy.
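The active-learning component described above can be reduced to a generic uncertainty-sampling loop: train on the labelled pool, then request labels for the images the classifier is least sure about. A toy sketch under strong simplifying assumptions (random 2-D features in place of AAE embeddings, a hand-rolled logistic classifier in place of the paper's model):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in data: 2-D "embeddings" with a noisy linear label,
# mimicking a binary Galaxy Zoo-style question.
X = rng.standard_normal((500, 2))
y = (X[:, 0] + 0.3 * rng.standard_normal(500) > 0).astype(int)

labelled = list(rng.choice(500, 10, replace=False))  # small seed set

def fit_logistic(X, y, steps=200, lr=0.5):
    """Plain gradient-descent logistic regression (bias as last weight)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(X)
    return w

for _ in range(5):  # five rounds, querying 10 new labels each
    w = fit_logistic(X[labelled], np.asarray(y)[labelled])
    p = 1.0 / (1.0 + np.exp(-np.hstack([X, np.ones((500, 1))]) @ w))
    uncertainty = np.abs(p - 0.5)              # small -> most uncertain
    pool = np.setdiff1d(np.arange(500), labelled)
    labelled += list(pool[np.argsort(uncertainty[pool])[:10]])

acc = ((p > 0.5).astype(int) == y).mean()
print(len(labelled), acc)
```

The design point is that labelling effort concentrates near the decision boundary, which is how the paper keeps the number of markup actions low relative to the catalogue size.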
Generative Adversarial Networks (GANs) are a class of artificial neural network that can produce realistic, but artificial, images that resemble those in a training set. In typical GAN architectures these images are small, but a variant known as Spatial-GANs (SGANs) can generate arbitrarily large images, provided training images exhibit some level of periodicity. Deep extragalactic imaging surveys meet this criterion due to the cosmological tenet of isotropy. Here we train an SGAN to generate images resembling the iconic Hubble Space Telescope eXtreme Deep Field (XDF). We show that the properties of galaxies in generated images have a high level of fidelity with galaxies in the real XDF in terms of abundance, morphology, magnitude distributions and colours. As a demonstration we have generated a 7.6-billion pixel generative deep field spanning 1.45 degrees. The technique can be generalised to any appropriate imaging training set, offering a new purely data-driven approach for producing realistic mock surveys and synthetic data at scale, in astrophysics and beyond.
Galaxy morphology is a fundamental quantity that is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming Big-Data surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively-distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of morphological clusters, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity, and reproduce known trends in key galaxy properties as a function of morphological type at z<1 (e.g. stellar-mass functions, rest-frame colours and the position of galaxies on the star-formation main sequence). Our study demonstrates the power of unsupervised machine learning in performing accurate morphological analysis, which will become indispensable in this new era of deep-wide surveys.
Sharing images online poses security threats to a wide range of users, who are often unaware of the private information those images contain. Deep features have been demonstrated to be a powerful representation for images. However, deep features usually suffer from the issues of a large size and requiring a huge amount of data for fine-tuning. In contrast to normal images (e.g., scene images), privacy images are often limited because of sensitive information. In this paper, we propose a novel approach that can work on limited data and generate deep features of smaller size. For training images, we first extract the initial deep features from the pre-trained model and then employ the K-means clustering algorithm to learn the centroids of these initial deep features. We use the learned centroids from training features to extract the final features for each testing image and encode our final features with the triangle encoding. To improve the discriminability of the features, we further perform the fusion of two proposed unsupervised deep features obtained from different layers. Experimental results show that the proposed features outperform state-of-the-art deep features, in terms of both classification accuracy and testing time.
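The centroid-plus-triangle-encoding pipeline described above is concrete enough to sketch. In triangle encoding (as popularised by Coates, Lee and Ng), each feature dimension is max(0, mean distance to centroids minus distance to that centroid), so far-away centroids contribute zero. A minimal sketch with random vectors standing in for the pre-trained deep features and an illustrative centroid count:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: "initial deep features" are random 64-D vectors here,
# and k = 8 centroids is an illustrative choice, not the paper's.
train_feats = rng.standard_normal((200, 64))
k = 8

# Lloyd's K-means (single restart, fixed iteration count) to learn centroids.
centroids = train_feats[rng.choice(len(train_feats), k, replace=False)].copy()
for _ in range(20):
    dists = np.linalg.norm(train_feats[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    for j in range(k):
        members = train_feats[labels == j]
        if len(members):
            centroids[j] = members.mean(axis=0)

def triangle_encode(x, centroids):
    """Triangle encoding: f_j = max(0, mean_distance - distance_j)."""
    d = np.linalg.norm(centroids - x, axis=1)
    return np.maximum(0.0, d.mean() - d)

code = triangle_encode(rng.standard_normal(64), centroids)
print(code.shape)
```

The resulting k-dimensional code is much smaller than the raw deep feature, which is the size reduction the abstract claims; fusing codes from two network layers would just concatenate two such vectors.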
Encoded within the morphological structure of galaxies are clues related to their formation and evolutionary history. Recent advances pertaining to the statistics of galaxy morphology include sophisticated measures of concentration (C), asymmetry (A), and clumpiness (S). In this study, these three parameters (CAS) have been applied to a suite of simulated galaxies and compared with observational results inferred from a sample of nearby galaxies. The simulations span a range of late-type systems, with masses between ~1e10 Msun and ~1e12 Msun, and employ star formation density thresholds between 0.1 cm^-3 and 100 cm^-3. We have found that the simulated galaxies possess comparable concentrations to their real counterparts. However, the results of the CAS analysis revealed that the simulated galaxies are generally more asymmetric, and that the range of clumpiness values extends beyond the range of those observed. Strong correlations were obtained between the three CAS parameters and colour (B-V), consistent with observed galaxies. Furthermore, the simulated galaxies possess strong links between their CAS parameters and Hubble type, mostly in line with their real counterparts.
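Of the three CAS parameters, asymmetry is the simplest to illustrate: it compares an image with a copy of itself rotated by 180 degrees. A minimal sketch on toy arrays, omitting the background-subtraction term and the minimisation over rotation centre used in full CAS implementations (the exact normalisation also varies between authors):

```python
import numpy as np

def asymmetry(img):
    """Simplified CAS asymmetry: A = sum|I - I_180| / (2 * sum|I|).
    Real measurements subtract a background asymmetry term and
    minimise over the choice of rotation centre."""
    rotated = np.rot90(img, 2)            # rotate image by 180 degrees
    return np.abs(img - rotated).sum() / (2.0 * np.abs(img).sum())

symmetric = np.ones((5, 5))               # perfectly symmetric source
lopsided = np.zeros((5, 5))
lopsided[0, 0] = 1.0                      # all flux in one corner

print(asymmetry(symmetric))   # 0.0 for a point-symmetric image
print(asymmetry(lopsided))
```

Concentration and clumpiness follow a similar flux-ratio pattern (curve-of-growth radii for C, high-pass residual flux for S), which is why the three indices are routinely measured together.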