Wu & Peek (2020) predict SDSS-quality spectra from Pan-STARRS broad-band \textit{grizy} images using machine learning (ML). In this letter, we test their prediction for a unique object, UGC 2885 (Rubin's galaxy), the largest and most massive isolated disk galaxy in the local Universe ($D<100$ Mpc). After obtaining the ML-predicted spectrum, we compare it to all existing spectroscopic information that is comparable to an SDSS spectrum of the central region: two archival spectra, one extracted from the VIRUS-P observations of this galaxy, and a new, targeted MMT/Binospec observation. Agreement is qualitatively good, though the ML prediction prefers line ratios slightly closer to those of an active galactic nucleus (AGN) than the archival and VIRUS-P observed values. The MMT/Binospec nuclear spectrum unequivocally shows strong emission lines, except for H$\beta$, the ratios of which are consistent with AGN activity. The ML approach to galaxy spectra may be a viable way to identify AGN, supplementing NIR colors. How such a massive disk galaxy ($M^* = 10^{11}$ M$_\odot$), which uncharacteristically shows no sign of interaction or mergers, manages to fuel its central AGN remains to be investigated.
Understanding the impact of halo properties beyond halo mass on the clustering of galaxies (namely galaxy assembly bias) remains a challenge for contemporary models of galaxy clustering. We explore the use of machine learning to predict the halo occupations and recover galaxy clustering and assembly bias in a semi-analytic galaxy formation model. For stellar-mass selected samples, we train a Random Forest algorithm on the number of central and satellite galaxies in each dark matter halo. With the predicted occupations, we create mock galaxy catalogues and measure the clustering and assembly bias. Using a range of halo and environment properties, we find that the machine learning predictions of the occupancy variations with secondary properties, galaxy clustering and assembly bias are all in excellent agreement with those of our target galaxy formation model. Internal halo properties are most important for the central galaxies prediction, while environment plays a critical role for the satellites. Our machine learning models are all provided in a usable format. We demonstrate that machine learning is a powerful tool for modelling the galaxy-halo connection, and can be used to create realistic mock galaxy catalogues which accurately recover the expected occupancy variations, galaxy clustering and galaxy assembly bias, imperative for cosmological analyses of upcoming surveys.
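One common way to quantify the occupancy variations mentioned above is to bin haloes by mass, split each bin at the median of a secondary property, and compare the mean number of galaxies in the two halves. The sketch below illustrates that idea only; it is not the trained Random Forest, and the function name, the tuple layout, and the toy haloes are all hypothetical.

```python
from statistics import mean, median


def occupancy_variation(halos, bin_edges):
    """Mean occupation in the lower and upper halves of a secondary property.

    halos: list of (log_mass, secondary_property, n_galaxies) tuples
           (hypothetical layout chosen for this sketch).
    bin_edges: increasing list of log-mass bin edges.
    Returns a list of (lo, hi, mean_n_low_half, mean_n_high_half).
    """
    result = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [h for h in halos if lo <= h[0] < hi]
        if len(in_bin) < 2:
            continue  # not enough haloes to split
        split = median(h[1] for h in in_bin)
        low = [h[2] for h in in_bin if h[1] <= split]
        high = [h[2] for h in in_bin if h[1] > split]
        if not high:
            continue  # all haloes share the same secondary value
        result.append((lo, hi, mean(low), mean(high)))
    return result


# Toy haloes (made-up numbers): (log mass, secondary property, galaxy count).
toy_halos = [
    (12.1, 5.0, 0), (12.3, 9.0, 1), (12.2, 6.0, 0), (12.4, 10.0, 1),
    (13.1, 4.0, 3), (13.3, 8.0, 1),
]
variation = occupancy_variation(toy_halos, [12.0, 13.0, 14.0])
```

A difference between the two means at fixed halo mass is the signature of occupancy variation; the same function could be applied to both the target catalogue and the predicted occupations to compare them.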
Active galactic nuclei (AGNs) are very powerful galaxies characterized by extremely bright emission from their central massive black holes. Knowing the redshifts of AGNs lets us determine their distances and investigate important astrophysical problems, such as the formation and evolution of early stars and the structure of early galaxies. Redshift determination is challenging because it requires detailed follow-up with multi-wavelength observations, often involving various astronomical facilities. Here, we employ machine learning algorithms to estimate redshifts from the observed gamma-ray properties and photometric data of gamma-ray loud AGNs from the Fourth Fermi-LAT Catalog. The prediction is obtained with the Superlearner algorithm, using a LASSO-selected set of predictors. We obtain a tight correlation, with a Pearson correlation coefficient of 71.3% between the inferred and the observed redshifts, and an average $\Delta z_{\rm norm} = 11.6 \times 10^{-4}$. We stress that, notwithstanding the small sample of gamma-ray loud AGNs, we obtain a reliable predictive model using Superlearner, which is an ensemble of several machine learning models.
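The two summary statistics quoted above can be sketched as follows. This is only an illustration: the abstract does not define $\Delta z_{\rm norm}$, so the code assumes the common convention $(z_{\rm pred} - z_{\rm obs})/(1 + z_{\rm obs})$, and the function names and toy redshifts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_delta_z_norm(z_pred, z_obs):
    """Average normalized residual, ASSUMING dz_norm = (z_pred - z_obs) / (1 + z_obs)."""
    residuals = [(p - o) / (1.0 + o) for p, o in zip(z_pred, z_obs)]
    return sum(residuals) / len(residuals)


# Toy redshifts (made-up numbers, not catalogue values).
z_observed = [0.3, 0.8, 1.2, 2.0]
z_inferred = [0.35, 0.7, 1.3, 1.9]

r = pearson_r(z_inferred, z_observed)
bias = mean_delta_z_norm(z_inferred, z_observed)
```

Here `r` measures how tightly the inferred redshifts track the observed ones, while `bias` measures the average signed offset, so a model can have a small `bias` and still scatter widely.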
In this work we explore the possibility of applying machine learning methods designed for one-dimensional problems to the task of galaxy image classification. The algorithms used for image classification typically rely on multiple costly steps, such as Point Spread Function (PSF) deconvolution and the training and application of complex Convolutional Neural Networks (CNNs) with thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes in their light distribution and collect the information in a sequence. The sequences obtained with this method present definite features allowing a direct distinction between galaxy types, as opposed to smooth Sersic profiles. Then, we train and classify the sequences with machine learning algorithms, designed through the platform Modulos AutoML, and study how they optimize the classification task. As a demonstration of this method, we use the second public release of the Dark Energy Survey (DES DR2). We show that by applying it to this sample we are able to successfully distinguish between early-type and late-type galaxies, for images with a signal-to-noise ratio greater than 300. This yields an accuracy of $86\%$ for the early-type galaxies and $93\%$ for the late-type galaxies, which is on par with most contemporary automated image classification approaches. Our novel method allows galaxy images to be accurately classified and is faster than other approaches. Data dimensionality reduction also implies a significant lowering in computational cost. With a view to future data sets obtained with, e.g., Euclid and the Vera Rubin Observatory (VRO), this work represents a path towards using a well-tested and widely used platform from industry to efficiently tackle galaxy classification problems at the petabyte scale.
We present a star/galaxy classification for the Southern Photometric Local Universe Survey (S-PLUS), based on a machine learning approach: the Random Forest (RF) algorithm. We train the algorithm using the S-PLUS optical photometry up to $r=21$, matched to SDSS/DR13, and morphological parameters. The importance of a feature is defined as the relative decrease in the initial accuracy when all correlations related to that feature are removed. In general, the broad photometric bands presented higher importance than the narrow ones. The influence of the morphological parameters has been evaluated by training the RF with and without them, yielding accuracy values of 95.0% and 88.1%, respectively. In particular, the morphological parameter ${\rm FWHM/PSF}$ showed the highest importance of all features for distinguishing between stars and galaxies, indicating that it is crucial for this classification. We investigate the misclassification of stars and galaxies in the broad-band colour-colour diagram $(g-r)$ versus $(r-i)$. The morphology can notably improve the classification of objects in regions of the diagram where the misclassification was relatively high. Consequently, it provides cleaner samples for statistical studies. The expected contamination rate of red galaxies as a function of redshift is estimated, providing corrections for red galaxy samples. The classification of QSOs as extragalactic objects is slightly better in the photometric-only case. An extragalactic point-source catalogue is provided using the classification without any morphology feature (only the SED information), with additional constraints on photometric redshifts and ${\rm FWHM/PSF}$ values.
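The importance metric described above resembles permutation importance: shuffle one feature column so that it no longer correlates with the labels, then record the relative drop in accuracy. The sketch below assumes that reading. It uses a hypothetical threshold rule in place of the trained Random Forest, and the feature layout, threshold, and toy objects are made up for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def permutation_importance(predict, rows, labels, feature, rng):
    """Relative decrease in accuracy after shuffling one feature column."""
    base = accuracy(predict, rows, labels)
    if base == 0:
        return 0.0  # nothing to lose; avoids division by zero
    column = [row[feature] for row in rows]
    rng.shuffle(column)
    shuffled = []
    for row, value in zip(rows, column):
        new_row = list(row)
        new_row[feature] = value
        shuffled.append(new_row)
    return (base - accuracy(predict, shuffled, labels)) / base


# Hypothetical stand-in for a trained classifier: extended sources are galaxies.
def toy_classifier(row):
    return "galaxy" if row[0] > 1.2 else "star"


# Toy objects (made-up numbers): [FWHM/PSF ratio, g - r colour].
toy_rows = [[1.0, 0.4], [1.05, 0.9], [1.8, 0.6], [2.4, 1.1], [0.98, 0.2], [1.6, 0.7]]
toy_labels = ["star", "star", "galaxy", "galaxy", "star", "galaxy"]

rng = random.Random(0)
importance_morphology = permutation_importance(toy_classifier, toy_rows, toy_labels, 0, rng)
importance_colour = permutation_importance(toy_classifier, toy_rows, toy_labels, 1, rng)
```

Because the toy rule ignores colour, shuffling the colour column cannot change its accuracy, whereas shuffling the size ratio can only lower it. In practice the shuffle would be repeated several times and averaged to reduce noise.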
Galaxy morphology is a fundamental quantity that is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming Big-Data surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively-distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of morphological clusters, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity, and reproduce known trends in key galaxy properties as a function of morphological type at $z<1$ (e.g. stellar-mass functions, rest-frame colours and the position of galaxies on the star-formation main sequence). Our study demonstrates the power of unsupervised machine learning in performing accurate morphological analysis, which will become indispensable in this new era of deep-wide surveys.