When completed, the PHANGS-HST project will provide a census of roughly 50,000 compact star clusters and associations, as well as human morphological classifications for roughly 20,000 of those objects. These large numbers motivated the development of a more objective and repeatable method to help perform source classifications. In this paper we consider the results for five PHANGS-HST galaxies (NGC 628, NGC 1433, NGC 1566, NGC 3351, NGC 3627) using classifications from two convolutional neural network architectures (RESNET and VGG) trained using deep transfer learning techniques. The results are compared to classifications performed by humans. The primary result is that the neural network classifications are comparable in quality to the human classifications with typical agreement around 70 to 80% for Class 1 clusters (symmetric, centrally concentrated) and 40 to 70% for Class 2 clusters (asymmetric, centrally concentrated). If Class 1 and 2 are considered together the agreement is $82 \pm 3$%. Dependencies on magnitudes, crowding, and background surface brightness are examined. A detailed description of the criteria and methodology used for the human classifications is included along with an examination of systematic differences between PHANGS-HST and LEGUS. The distribution of data points in a colour-colour diagram is used as a figure of merit to further test the relative performances of the different methods. The effects on science results (e.g., determinations of mass and age functions) of using different cluster classification methods are examined and found to be minimal.
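The per-class and combined agreement figures quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming that agreement for a set of classes means the fraction of objects placed in that set by the human classifier that the network also places in that set; the paper may define it differently. The function name `agreement` and the toy labels are hypothetical and are not PHANGS-HST data.

```python
def agreement(human, model, classes):
    """Fraction of objects the human put in `classes` that the model also puts in `classes`."""
    classes = set(classes)
    selected = [(h, m) for h, m in zip(human, model) if h in classes]
    if not selected:
        raise ValueError("no human-labelled objects in the requested classes")
    return sum(m in classes for _, m in selected) / len(selected)


# Toy labels (hypothetical): 1-3 are cluster classes, 4 is non-cluster.
human = [1, 1, 1, 1, 2, 2, 2, 3, 4, 4]
model = [1, 1, 1, 2, 2, 1, 4, 3, 4, 4]

class1 = agreement(human, model, [1])       # Class 1 only
class2 = agreement(human, model, [2])       # Class 2 only
class12 = agreement(human, model, [1, 2])   # Class 1 and 2 considered together
```

In this toy example the combined Class 1+2 agreement is higher than either single-class value, because confusion between Class 1 and Class 2 no longer counts as a disagreement once the two are merged.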
We present the results of a proof-of-concept experiment which demonstrates that deep learning can successfully be used for production-scale classification of compact star clusters detected in HST UV-optical imaging of nearby spiral galaxies (D < 20 Mpc) in the PHANGS-HST survey. Given the relatively small size of existing, human-labelled star cluster samples, we transfer the knowledge of state-of-the-art neural network models for real-object recognition to classify star cluster candidates into four morphological classes. We perform a series of experiments to determine the dependence of classification performance on: neural network architecture (ResNet18 and VGG19-BN); training data sets curated by either a single expert or three astronomers; and the size of the images used for training. We find that the overall classification accuracies are not significantly affected by these choices. The networks are used to classify star cluster candidates in the PHANGS-HST galaxy NGC 1559, which was not included in the training samples. The resulting prediction accuracies are 70%, 40%, 40-50%, and 50-70% for class 1, 2, 3 star clusters, and class 4 non-clusters, respectively. This performance is competitive with the consistency achieved in previously published human and automated quantitative classification of star cluster candidate samples (70-80%, 40-50%, 40-50%, and 60-70%). The methods introduced herein lay the foundations to automate classification for star clusters at scale, and demonstrate the need to prepare a standardized dataset of human-labelled star cluster classifications, agreed upon by a full range of experts in the field, to further improve the performance of the networks introduced in this study.
We present an innovative and widely applicable approach for the detection and classification of stellar clusters, developed for the PHANGS-HST Treasury Program, an $NUV$-to-$I$ band imaging campaign of 38 spiral galaxies. Our pipeline first generates a unified master source list for stars and candidate clusters, to enable a self-consistent inventory of all star formation products. To distinguish cluster candidates from stars, we introduce the Multiple Concentration Index (MCI) parameter, and measure inner and outer MCIs to probe morphology in more detail than with a single, standard concentration index (CI). We improve upon cluster candidate selection, jointly basing our criteria on expectations for MCI derived from synthetic cluster populations and published cluster catalogues, yielding model and empirical selection regions (respectively). Selection purity (confirmed clusters versus candidates, assessed via human-based classification) is high (up to 70%) for moderately luminous sources in the empirical selection region, and somewhat lower overall (outside the region or fainter). The number of candidates rises steeply with decreasing luminosity, but pipeline-integrated Machine Learning (ML) classification prevents this from being problematic. We quantify the performance of our PHANGS-HST methods in comparison to LEGUS for a sample of four galaxies in common to both surveys, finding overall agreement with 50-75% of human verified star clusters appearing in both catalogues, but also subtle differences attributable to specific choices adopted by each project. The PHANGS-HST ML-classified Class 1 or 2 catalogues reach $\sim 1$ magnitude fainter, $\sim 2\times$ lower stellar mass, and are $2{-}5\times$ larger in number, than attained in the human classified samples.
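For orientation, the single, standard concentration index mentioned above is commonly computed as the difference between magnitudes measured in two concentric apertures. The following is a minimal sketch of that standard CI only; it does not reproduce the MCI definition, which is not given in the abstract. The aperture radii of 1 and 3 pixels, the Gaussian toy sources, and all function names are assumptions made for illustration.

```python
import math


def gaussian_image(size, sigma):
    """Square image of a circular Gaussian source centred on the middle pixel."""
    c = size // 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def aperture_flux(image, radius):
    """Sum of pixels whose centres lie within `radius` of the image centre."""
    c = len(image) // 2
    return sum(v for y, row in enumerate(image) for x, v in enumerate(row)
               if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2)


def concentration_index(image, r_in=1.0, r_out=3.0):
    """CI = m(r_in) - m(r_out) = 2.5 * log10(F(r_out) / F(r_in)); radii are assumed values."""
    return 2.5 * math.log10(aperture_flux(image, r_out) / aperture_flux(image, r_in))


# Toy sources (hypothetical): a narrow, star-like profile and a broader, cluster-like one.
ci_star = concentration_index(gaussian_image(21, sigma=0.8))
ci_cluster = concentration_index(gaussian_image(21, sigma=1.6))
```

The broader source puts a larger share of its flux outside the inner aperture, so its CI is larger than that of the star-like source. A cut on CI therefore separates extended candidates from point sources only coarsely, which is the limitation that measuring several indices is meant to address.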
The sensitivity and angular resolution of photometric surveys executed by the Hubble Space Telescope (HST) enable studies of individual star clusters in galaxies out to a few tens of megaparsecs. The fitting of spectral energy distributions (SEDs) of star clusters is essential for measuring their physical properties and studying their evolution. We report on the use of the publicly available Code Investigating GALaxy Emission (CIGALE) SED fitting package to derive ages, stellar masses, and reddenings for star clusters identified in the Physics at High Angular resolution in Nearby GalaxieS-HST (PHANGS-HST) survey. Using samples of star clusters in the galaxy NGC 3351, we present results of benchmark analyses performed to validate the code and a comparison to SED fitting results from the Legacy ExtraGalactic Ultraviolet Survey (LEGUS). We consider procedures for the PHANGS-HST SED fitting pipeline, e.g., the choice of single stellar population models, the treatment of nebular emission and dust, and the use of fluxes versus magnitudes for the SED fitting. We report on the properties of clusters in NGC 3351 and find, on average, the clusters residing in the inner star-forming ring of NGC 3351 are young ($< 10$ Myr) and massive ($10^{5} M_{\odot}$) while clusters in the stellar bulge are significantly older. Cluster mass function fits yield $\beta$ values around $-2$, consistent with prior results with a tendency to be shallower at the youngest ages. Finally, we explore a Bayesian analysis with additional physically-motivated priors for the distribution of ages and masses and analyze the resulting cluster distributions.
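To make the meaning of the mass function slope concrete, the sketch below estimates $\beta$ for a pure power law $dN/dM \propto M^{\beta}$ above a lower mass limit. It uses the standard unbinned maximum-likelihood estimator, $\hat{\beta} = -1 - n / \sum_i \ln(M_i / M_{\min})$, which is a technique substituted here for illustration; the abstract does not state how the fits in the paper were performed. The sample is synthetic, and the function names, the mass limit, and the random seed are all hypothetical.

```python
import math
import random


def sample_power_law(n, beta, m_min, rng):
    """Draw n masses from dN/dM ~ M**beta (beta < -1) for M >= m_min by inverse transform."""
    return [m_min * (1.0 - rng.random()) ** (1.0 / (beta + 1.0)) for _ in range(n)]


def fit_beta(masses, m_min):
    """Unbinned maximum-likelihood slope of an untruncated power law above m_min."""
    logs = [math.log(m / m_min) for m in masses if m >= m_min]
    return -1.0 - len(logs) / sum(logs)


rng = random.Random(42)  # fixed seed so the synthetic sample is reproducible
masses = sample_power_law(20000, beta=-2.0, m_min=1.0e3, rng=rng)
beta_hat = fit_beta(masses, m_min=1.0e3)
```

With 20,000 synthetic masses the recovered slope lies close to the input value of $-2$. Real cluster samples are much smaller and are affected by incompleteness and possible high-mass truncation, neither of which this sketch models.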
Future astrophysical surveys such as J-PAS will produce very large datasets, which will require the deployment of accurate and efficient Machine Learning (ML) methods. In this work, we analyze the miniJPAS survey, which observed about 1 deg$^2$ of the AEGIS field with 56 narrow-band filters and 4 ugri broad-band filters. We discuss the classification of miniJPAS sources into extended (galaxies) and point-like (e.g. stars) objects, a necessary step for the subsequent scientific analyses. We aim to develop an ML classifier that is complementary to traditional tools based on explicit modeling. In order to train and test our classifiers, we crossmatched the miniJPAS dataset with SDSS and HSC-SSP data. We trained and tested 6 different ML algorithms on the two crossmatched catalogs. As input for the ML algorithms we use the magnitudes from the 60 filters together with their errors, with and without the morphological parameters. We also use the mean PSF in the r detection band for each pointing. We find that the RF and ERT algorithms perform best in all scenarios. When analyzing the full magnitude range of 15<r<23.5 we find AUC=0.957 with RF when using only photometric information, and AUC=0.986 with ERT when using photometric and morphological information. Regarding feature importance, when using morphological parameters, FWHM is the most important feature. When using photometric information only, we observe that broad bands are not necessarily more important than narrow bands, and errors are as important as the measurements. ML algorithms can compete with traditional star/galaxy classifiers, outperforming the latter at fainter magnitudes (r>21). We use our best classifiers, with and without morphology, in order to produce a value-added catalog available at https://j-pas.org/datareleases.
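The AUC values quoted above are areas under the ROC curve. As a reminder of what that number measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive object receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. This is a minimal illustration, not the evaluation code used in the paper; treating galaxies as the positive class, the function name `roc_auc`, and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one object of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = extended (galaxy), 0 = point-like (star).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Here eight of the nine galaxy-star pairs are ordered correctly, giving an AUC of 8/9. The pairwise loop scales quadratically with sample size, so a sort-based rank computation would be preferable for catalog-sized inputs.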
We present a machine learning (ML) pipeline to identify star clusters in the multi-color images of nearby galaxies, from observations obtained with the Hubble Space Telescope as part of the Treasury Project LEGUS (Legacy ExtraGalactic Ultraviolet Survey). StarcNet (STAR Cluster classification NETwork) is a multi-scale convolutional neural network (CNN) which achieves an accuracy of 68.6% (4 classes)/86.0% (2 classes: cluster/non-cluster) for star cluster classification in the images of the LEGUS galaxies, nearly matching human expert performance. We test the performance of StarcNet by applying the pre-trained CNN model to galaxies not included in the training set, finding accuracies similar to the reference one. We test the effect of StarcNet predictions on the inferred cluster properties by comparing multi-color luminosity functions and mass-age plots from catalogs produced by StarcNet and by human labeling; distributions in luminosity, color, and physical characteristics of star clusters are similar for the human and ML classified samples. There are two advantages to the ML approach: (1) reproducibility of the classifications: the ML algorithm's biases are fixed and can be measured for subsequent analysis; and (2) speed of classification: the algorithm requires minutes for tasks that humans require weeks to months to perform. By achieving comparable accuracy to human classifiers, StarcNet will enable extending classifications to a larger number of candidate samples than currently available, thus significantly increasing the statistics for cluster studies.