Global star formation rates (SFRs) are crucial for constraining theories of galaxy formation and evolution. SFRs are usually estimated from spectroscopic observations, which require too much precious telescope time and therefore cannot meet the needs of modern precision cosmology. We therefore propose a novel method to estimate SFRs for large samples of galaxies using a variety of supervised machine learning (ML) models.
Star formation rates (SFRs) are crucial for constraining theories of galaxy formation and evolution. SFRs are usually estimated from spectroscopic observations, which require large amounts of telescope time. We explore an alternative approach based on the photometric estimation of global SFRs for large samples of galaxies, using methods such as automatic parameter space optimisation and supervised Machine Learning models. We demonstrate that, with such an approach, accurate multi-band photometry allows reliable SFRs to be estimated. We also investigate how the use of photometric rather than spectroscopic redshifts affects the accuracy of the derived global SFRs. Finally, we provide a publicly available catalogue of SFRs for more than 27 million galaxies extracted from the Sloan Digital Sky Survey Data Release 7. The catalogue is available through the VizieR facility at the following link: ftp://cdsarc.u-strasbg.fr/pub/cats/J/MNRAS/486/1377.
Star-formation activity is a key property for probing structure formation and hence characterising the large-scale structures of the universe. This information can be deduced from the star formation rate (SFR) and the stellar mass (Mstar), both of which, but especially the SFR, are very complex to estimate. Determining these quantities from UV, optical, or IR luminosities relies on complex modelling and on priors on galaxy types. We propose a method based on the Random Forest machine-learning algorithm to estimate the SFR and the Mstar of galaxies at redshifts in the range 0.01<z<0.3, independent of their type. The algorithm takes as inputs the redshift, WISE luminosities, and WISE colours in the near-IR, and is trained with the spectra-extracted SFR and Mstar from the SDSS MPA-JHU DR8 catalogue as outputs. We show that our algorithm can accurately estimate SFR and Mstar, with scatters of sigma_SFR=0.38 dex and sigma_Mstar=0.16 dex, respectively, and that it is unbiased with respect to redshift or galaxy type. The full-sky coverage of the WISE satellite allows us to characterise the star-formation activity of all galaxies outside the Galactic mask with spectroscopic redshifts in the range 0.01<z<0.3. The method can also be applied to photometric-redshift catalogues, with best scatters of sigma_SFR=0.42 dex and sigma_Mstar=0.24 dex obtained in the redshift range 0.1<z<0.3.
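To make the quoted scatters concrete, the following is a minimal sketch of how a scatter in dex can be computed from predicted and reference values. It assumes the scatter is the standard deviation of the logarithmic residuals log10(predicted/reference); the abstract does not state the exact estimator, so this definition, the function names, and the toy numbers are all illustrative assumptions.

```python
import math
import statistics


def log_residuals(predicted, reference):
    """log10(predicted) - log10(reference) for each galaxy (values must be > 0)."""
    return [math.log10(p) - math.log10(r) for p, r in zip(predicted, reference)]


def scatter_dex(predicted, reference):
    """Population standard deviation of the log residuals, i.e. a scatter in dex.

    Assumption: the source may use a different (e.g. robust) estimator.
    """
    return statistics.pstdev(log_residuals(predicted, reference))


def bias_dex(predicted, reference):
    """Mean log residual; a value near zero indicates an unbiased estimate."""
    return statistics.fmean(log_residuals(predicted, reference))


def dex_to_factor(dex):
    """Convert a scatter in dex to the equivalent multiplicative factor."""
    return 10.0 ** dex


# Toy values (not from the paper): SFRs in arbitrary linear units.
reference = [1.0, 10.0, 100.0, 1000.0]
predicted = [10.0, 10.0, 100.0, 100.0]

toy_scatter = scatter_dex(predicted, reference)
toy_bias = bias_dex(predicted, reference)
```

Under this definition, a scatter of 0.38 dex corresponds to a typical multiplicative error of about a factor of 2.4 (`dex_to_factor(0.38)`), and 0.16 dex to a factor of about 1.45.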
The vast volume of data generated by modern astronomical surveys offers test beds for the application of machine learning. It is important to evaluate potential existing tools and determine those that are optimal for extracting scientific knowledge from the available observations. We explore the possibility of using clustering algorithms to separate stellar populations with distinct chemical patterns. Star clusters are likely the most chemically homogeneous populations in the Galaxy, and therefore any practical approach to identifying distinct stellar populations should at least be able to separate clusters from each other. We applied eight clustering algorithms combined with four dimensionality reduction strategies to automatically distinguish stellar clusters using chemical abundances of 13 elements. Our sample includes 18 stellar clusters with a total of 453 stars. We use statistical tests to show that some pairs of clusters are indistinguishable from each other when chemical abundances from the Apache Point Galactic Evolution Experiment (APOGEE) are used. However, for most clusters we are able to automatically assign membership with metric scores similar to those of previous works. The confusion level of the automatically selected clusters is consistent with statistical tests that demonstrate the impossibility of perfectly distinguishing all the clusters from each other. These statistical tests and confusion levels establish a limit for the prospect of blindly identifying stars born in the same cluster based solely on chemical abundances. We find that some of the algorithms we explored are capable of blindly identifying stellar populations with similar ages and chemical distributions in the APOGEE data. Because some stellar clusters are chemically indistinguishable, our study supports the notion of extending weak chemical tagging that involves families of clusters instead of individual clusters.
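As an illustration of blind cluster assignment in abundance space, the sketch below runs a plain k-means on toy two-element abundance vectors and scores the result with cluster purity. The abstract does not name its eight algorithms or its metric scores, so the choice of k-means, the purity score, the deterministic farthest-first initialisation, and the toy abundances are all assumptions made for this example.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic initialisation: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=50):
    """Plain k-means; returns one integer label per point."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every star.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: mean abundance vector of each group.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty group's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def purity(labels, truth):
    """Fraction of stars that belong to the majority true cluster of their group."""
    total = 0
    for j in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == j]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)


# Toy abundances, e.g. ([Fe/H], [Mg/Fe]); values are invented for illustration.
stars = [(-0.50, 0.30), (-0.52, 0.28), (-0.48, 0.31),
         (0.10, 0.02), (0.12, 0.00), (0.08, 0.03)]
truth = ["A", "A", "A", "B", "B", "B"]

labels = kmeans(stars, k=2)
score = purity(labels, truth)
```

With two well-separated toy clusters the purity is 1. If two clusters overlapped in abundance space, as the statistical tests above indicate for some real pairs, no assignment of this kind could reach a purity of 1.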
Theoretical stellar spectra rely on model stellar atmospheres computed based on our understanding of the physical laws at play in the stellar interiors. These models, coupled with atomic and molecular line databases, are used to generate theoretical stellar spectral libraries (SSLs) comprising stellar spectra over a regular grid of atmospheric parameters (temperature, surface gravity, abundances) at any desired resolution. Another class of SSLs is referred to as empirical spectral libraries; these contain observed spectra at limited resolution. SSLs play an essential role in deriving the properties of stars and stellar populations. Both theoretical and empirical libraries suffer from limited coverage of the parameter space. This limitation is overcome to some extent by generating spectra for specific sets of atmospheric parameters by interpolating within the grid of available parameter space. In this work, we present a method for spectral interpolation in the optical region using machine learning algorithms that are generic, easily adaptable to any SSL without much change in the model parameters, and computationally inexpensive. We use two machine learning techniques, Random Forest (RF) and Artificial Neural Networks (ANN), and train the models on the MILES library. We apply the trained models to spectra from the CFLIB for testing and show that the performance of the two models is comparable. We show that both models achieve better accuracy than the existing methods of polynomial-based interpolation and Gaussian radial basis function (RBF) interpolation.
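For reference, the Gaussian RBF interpolation named above as one of the existing baselines can be sketched as follows. This is a generic textbook formulation, not the implementation used in the work: the shape parameter `eps`, the parameter scaling, the per-pixel weight solve, and the toy grid and fluxes are all assumptions chosen for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gaussian_kernel(p, q, eps):
    """exp(-(eps * r)^2), with r the Euclidean distance between p and q."""
    r2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.exp(-(eps ** 2) * r2)


def fit_rbf(params, spectra, eps):
    """Return one weight vector per wavelength pixel.

    params:  list of atmospheric-parameter tuples (already scaled)
    spectra: list of flux lists, one per grid point, all of equal length
    """
    phi = [[gaussian_kernel(p, q, eps) for q in params] for p in params]
    n_pix = len(spectra[0])
    return [solve(phi, [s[k] for s in spectra]) for k in range(n_pix)]


def predict_rbf(params, weights, query, eps):
    """Interpolated spectrum at the parameter tuple `query`."""
    k = [gaussian_kernel(query, p, eps) for p in params]
    return [sum(wi * ki for wi, ki in zip(w, k)) for w in weights]


# Toy grid (invented): (Teff / 1000 K, log g) with three flux pixels per spectrum.
grid = [(5.0, 4.0), (5.5, 4.0), (6.0, 4.0), (5.0, 4.5), (5.5, 4.5), (6.0, 4.5)]
spectra = [[1.00, 0.80, 0.95], [1.00, 0.84, 0.96], [1.00, 0.88, 0.97],
           [1.00, 0.79, 0.94], [1.00, 0.83, 0.95], [1.00, 0.87, 0.96]]

weights = fit_rbf(grid, spectra, eps=2.0)
at_node = predict_rbf(grid, weights, grid[2], eps=2.0)        # should reproduce spectra[2]
between = predict_rbf(grid, weights, (5.25, 4.25), eps=2.0)   # off-grid query
```

An RBF interpolant reproduces the library spectra exactly at the grid nodes, so its accuracy has to be judged on held-out parameter sets, which is the role the CFLIB spectra play for the RF and ANN models described above.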
A significant fraction of high-redshift star-forming disc galaxies are known to host giant clumps, whose nature and role in galaxy evolution are yet to be understood. In this work we first present a new method based on neural networks to detect clumps in galaxy images. We use this method to detect clumps in the rest-frame optical and UV images of a complete sample of $\sim1500$ star-forming galaxies at $1<z<3$ in the CANDELS survey as well as in images from the VELA zoom-in cosmological simulations. We show that observational effects have a dramatic impact on the derived clump properties, leading to an overestimation of the clump mass by up to a factor of 10, which highlights the importance of fair comparisons between observations and simulations and the limitations of current HST data for studying the resolved structure of distant galaxies. After correcting for these effects with a mixture density network, we estimate that the clump stellar mass function follows a power law down to the completeness limit ($10^{7}$ solar masses), with the majority of the clumps being less massive than $10^9$ solar masses. This is in better agreement with recent gravitational-lensing-based measurements. The simulations explored in this work overall reproduce the shape of the observed clump stellar mass function and clumpy fractions when compared under the same conditions, although they tend to lie at the lower limit of the confidence intervals of the observations. This agreement suggests that most of the observed clumps are formed in situ.