Machine learning in APOGEE: Identification of stellar populations through chemical abundances


Abstract in English

The vast volume of data generated by modern astronomical surveys offers test beds for the application of machine learning. It is important to evaluate existing tools and determine those that are optimal for extracting scientific knowledge from the available observations. We explore the possibility of using clustering algorithms to separate stellar populations with distinct chemical patterns. Star clusters are likely the most chemically homogeneous populations in the Galaxy, and therefore any practical approach to identifying distinct stellar populations should at least be able to separate clusters from each other. We apply eight clustering algorithms combined with four dimensionality reduction strategies to automatically distinguish stellar clusters using chemical abundances of 13 elements. Our sample includes 18 stellar clusters with a total of 453 stars. We use statistical tests to show that some pairs of clusters are indistinguishable from each other when chemical abundances from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) are used. However, for most clusters we are able to automatically assign membership with metric scores similar to those of previous works. The confusion level of the automatically selected clusters is consistent with statistical tests that demonstrate the impossibility of perfectly distinguishing all the clusters from each other. These statistical tests and confusion levels establish a limit for the prospect of blindly identifying stars born in the same cluster based solely on chemical abundances. We find that some of the algorithms we explored are capable of blindly identifying stellar populations with similar ages and chemical distributions in the APOGEE data. Because some stellar clusters are chemically indistinguishable, our study supports the notion of extending weak chemical tagging to involve families of clusters instead of individual clusters.
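The general workflow described in the abstract (reduce the dimensionality of the abundance space, cluster, then score the result against known membership) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the eight algorithms or four reduction strategies, so PCA and k-means are stand-ins, the V-measure is just one possible metric, and the data are synthetic mock abundances rather than APOGEE measurements. The helper names `make_mock_abundances` and `cluster_abundances` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import v_measure_score
from sklearn.preprocessing import StandardScaler


def make_mock_abundances(n_clusters=3, stars_per_cluster=25, n_elements=13,
                         scatter=0.03, seed=0):
    """Build synthetic abundances (NOT APOGEE data): each mock cluster has a
    random mean abundance vector and small Gaussian star-to-star scatter."""
    rng = np.random.default_rng(seed)
    centres = rng.uniform(-0.5, 0.3, size=(n_clusters, n_elements))
    labels = np.repeat(np.arange(n_clusters), stars_per_cluster)
    noise = rng.normal(0.0, scatter, size=(labels.size, n_elements))
    return centres[labels] + noise, labels


def cluster_abundances(abundances, n_groups, n_components=5, seed=0):
    """Standardise, project onto principal components, then run k-means.
    PCA and k-means are illustrative choices, not the paper's confirmed ones."""
    scaled = StandardScaler().fit_transform(abundances)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    model = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    return model.fit_predict(reduced)


X, true_labels = make_mock_abundances()
predicted = cluster_abundances(X, n_groups=3)

# The V-measure compares predicted groups with known membership and does not
# depend on how the group labels happen to be numbered.
score = v_measure_score(true_labels, predicted)
print(f"V-measure on mock data: {score:.2f}")
```

In this sketch the mock clusters are deliberately well separated. Reducing the separation between the cluster means, or increasing `scatter`, is a simple way to see how the score degrades once populations become chemically similar, which is the regime the abstract identifies as limiting for real clusters.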
