Systematic Serendipity: A Test of Unsupervised Machine Learning as a Method for Anomaly Detection


Abstract in English

Advances in astronomy are often driven by serendipitous discoveries. As survey astronomy continues to grow, the size and complexity of astronomical databases will increase, and the ability of astronomers to manually scour data and make such discoveries will decrease. In this work, we introduce a machine learning-based method to identify anomalies in large datasets to facilitate such discoveries, and apply this method to long cadence lightcurves from NASA's Kepler Mission. Our method clusters data based on density, identifying anomalies as data that lie outside of dense regions. This work serves as a proof-of-concept case study, and we test our method on four quarters of the Kepler long cadence lightcurves. We use Kepler's most notorious anomaly, Boyajian's Star (KIC 8462852), as a rare "ground truth" for testing outlier identification, verifying that objects of genuine scientific interest are included among the identified anomalies. We evaluate the method's ability to recover known anomalies by testing whether it flags the unusual behavior of Boyajian's Star. We report the full list of identified anomalies for these quarters and present a sample subset of identified outliers that includes unusual phenomena, objects that are rare in the Kepler field, and data artifacts. By flagging less than 4% of the data in each quarter as outlying, we demonstrate that this anomaly detection method can create a more targeted approach to searching for rare and novel phenomena.
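The abstract describes anomalies as data lying outside dense regions but does not name the clustering algorithm. As an illustration only, the sketch below assumes a DBSCAN-style density clustering over generic feature vectors (one tuple per lightcurve, for example). The function name `dbscan_outliers` and its parameters `eps` (neighbourhood radius) and `min_samples` (points required for a region to count as dense) are hypothetical and not taken from the original work. Points never reached from a dense "core" point are returned as outliers.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan_outliers(points, eps, min_samples):
    """Hypothetical DBSCAN-style sketch: return indices of points that lie
    outside every dense region (DBSCAN 'noise')."""
    n = len(points)
    # Each point's eps-neighbourhood, including the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point sits in a dense region: enough neighbours within eps.
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = [None] * n  # None means not (yet) assigned to any cluster
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    # Anything never reached from a core point is outlying.
    return [i for i in range(n) if labels[i] is None]


# Two tight groups plus one isolated point (index 4).
points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
          (5, 5),
          (10, 10), (10, 10.1), (10.1, 10)]
outliers = dbscan_outliers(points, eps=0.5, min_samples=3)
print(outliers, len(outliers) / len(points))
```

In this toy example only the isolated point is flagged. The flagged fraction, `len(outliers) / len(points)`, is the kind of quantity the "less than 4%" figure summarizes; in practice it depends strongly on the chosen `eps` and `min_samples`.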
