No Arabic abstract
In this work we show that modern data-driven machine learning techniques can be successfully applied on lunar surface remote sensing data to learn, in an unsupervised way, sufficiently good representations of the data distribution to enable lunar technosignature and anomaly detection. In particular we train an unsupervised distribution learning neural network model to find the Apollo 15 landing module in a testing dataset, with no dataset specific model or hyperparameter tuning. Sufficiently good unsupervised data density estimation has the promise of enabling myriad useful downstream tasks, including locating lunar resources for future space flight and colonization, finding new impact craters or lunar surface reshaping, and algorithmically deciding the importance of unlabeled samples to send back from power- and bandwidth-constrained missions. We show in this work that such unsupervised learning can be successfully done in the lunar remote sensing and space science contexts.
Advances in astronomy are often driven by serendipitous discoveries. As survey astronomy continues to grow, the size and complexity of astronomical databases will increase, and the ability of astronomers to manually scour data and make such discoveries decreases. In this work, we introduce a machine learning-based method to identify anomalies in large datasets to facilitate such discoveries, and apply this method to long cadence lightcurves from NASAs Kepler Mission. Our method clusters data based on density, identifying anomalies as data that lie outside of dense regions. This work serves as a proof-of-concept case study and we test our method on four quarters of the Kepler long cadence lightcurves. We use Keplers most notorious anomaly, Boyajians Star (KIC 8462852), as a rare `ground truth for testing outlier identification to verify that objects of genuine scientific interest are included among the identified anomalies. We evaluate the methods ability to identify known anomalies by identifying unusual behavior in Boyajians Star, we report the full list of identified anomalies for these quarters, and present a sample subset of identified outliers that includes unusual phenomena, objects that are rare in the Kepler field, and data artifacts. By identifying <4% of each quarter as outlying data, we demonstrate that this anomaly detection method can create a more targeted approach in searching for rare and novel phenomena.
Obtaining labels for medical (image) data requires scarce and expensive experts. Moreover, due to ambiguous symptoms, single images rarely suffice to correctly diagnose a medical condition. Instead, it often requires to take additional background information such as the patients medical history or test results into account. Hence, instead of focusing on uninterpretable black-box systems delivering an uncertain final diagnosis in an end-to-end-fashion, we investigate how unsupervised methods trained on images without anomalies can be used to assist doctors in evaluating X-ray images of hands. Our method increases the efficiency of making a diagnosis and reduces the risk of missing important regions. Therefore, we adopt state-of-the-art approaches for unsupervised learning to detect anomalies and show how the outputs of these methods can be explained. To reduce the effect of noise, which often can be mistaken for an anomaly, we introduce a powerful preprocessing pipeline. We provide an extensive evaluation of different approaches and demonstrate empirically that even without labels it is possible to achieve satisfying results on a real-world dataset of X-ray images of hands. We also evaluate the importance of preprocessing and one of our main findings is that without it, most of our approaches perform not better than random. To foster reproducibility and accelerate research we make our code publicly available at https://github.com/Valentyn1997/xray
Unsupervised anomaly detection (UAD) learns one-class classifiers exclusively with normal (i.e., healthy) images to detect any abnormal (i.e., unhealthy) samples that do not conform to the expected normal patterns. UAD has two main advantages over its fully supervised counterpart. Firstly, it is able to directly leverage large datasets available from health screening programs that contain mostly normal image samples, avoiding the costly manual labelling of abnormal samples and the subsequent issues involved in training with extremely class-imbalanced data. Further, UAD approaches can potentially detect and localise any type of lesions that deviate from the normal patterns. One significant challenge faced by UAD methods is how to learn effective low-dimensional image representations to detect and localise subtle abnormalities, generally consisting of small lesions. To address this challenge, we propose a novel self-supervised representation learning method, called Constrained Contrastive Distribution learning for anomaly detection (CCD), which learns fine-grained feature representations by simultaneously predicting the distribution of augmented data and image contexts using contrastive learning with pretext constraints. The learned representations can be leveraged to train more anomaly-sensitive detection models. Extensive experiment results show that our method outperforms current state-of-the-art UAD approaches on three different colonoscopy and fundus screening datasets. Our code is available at https://github.com/tianyu0207/CCD.
Supernovae mark the explosive deaths of stars and enrich the cosmos with heavy elements. Future telescopes will discover thousands of new supernovae nightly, creating a need to flag astrophysically interesting events rapidly for followup study. Ideally, such an anomaly detection pipeline would be independent of our current knowledge and be sensitive to unexpected phenomena. Here we present an unsupervised method to search for anomalous time series in real time for transient, multivariate, and aperiodic signals. We use a RNN-based variational autoencoder to encode supernova time series and an isolation forest to search for anomalous events in the learned encoded space. We apply this method to a simulated dataset of 12,159 supernovae, successfully discovering anomalous supernovae and objects with catastrophically incorrect redshift measurements. This work is the first anomaly detection pipeline for supernovae which works with online datastreams.
Every field of Science is undergoing unprecedented changes in the discovery process, and Astronomy has been a main player in this transition since the beginning. The ongoing and future large and complex multi-messenger sky surveys impose a wide exploiting of robust and efficient automated methods to classify the observed structures and to detect and characterize peculiar and unexpected sources. We performed a preliminary experiment on KiDS DR4 data, by applying to the problem of anomaly detection two different unsupervised machine learning algorithms, considered as potentially promising methods to detect peculiar sources, a Disentangled Convolutional Autoencoder and an Unsupervised Random Forest. The former method, working directly on images, is considered potentially able to identify peculiar objects like interacting galaxies and gravitational lenses. The latter instead, working on catalogue data, could identify objects with unusual values of magnitudes and colours, which in turn could indicate the presence of singularities.