Pulsar Candidate Identification Using Semi-Supervised Generative Adversarial Networks

Abstract

Machine learning methods are increasingly helping astronomers identify new radio pulsars. However, they require a large amount of labelled data, which is time-consuming to produce and subject to bias. Here we describe a Semi-Supervised Generative Adversarial Network (SGAN) which, trained on datasets that are mostly unlabelled, achieves better classification performance than standard supervised algorithms. Trained on only 100 labelled and 5000 unlabelled candidates, the SGAN achieved an accuracy and a mean F-score of 94.9%, compared with 81.1% and 82.7%, respectively, for our standard supervised baseline. Our final model, trained on a much larger labelled dataset, achieved an accuracy and a mean F-score of 99.2% and a recall of 99.7%. This technique enables high-quality classification during the early stages of pulsar surveys on new instruments, when limited labelled data is available. We open-source our work along with a new pulsar-candidate dataset produced from the High Time Resolution Universe - South Low Latitude Survey. This dataset has the largest number of pulsar detections of any public dataset, and we hope it will be a valuable tool for benchmarking future machine learning models.
