No Arabic abstract
The EPOCH (EROS-2 periodic variable star classification using machine learning) project aims to detect periodic variable stars in the EROS-2 light curve database. In this paper, we present the first result of the classification of periodic variable stars in the EROS-2 LMC database. To classify these variables, we first built a training set by compiling known variables in the Large Magellanic Cloud area from the OGLE and MACHO surveys. We crossmatched these variables with the EROS-2 sources and extracted 22 variability features from 28 392 light curves of the corresponding EROS-2 sources. We then used the random forest method to classify the EROS-2 sources in the training set. We designed the model to separate not only $delta$ Scuti stars, RR Lyraes, Cepheids, eclipsing binaries, and long-period variables, the superclasses, but also their subclasses, such as RRab, RRc, RRd, and RRe for RR Lyraes, and similarly for the other variable types. The model trained using only the superclasses shows 99% recall and precision, while the model trained on all subclasses shows 87% recall and precision. We applied the trained model to the entire EROS-2 LMC database, which contains about 29 million sources, and found 117 234 periodic variable candidates. Out of these 117 234 periodic variables, 55 285 have not been discovered by either OGLE or MACHO variability studies. This set comprises 1 906 $delta$ Scuti stars, 6 607 RR Lyraes, 638 Cepheids, 178 Type II Cepheids, 34 562 eclipsing binaries, and 11 394 long-period variables. A catalog of these EROS-2 LMC periodic variable stars will be available online at http://stardb.yonsei.ac.kr and at the CDS website (http://vizier.u-strasbg.fr/viz-bin/VizieR).
We present a machine learning package for the classification of periodic variable stars. Our package is intended to be general: it can classify any single band optical light curve comprising at least a few tens of observations covering durations from weeks to years, with arbitrary time sampling. We use light curves of periodic variable stars taken from OGLE and EROS-2 to train the model. To make our classifier relatively survey-independent, it is trained on 16 features extracted from the light curves (e.g. period, skewness, Fourier amplitude ratio). The model classifies light curves into one of seven superclasses - Delta Scuti, RR Lyrae, Cepheid, Type II Cepheid, eclipsing binary, long-period variable, non-variable - as well as subclasses of these, such as ab, c, d, and e types for RR Lyraes. When trained to give only superclasses, our model achieves 0.98 for both recall and precision as measured on an independent validation dataset (on a scale of 0 to 1). When trained to give subclasses, it achieves 0.81 for both recall and precision. In order to assess classification performance of the subclass model, we applied it to the MACHO, LINEAR, and ASAS periodic variables, which gave recall/precision of 0.92/0.98, 0.89/0.96, and 0.84/0.88, respectively. We also applied the subclass model to Hipparcos periodic variable stars of many other variability types that do not exist in our training set, in order to examine how much those types degrade the classification performance of our target classes. In addition, we investigate how the performance varies with the number of data points and duration of observations. We find that recall and precision do not vary significantly if the number of data points is larger than 80 and the duration is more than a few weeks. The classifier software of the subclass model is available from the GitHub repository (https://goo.gl/xmFO6Q).
We present a new classification method for quasar identification in the EROS-2 and MACHO datasets based on a boosted version of Random Forest classifier. We use a set of variability features including parameters of a continuous auto regressive model. We prove that continuous auto regressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO datasets) using known quasars found in the LMC. Our models accuracy in both EROS-2 and MACHO training sets is about 90% precision and 86% recall, improving the state of the art models accuracy in quasar detection. We apply the model on the complete, including 28 million objects, EROS-2 and MACHO LMC datasets, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched our list with a previous 663 known strong candidates, getting 74% of matches for MACHO and 40% in EROS-2. The main difference on matching level is because EROS-2 is a slightly shallower survey which translates to significantly lower signal-to-noise ratio lightcurves.
We describe a methodology to classify periodic variable stars identified using photometric time-series measurements constructed from the Wide-field Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases. This will assist in the future construction of a WISE Variable Source Database that assigns variables to specific science classes as constrained by the WISE observing cadence with statistically meaningful classification probabilities. We have analyzed the WISE light curves of 8273 variable stars identified in previous optical variability surveys (MACHO, GCVS, and ASAS) and show that Fourier decomposition techniques can be extended into the mid-IR to assist with their classification. Combined with other periodic light-curve features, this sample is then used to train a machine-learned classifier based on the random forest (RF) method. Consistent with previous classification studies of variable stars in general, the RF machine-learned classifier is superior to other methods in terms of accuracy, robustness against outliers, and relative immunity to features that carry little or redundant class information. For the three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%, and 84.5% respectively using cross-validation analyses, with 95% confidence intervals of approximately +/-2%. These accuracies are achieved at purity (or reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to that achieved in previous automated classification studies of periodic variable stars.
We present a novel automated methodology to detect and classify periodic variable stars in a large database of photometric time series. The methods are based on multivariate Bayesian statistics and use a multi-stage approach. We applied our method to the ground-based data of the TrES Lyr1 field, which is also observed by the Kepler satellite, covering ~26000 stars. We found many eclipsing binaries as well as classical non-radial pulsators, such as slowly pulsating B stars, Gamma Doradus, Beta Cephei and Delta Scuti stars. Also a few classical radial pulsators were found.
We present a new method to discriminate periodic from non-periodic irregularly sampled lightcurves. We introduce a periodic kernel and maximize a similarity measure derived from information theory to estimate the periods and a discriminator factor. We tested the method on a dataset containing 100,000 synthetic periodic and non-periodic lightcurves with various periods, amplitudes and shapes generated using a multivariate generative model. We correctly identified periodic and non-periodic lightcurves with a completeness of 90% and a precision of 95%, for lightcurves with a signal-to-noise ratio (SNR) larger than 0.5. We characterize the efficiency and reliability of the model using these synthetic lightcurves and applied the method on the EROS-2 dataset. A crucial consideration is the speed at which the method can be executed. Using hierarchical search and some simplification on the parameter search we were able to analyze 32.8 million lightcurves in 18 hours on a cluster of GPGPUs. Using the sensitivity analysis on the synthetic dataset, we infer that 0.42% in the LMC and 0.61% in the SMC of the sources show periodic behavior. The training set, the catalogs and source code are all available in http://timemachine.iic.harvard.edu.