The need for automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. As new science problems emerge, it is critical to be able to reuse previously learned models rather than rebuilding everything from scratch whenever the science problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that were trained in different contexts, answer different questions, and use different representations of the data. Conventional mixture-of-experts algorithms from the machine learning literature cannot be used, since each expert (model) uses different inputs. We also limit the computational cost of the model by using the most expensive experts only when necessary. We test our model on the EROS-2 and MACHO datasets, and we show that most of the classification challenges can be solved by training only a meta-model that learns how to integrate the previous experts.
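The idea of consulting expensive experts only when necessary can be illustrated with a minimal sketch. It is not the meta-model proposed in the abstract, which is learned from data; here a fixed confidence threshold stands in for the learned integration rule. The names `cascade_classify`, `color_expert`, and `period_expert`, the input keys, the costs, and the confidence values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]  # (class label, confidence in [0, 1])
Expert = Tuple[float, Callable[[Dict[str, float]], Prediction]]  # (cost, predict)


def cascade_classify(star: Dict[str, float], experts: List[Expert],
                     threshold: float = 0.9) -> Tuple[str, float, float]:
    """Query experts from cheapest to most expensive and stop once confident.

    Returns (label, confidence, total cost spent). The fixed threshold is an
    assumption of this sketch, standing in for a learned meta-model.
    """
    best: Prediction = ("unknown", 0.0)
    spent = 0.0
    for cost, predict in sorted(experts, key=lambda e: e[0]):
        label, conf = predict(star)
        spent += cost
        if conf > best[1]:
            best = (label, conf)
        if conf >= threshold:
            break  # confident enough: skip the remaining, costlier experts
    return best[0], best[1], spent


def color_expert(star: Dict[str, float]) -> Prediction:
    # Hypothetical cheap expert: looks only at a colour index.
    if star["color"] > 1.5:
        return ("long-period variable", 0.95)
    return ("RR Lyrae", 0.55)


def period_expert(star: Dict[str, float]) -> Prediction:
    # Hypothetical expensive expert: needs a period from the full light curve.
    if star["period"] < 1.0:
        return ("RR Lyrae", 0.92)
    return ("Cepheid", 0.90)


experts: List[Expert] = [(10.0, period_expert), (1.0, color_expert)]
easy = cascade_classify({"color": 2.0, "period": 300.0}, experts)
hard = cascade_classify({"color": 0.4, "period": 0.6}, experts)
```

Each expert reads a different key of the input, mirroring the point that the experts use different representations of the data. For `easy` the cheap expert alone suffices; for `hard` the expensive expert is also consulted.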
We present a novel automated methodology to detect and classify periodic variable stars in a large database of photometric time series. The methods are based on multivariate Bayesian statistics and use a multi-stage approach. We applied our method to the ground-based data of the TrES Lyr1 field, which is also observed by the Kepler satellite, covering ~26000 stars. We found many eclipsing binaries as well as classical non-radial pulsators, such as slowly pulsating B stars, Gamma Doradus, Beta Cephei and Delta Scuti stars. We also found a few classical radial pulsators.
We present a machine learning package for the classification of periodic variable stars. Our package is intended to be general: it can classify any single band optical light curve comprising at least a few tens of observations covering durations from weeks to years, with arbitrary time sampling. We use light curves of periodic variable stars taken from OGLE and EROS-2 to train the model. To make our classifier relatively survey-independent, it is trained on 16 features extracted from the light curves (e.g. period, skewness, Fourier amplitude ratio). The model classifies light curves into one of seven superclasses - Delta Scuti, RR Lyrae, Cepheid, Type II Cepheid, eclipsing binary, long-period variable, non-variable - as well as subclasses of these, such as ab, c, d, and e types for RR Lyraes. When trained to give only superclasses, our model achieves 0.98 for both recall and precision as measured on an independent validation dataset (on a scale of 0 to 1). When trained to give subclasses, it achieves 0.81 for both recall and precision. In order to assess classification performance of the subclass model, we applied it to the MACHO, LINEAR, and ASAS periodic variables, which gave recall/precision of 0.92/0.98, 0.89/0.96, and 0.84/0.88, respectively. We also applied the subclass model to Hipparcos periodic variable stars of many other variability types that do not exist in our training set, in order to examine how much those types degrade the classification performance of our target classes. In addition, we investigate how the performance varies with the number of data points and duration of observations. We find that recall and precision do not vary significantly if the number of data points is larger than 80 and the duration is more than a few weeks. The classifier software of the subclass model is available from the GitHub repository (https://goo.gl/xmFO6Q).
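Two of the features named above, period and skewness, can be computed from an irregularly sampled light curve as in the following sketch. It is not the package's own feature extractor: the period search is a plain Lomb-Scargle periodogram evaluated on a fixed frequency grid, and the function names, the grid limits, and the simulated light curve are assumptions made for illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness m3 / m2**1.5 of a list of magnitudes."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def lomb_scargle_power(times, mags, freq):
    """Classical Lomb-Scargle power at one frequency (cycles per time unit)."""
    w = 2.0 * math.pi * freq
    mean = sum(mags) / len(mags)
    y = [m - mean for m in mags]
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                     sum(math.cos(2 * w * t) for t in times)) / (2 * w)
    c = [math.cos(w * (t - tau)) for t in times]
    s = [math.sin(w * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, c))
    ys = sum(a * b for a, b in zip(y, s))
    return 0.5 * (yc * yc / sum(a * a for a in c) +
                  ys * ys / sum(a * a for a in s))


def best_period(times, mags, f_min, f_max, n_freq):
    """Period of the highest periodogram peak on a linear frequency grid."""
    step = (f_max - f_min) / (n_freq - 1)
    freqs = [f_min + i * step for i in range(n_freq)]
    return 1.0 / max(freqs, key=lambda f: lomb_scargle_power(times, mags, f))


# Simulated light curve (an assumption): 200 irregular epochs over 50 days,
# a 0.6-day sinusoid around magnitude 15 with small Gaussian noise.
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 50.0) for _ in range(200))
mags = [15.0 + 0.3 * math.sin(2 * math.pi * t / 0.6) + rng.gauss(0.0, 0.02)
        for t in times]

features = {
    "period": best_period(times, mags, f_min=0.5, f_max=3.0, n_freq=501),
    "skewness": skewness(mags),
}
```

Because the periodogram does not require evenly spaced epochs, the same code applies to arbitrary time sampling. In practice the frequency grid would be chosen from the time span and cadence of each survey rather than fixed by hand.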
Astronomical surveys of celestial sources produce streams of noisy time series measuring flux versus time (light curves). Unlike in many other physical domains, however, large (and source-specific) temporal gaps in data arise naturally due to intranight cadence choices as well as diurnal and seasonal constraints. With nightly observations of millions of variable stars and transients from upcoming surveys, efficient and accurate discovery and classification techniques on noisy, irregularly sampled data must be employed with minimal human-in-the-loop involvement. Machine learning for inference tasks on such data traditionally requires the laborious hand-coding of domain-specific numerical summaries of raw data (features). Here we present a novel unsupervised autoencoding recurrent neural network (RNN) that makes explicit use of sampling times and known heteroskedastic noise properties. When trained on optical variable star catalogs, this network produces supervised classification models that rival other best-in-class approaches. We find that autoencoded features learned on one time-domain survey perform nearly as well when applied to another survey. These networks can continue to learn from new unlabeled observations and may be used in other unsupervised tasks such as forecasting and anomaly detection.
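One plausible way to make sampling times and measurement errors explicit is sketched below, without the network itself. It assumes that each time step is fed to the network as a pair (time since previous observation, normalized flux) and that the reconstruction loss is weighted by inverse variance. Both choices are assumptions of this sketch, not details taken from the abstract, and the function names are hypothetical.

```python
import math


def to_sequence(times, fluxes):
    """Turn an irregular light curve into (delta_t, normalized flux) steps."""
    mean = sum(fluxes) / len(fluxes)
    std = math.sqrt(sum((f - mean) ** 2 for f in fluxes) / len(fluxes)) or 1.0
    # The first step has no predecessor, so its time gap is set to zero.
    deltas = [0.0] + [t1 - t0 for t0, t1 in zip(times, times[1:])]
    return [(dt, (f - mean) / std) for dt, f in zip(deltas, fluxes)]


def weighted_mse(observed, reconstructed, errors):
    """Reconstruction loss in which precise points count more (weight 1/sigma^2)."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(w * (o - r) ** 2
                for w, o, r in zip(weights, observed, reconstructed))
    return total / sum(weights)


steps = to_sequence([0.0, 1.0, 3.0], [1.0, 2.0, 3.0])
# The same unit residual costs more on the precise point (sigma = 0.1)
# than on the noisy one (sigma = 1.0).
loss_precise_off = weighted_mse([0.0, 0.0], [1.0, 0.0], [0.1, 1.0])
loss_noisy_off = weighted_mse([0.0, 0.0], [0.0, 1.0], [0.1, 1.0])
```

With this weighting, an autoencoder is penalized less for failing to reproduce points that are dominated by noise, which is one way heteroskedastic errors can enter training.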
We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks, a class of probabilistic graphical models that allow us to perform inference to predict missing values given the observed data and the dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilises sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model we use three catalogs with missing data (SAGE, 2MASS and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from the catalogs with missing data is included, how our method compares to traditional missing-data approaches, and at what computational cost. By integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent, and quasar detection by 15%, while keeping the computational cost the same.
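Inference with missing values can be illustrated on the simplest Bayesian network, a naive Bayes structure in which the class is the only parent of every feature. The sketch covers inference only, not the learning step by expectation maximization described in the abstract, and it does not reproduce the network structure used there. The feature names and all probabilities are hypothetical.

```python
def posterior(priors, likelihoods, observation):
    """P(class | observed features); features set to None are marginalized."""
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for feature, value in observation.items():
            if value is None:
                continue  # summing a missing leaf over its values gives 1
            p *= likelihoods[cls][feature][value]
        scores[cls] = p
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


def predict_missing(priors, likelihoods, observation, feature):
    """P(feature = v | observed features) for a feature that is missing."""
    post = posterior(priors, likelihoods, observation)
    values = next(iter(likelihoods.values()))[feature].keys()
    return {v: sum(post[c] * likelihoods[c][feature][v] for c in post)
            for v in values}


# Hypothetical numbers, chosen only to make the example concrete.
priors = {"quasar": 0.1, "star": 0.9}
likelihoods = {
    "quasar": {"ir_excess": {True: 0.8, False: 0.2},
               "blue": {True: 0.7, False: 0.3}},
    "star": {"ir_excess": {True: 0.1, False: 0.9},
             "blue": {True: 0.2, False: 0.8}},
}

obs = {"ir_excess": True, "blue": None}  # "blue" was not measured
class_post = posterior(priors, likelihoods, obs)
blue_pred = predict_missing(priors, likelihoods, obs, "blue")
```

An object with a missing feature is still classified from the features that were measured, and the same posterior yields a prediction for the missing value. A network with dependencies between features would need a general inference routine in place of the product used here.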
Automatic classification methods applied to sky surveys have revolutionized the astronomical target selection process. Most surveys generate a vast amount of time series, or "lightcurves", that represent the brightness variability of stellar objects over time. Unfortunately, lightcurve observations take several years to complete, producing truncated time series to which automatic classifiers are generally not applied until the observations are finished. This happens because state-of-the-art methods rely on a variety of statistical descriptors, or features, whose dispersion increases as the number of observations decreases, which reduces their precision. In this paper we propose a novel method that increases the performance of automatic classifiers of variable stars by incorporating the deviations that the scarcity of observations produces. Our method uses Gaussian Process Regression to form a probabilistic model of each lightcurve's observations. Then, based on this model, bootstrapped samples of the time series features are generated. Finally, a bagging approach is used to improve the overall performance of the classification. We perform tests on the MACHO and OGLE catalogs. The results show that our method effectively classifies some variability classes using a small fraction of the original observations. For example, we found that RR Lyrae stars can be classified with around 80% accuracy using only the first 5% of each lightcurve's observations in the MACHO and OGLE catalogs. We believe these results show that, when studying lightcurves, it is important to consider the error in the features and how the measurement process affects it.
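The three steps described above (a Gaussian process model of the lightcurve, bootstrapped feature samples, and aggregation of the resulting predictions) can be sketched in a few dozen lines. This is a simplified illustration and not the authors' implementation. The kernel, its fixed hyperparameters, the single amplitude feature, and the rule-based `toy_classifier` are all assumptions, and a majority vote over the draws stands in for bagging trained classifiers.

```python
import math
import random
from collections import Counter


def rbf(a, b, length):
    """Squared-exponential covariance between two epochs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(t_obs, y_obs, t_new, noise, length):
    """Posterior mean and covariance of noisy magnitudes at epochs t_new."""
    n, m = len(t_obs), len(t_new)
    offset = sum(y_obs) / n  # the GP models deviations from the mean magnitude
    K = [[rbf(t_obs[i], t_obs[j], length) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = chol_solve(L, [y - offset for y in y_obs])
    Ks = [[rbf(t, s, length) for s in t_obs] for t in t_new]
    mean = [offset + sum(k * a for k, a in zip(row, alpha)) for row in Ks]
    V = [chol_solve(L, row) for row in Ks]
    cov = [[rbf(t_new[a], t_new[b], length)
            - sum(k * v for k, v in zip(Ks[a], V[b]))
            + (noise ** 2 if a == b else 0.0)
            for b in range(m)] for a in range(m)]
    return mean, cov


def toy_classifier(features):
    # Hypothetical stand-in for a trained classifier.
    return "variable" if features["amplitude"] > 0.3 else "non-variable"


def classify_with_uncertainty(t_obs, y_obs, t_new, n_draws=25, seed=0):
    """Vote over features computed from posterior draws of the lightcurve."""
    rng = random.Random(seed)
    # Fixed hyperparameters are an assumption; they would normally be fitted.
    mean, cov = gp_posterior(t_obs, y_obs, t_new, noise=0.1, length=1.0)
    L = cholesky(cov)
    votes = Counter()
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draw = [mu + sum(L[i][k] * z[k] for k in range(i + 1))
                for i, mu in enumerate(mean)]
        amplitude = 0.5 * (max(draw) - min(draw))
        votes[toy_classifier({"amplitude": amplitude})] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_draws


# Simulated truncated lightcurve (an assumption): 12 epochs of a 3-day sinusoid.
t_obs = [0.5 * i for i in range(12)]
y_obs = [15.0 + math.sin(2 * math.pi * t / 3.0) for t in t_obs]
t_new = [0.3 * i for i in range(20)]
label, vote_fraction = classify_with_uncertainty(t_obs, y_obs, t_new)
```

The vote fraction gives a rough measure of how stable the classification is under the uncertainty of the features. With fewer observations the posterior draws spread out, so the features and the votes disagree more often.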