
Imbalance Learning for Variable Star Classification

Published by Zafiirah Hosenie
Publication date: 2020
Language: English





The accurate automated classification of variable stars into their respective sub-types is difficult. Machine-learning-based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies by developing a hierarchical machine learning classifier. This algorithm-level approach to tackling imbalance yielded promising results on Catalina Real-Time Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying data-level approaches that directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: $\textit{R}$andomly $\textit{A}$ugmented $\textit{S}$ampled $\textit{L}$ight curves from magnitude $\textit{E}$rror ($\texttt{RASLE}$), augmenting light curves with Gaussian Process modelling ($\texttt{GpFit}$), and the Synthetic Minority Over-sampling Technique ($\texttt{SMOTE}$). When combining the algorithm-level approach (i.e. the hierarchical scheme) with the data-level approach, we further improve variable star classification accuracy by 1-4%. We found that a higher classification rate is obtained when using $\texttt{GpFit}$ in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and perhaps enhanced features.
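To make two of the data-level ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes that RASLE-style augmentation can be approximated by redrawing each magnitude from a Gaussian centred on the observed value with the reported magnitude error as its standard deviation, and it shows only the core interpolation step of SMOTE (the nearest-neighbour search among minority-class samples is omitted). The function names `augment_from_errors` and `smote_like_sample` and the example values are hypothetical.

```python
import random


def augment_from_errors(mags, mag_errs, n_copies, seed=0):
    """Return n_copies synthetic light curves.

    Assumption: each synthetic magnitude is drawn from a Gaussian with
    mean = observed magnitude and sigma = reported magnitude error.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, e) for m, e in zip(mags, mag_errs)]
        for _ in range(n_copies)
    ]


def smote_like_sample(x, neighbour, rng):
    """Core SMOTE step: pick a random point on the segment between a
    minority-class feature vector x and one of its neighbours."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbour)]


# Hypothetical light curve of a rare-class star: magnitudes and their errors.
mags = [15.20, 15.35, 15.10, 15.42]
mag_errs = [0.02, 0.03, 0.02, 0.05]

# Three synthetic copies that share the observation times of the original.
synthetic = augment_from_errors(mags, mag_errs, n_copies=3)

# One synthetic feature vector between two hypothetical minority samples.
new_features = smote_like_sample([0.5, 1.2], [0.7, 1.0], random.Random(42))
```

The first function works on the light curve itself, before feature extraction, whereas the SMOTE step operates on feature vectors that have already been extracted; which stage to augment is a design choice that the abstract leaves to the full paper.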




Read also

Machine learning techniques have been successfully used to classify variable stars in widely studied astronomical surveys. These datasets have been available to astronomers long enough to allow deep analysis of many variable sources and the generation of useful catalogs of identified variable stars. The products of these studies are labeled data that enable supervised learning models to be trained successfully. However, when these models are blindly applied to data from new sky surveys, their performance drops significantly. Furthermore, unlabeled data becomes available at a much higher rate than its labeled counterpart, since labeling is a manual and time-consuming effort. Domain adaptation techniques aim to learn from a domain where labeled data is available, the \textit{source domain}, and through some adaptation perform well on a different domain, the \textit{target domain}. We propose a full probabilistic model that represents the joint distribution of features from two surveys as well as a probabilistic transformation of the features from one survey to the other. This allows us to transfer labeled data to a survey where it is not available and to effectively run a variable star classification model in a new survey. Our model represents the features of each domain as a Gaussian mixture and models the transformation as a translation, rotation and scaling of each separate component. We perform tests using three different variability catalogs: EROS, MACHO, and HiTS, which differ in, among other things, the number of observations per star, cadence, observational time and optical bands observed.
Ongoing or upcoming surveys such as Gaia, ZTF, or LSST will observe light curves of billions or more astronomical sources. This presents new challenges for identifying interesting and important types of variability. Collecting a sufficient amount of labelled data for training is difficult, however, especially in the early stages of a new survey. Here we develop a single-band light-curve classifier based on deep neural networks, and use transfer learning to address the training data paucity problem by conveying knowledge from one dataset to another. First we train a neural network on 16 variability features extracted from the light curves of OGLE and EROS-2 variables. We then optimize this model using a small set (e.g. 5%) of periodic variable light curves from the ASAS dataset in order to transfer knowledge inferred from OGLE/EROS-2 to a new ASAS classifier. With this we achieve good classification results on ASAS, thereby showing that knowledge can be successfully transferred between datasets. We demonstrate similar transfer learning using Hipparcos and ASAS-SN data. We therefore find that it is not necessary to train a neural network from scratch for every new survey, but rather that transfer learning can be used even when only a small set of labelled data is available in the new survey.
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL (where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up) is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
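The query step of an active learning loop can be illustrated with a short sketch. The abstract selects objects by how much their labels would improve predictions on the testing set; the code below swaps in a simpler and widely used criterion, least-confidence uncertainty sampling, purely for illustration. The function name `query_most_uncertain` and the probability values are hypothetical.

```python
def query_most_uncertain(probabilities, n_queries):
    """Return indices of the n_queries objects whose top-class probability
    is lowest, i.e. the ones the classifier is least confident about.

    This is least-confidence sampling, a stand-in for the selection
    criterion described in the abstract.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:n_queries]


# Hypothetical class probabilities for four unlabeled stars (three classes).
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],
]

# Stars to send for manual follow-up in this iteration.
to_label = query_most_uncertain(probs, n_queries=2)
print(to_label)  # [3, 1]
```

In a full loop, the queried stars would be labeled by hand, added to the training set, and the classifier retrained before the next round of queries.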
Modern computing and communication technologies can make data collection procedures very efficient. However, our ability to analyze large data sets and/or to extract information from them is hard-pressed to keep up with our capacity for data collection. Some of these huge data sets are not collected for any particular research purpose. For a classification problem, this means that the essential label information may not be readily obtainable in the data set at hand, and an extra labeling procedure is required so that we have enough label information to construct a classification model. When a data set is huge, labeling each subject in it is costly in both money and time. Thus, it is important to decide which subjects should be labeled first in order to efficiently reduce the training cost and time. Active learning is a promising approach in this situation, because it lets us select unlabeled subjects sequentially without knowing their label information. In addition, there is no confirmed information about the essential variables for constructing an efficient classification rule. Thus, how to merge a variable selection scheme with an active learning procedure is of interest. In this paper, we propose a procedure for building binary classification models when the complete label information is not available at the beginning of the training stage. We study a model-based active learning procedure with sequential variable selection schemes, and discuss the results of the proposed procedure from both theoretical and numerical aspects.
The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. New science problems emerge, and it is critical to be able to re-use previously learned models without rebuilding everything from the beginning when the science problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that were trained in different contexts, answering different questions and using different representations of data. Conventional mixture-of-experts algorithms from the machine learning literature cannot be used, since each expert (model) uses different inputs. We also take the computational complexity of the model into account by using the most expensive models only when necessary. We test our model with the EROS-2 and MACHO datasets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.
