
Statistical Classification Techniques for Photometric Supernova Typing

Published by: James Newling
Publication date: 2010
Language: English





Future photometric supernova surveys will produce vastly more candidates than can be followed up spectroscopically, highlighting the need for effective classification methods based on lightcurves alone. Here we introduce boosting and kernel density estimation techniques which have minimal astrophysical input, and compare their performance on 20,000 simulated Dark Energy Survey lightcurves. We demonstrate that these methods are comparable to the best template-fitting methods currently used, and in particular do not require the redshift of the host galaxy or candidate. However, both methods require a training sample that is representative of the full population, so typical spectroscopic supernova subsamples will lead to poor performance. To realise the full potential of such blind methods, we recommend that representative training samples be used, and that specific attention be given to their creation in the design phase of future photometric surveys.
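
As an illustration of the two techniques named in the abstract, the following is a minimal sketch of boosted decision stumps and a class-conditional kernel density estimator applied to lightcurve summary features. The feature set, the random placeholder data and the scikit-learn classes used here are assumptions made for illustration, not the authors' actual implementation.

```python
# Minimal sketch: boosting and kernel density estimation for two-class
# (Ia vs. non-Ia) lightcurve typing. The features and labels below are
# random placeholders, not the paper's actual pipeline or data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder features: e.g. per-band peak flux, rise time, decline rate.
X = rng.normal(size=(20000, 6))
y = rng.integers(0, 2, size=20000)           # 1 = type Ia, 0 = other
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Boosting: an ensemble of weak decision-tree learners.
boost = AdaBoostClassifier(n_estimators=200).fit(X_train, y_train)
p_boost = boost.predict_proba(X_test)[:, 1]

# KDE: fit one density per class, classify by log-likelihood ratio.
kde_ia = KernelDensity(bandwidth=0.5).fit(X_train[y_train == 1])
kde_non = KernelDensity(bandwidth=0.5).fit(X_train[y_train == 0])
log_ratio = kde_ia.score_samples(X_test) - kde_non.score_samples(X_test)
pred_kde = (log_ratio > 0).astype(int)
```
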


Read also

Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques fitting parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
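
The sketch below illustrates only the second stage of such a pipeline: a boosted-decision-tree classifier scored with the AUC of the ROC curve. The arrays `features` and `labels` are hypothetical stand-ins for an already-extracted SALT2 or wavelet feature set, and the scikit-learn implementation is an assumption, not the pipeline's actual code.

```python
# Sketch of the classification stage only: boosted decision trees scored
# with the area under the ROC curve. `features` and `labels` are random
# stand-ins for an extracted feature set (hypothetical names).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

features = np.random.rand(5000, 20)      # placeholder extracted features
labels = np.random.randint(0, 2, 5000)   # 1 = SN Ia, 0 = non-Ia

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=300, max_depth=3)
bdt.fit(X_tr, y_tr)
scores = bdt.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, scores))  # 1.0 would be perfect separation
```
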
The Pan-STARRS1 survey is obtaining multi-epoch imaging in 5 bands (g_ps, r_ps, i_ps, z_ps, y_ps) over the entire sky north of declination -30 deg. We describe here the implementation of the Photometric Classification Server (PCS) for Pan-STARRS1. PCS will allow the automatic classification of objects into star/galaxy/quasar classes based on colors, the measurement of photometric redshifts for extragalactic objects, and the constraining of stellar parameters for stellar objects, working at the catalog level. We present tests of the system based on high signal-to-noise photometry derived from the Medium Deep Fields of Pan-STARRS1, using available spectroscopic surveys as training and/or verification sets. We show that the Pan-STARRS1 photometry delivers classifications and photometric redshifts as good as the Sloan Digital Sky Survey (SDSS) photometry to the same magnitude limits. In particular, our preliminary results, based on this relatively limited dataset down to the SDSS spectroscopic limits and therefore potentially improvable, show that stars are correctly classified as such in 85% of cases, galaxies in 97% and QSOs in 84%. False positives are less than 1% for galaxies, ~19% for stars and ~28% for QSOs. Moreover, photometric redshifts for 1000 luminous red galaxies up to redshift 0.5 are determined to 2.4% precision with just 0.4% catastrophic outliers and a small (-0.5%) residual bias. PCS will create a value-added catalog with classifications and photometric redshifts for, eventually, many millions of sources.
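
For illustration only, the snippet below shows a generic catalogue-level star/galaxy/QSO classifier trained on broad-band colours. The random forest, the colour columns and the mock training catalogue are assumptions introduced here in the spirit of PCS; they are not the algorithm the server actually uses.

```python
# Illustrative only: a catalogue-level star/galaxy/QSO classifier built on
# broad-band colours. Mock data; not the actual PCS algorithm.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder colours built from g, r, i, z, y magnitudes: g-r, r-i, i-z, z-y.
colours = np.random.normal(size=(3000, 4))
classes = np.random.choice(["star", "galaxy", "qso"], size=3000)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(colours, classes)               # mock spectroscopic training set
probs = clf.predict_proba(colours[:5])  # class probabilities per source
print(clf.classes_, probs)
```
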
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL, where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up, is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
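
A minimal pool-based active-learning loop with uncertainty sampling is sketched below on synthetic data. The classifier, query criterion and follow-up interface described in the abstract are not reproduced here; this is a generic illustration of the AL idea only.

```python
# Minimal pool-based active learning with uncertainty sampling on mock data.
# Generic sketch of the technique, not the paper's actual method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_pool = rng.normal(size=(2000, 5))            # unlabelled "testing" survey
y_pool = (X_pool[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)

labelled = list(rng.choice(2000, size=20, replace=False))  # small seed set
for _ in range(30):                            # 30 manual follow-up queries
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_pool[labelled], y_pool[labelled])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)          # closest to 0.5 = least certain
    uncertainty[labelled] = np.inf             # never re-query known objects
    labelled.append(int(np.argmin(uncertainty)))
```
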
We report a framework for spectroscopic follow-up design for optimizing supernova photometric classification. The strategy accounts for the unavoidable mismatch between spectroscopic and photometric samples, and can be used even at the beginning of a new survey, without any initial training set. The framework falls under the umbrella of active learning (AL), a class of algorithms that aims to minimize labelling costs by identifying a few, carefully chosen, objects which have high potential to improve the classifier's predictions. As a proof of concept, we use the simulated data released after the Supernova Photometric Classification Challenge (SNPCC) and a random forest classifier. Our results show that, using only 12% of the number of training objects in the SNPCC spectroscopic sample, this approach is able to double the purity of the results. Moreover, in order to take into account multiple spectroscopic observations in the same night, we propose a semi-supervised batch-mode AL algorithm which selects a set of N=5 most informative objects each night. In comparison with the initial state using the traditional approach, our method achieves 2.3 times higher purity and comparable figure-of-merit results after only 180 days of observation, or 800 queries (73% of the SNPCC spectroscopic sample size). Such results were obtained using the same amount of spectroscopic time necessary to observe the original SNPCC spectroscopic sample, showing that this type of strategy is feasible with currently available spectroscopic resources. The code used in this work is available in the COINtoolbox: https://github.com/COINtoolbox/ActSNClass
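
The snippet below sketches one night's batch query: selecting the N=5 unlabelled objects about which a random forest is least certain. It is a generic illustration of batch-mode active learning with a hypothetical helper function, not the criterion implemented in the COINtoolbox ActSNClass code linked above.

```python
# One night's batch query in a batch-mode AL loop: pick the 5 pool objects
# a random forest is least certain about. Generic sketch on synthetic data;
# not the ActSNClass implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def query_batch(clf, X_pool, queried, n_queries=5):
    """Return indices of the n_queries most uncertain unqueried objects."""
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)          # larger = less certain
    uncertainty[list(queried)] = -np.inf        # skip already-observed objects
    return np.argsort(uncertainty)[-n_queries:]

# Synthetic stand-ins for photometric lightcurve features and true types.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))
y = (X[:, :2].sum(axis=1) > 0).astype(int)

seed = sorted(rng.choice(1000, 25, replace=False))   # initial labelled set
queried = set(seed)
clf = RandomForestClassifier(n_estimators=100).fit(X[seed], y[seed])

tonight = query_batch(clf, X, queried)          # 5 targets for spectroscopy
queried.update(int(i) for i in tonight)
```
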
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when the distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain time series analysis for light curves; and spatial statistics to interpret the spatial distributions of points in low dimensions. Two types of resources are presented: about 40 recommended texts and monographs in various fields of statistics, and the public-domain R software system for statistical analysis. Together with its ~3500 (and growing) add-on CRAN packages, R implements a vast range of statistical procedures in a coherent high-level language with advanced graphics.
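
As a small worked example of the bootstrap idea mentioned in the review (shown in Python rather than R, to keep the examples on this page in one language), the following estimates a confidence interval for a median by resampling with replacement; the mock data are arbitrary.

```python
# Tiny bootstrap illustration: resample the data with replacement to estimate
# the sampling distribution of a statistic (here, the median) whose
# distribution function is not known analytically.
import numpy as np

rng = np.random.default_rng(3)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # skewed mock data

boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(sample):.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```
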