ترغب بنشر مسار تعليمي؟ اضغط هنا

With growing data volumes from synoptic surveys, astronomers must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace thes e human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities, and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All Sky Automated Survey (ASAS), and unveil the Machine-learned ASAS Classification Catalog (MACC), which is a 28-class probabilistic classification catalog of 50,124 ASAS sources. We estimate that MACC achieves a sub-20% classification error rate, and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes. The MACC is publicly available at http://www.bigmacc.info.
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manif est as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up---is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifiers average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly-observed variables based on a small number of time-series measureme nts. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics (feature), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%.
To further our knowledge of the complex physical process of galaxy formation, it is essential that we characterize the formation and evolution of large databases of galaxies. The spectral synthesis STARLIGHT code of Cid Fernandes et al. (2004) was de signed for this purpose. Results of STARLIGHT are highly dependent on the choice of input basis of simple stellar population (SSP) spectra. Speed of the code, which uses random walks through the parameter space, scales as the square of the number of basis spectra, making it computationally necessary to choose a small number of SSPs that are coarsely sampled in age and metallicity. In this paper, we develop methods based on diffusion map (Lafon & Lee, 2006) that, for the first time, choose appropriate bases of prototype SSP spectra from a large set of SSP spectra designed to approximate the continuous grid of age and metallicity of SSPs of which galaxies are truly composed. We show that our techniques achieve better accuracy of physical parameter estimation for simulated galaxies. Specifically, we show that our methods significantly decrease the age-metallicity degeneracy that is common in galaxy population synthesis methods. We analyze a sample of 3046 galaxies in SDSS DR6 and compare the parameter estimates obtained from different basis choices.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا