
Decision Tree Classifiers for Star/Galaxy Separation

Publication date: 2010
Field: Physics
Language: English





We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of $884,126$ SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: $14 \le r \le 21$ ($85.2\%$) and $r \ge 19$ ($82.1\%$). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier is comparable or better in completeness over the full magnitude range $15 \le r \le 21$, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes ($r > 19$), our classifier is the only one able to maintain high completeness ($> 80\%$) while still achieving low contamination ($\sim 2.5\%$). Finally, we apply our FT classifier to separate stars from galaxies in the full set of $69,545,326$ SDSS photometric objects in the magnitude range $14 \le r \le 21$.
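The abstract evaluates classifiers by completeness and contamination. As a minimal sketch, assuming the common definitions (completeness: the fraction of true galaxies that are classified as galaxies; contamination: the fraction of objects classified as galaxies that are actually stars), the two quantities could be computed as below. The function names and the toy labels are illustrative assumptions and do not come from the paper.

```python
def completeness(true_labels, predicted_labels, target="galaxy"):
    """Fraction of true `target` objects that the classifier recovers (assumed definition)."""
    n_true = sum(1 for t in true_labels if t == target)
    n_recovered = sum(
        1 for t, p in zip(true_labels, predicted_labels)
        if t == target and p == target
    )
    return n_recovered / n_true if n_true else float("nan")


def contamination(true_labels, predicted_labels, target="galaxy"):
    """Fraction of objects labelled `target` that are really another class (assumed definition)."""
    n_selected = sum(1 for p in predicted_labels if p == target)
    n_wrong = sum(
        1 for t, p in zip(true_labels, predicted_labels)
        if p == target and t != target
    )
    return n_wrong / n_selected if n_selected else float("nan")


# Toy data, made up for illustration: 4 true galaxies followed by 4 true stars.
truth = ["galaxy"] * 4 + ["star"] * 4
predicted = ["galaxy", "galaxy", "galaxy", "star",
             "galaxy", "star", "star", "star"]

print(completeness(truth, predicted))   # 3 of 4 galaxies recovered -> 0.75
print(contamination(truth, predicted))  # 1 of 4 selected objects is a star -> 0.25
```

In a magnitude-dependent analysis such as the one described above, the same two functions would be applied separately to the objects falling in each magnitude bin.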




Read More

We address the problem of separating stars from galaxies in future large photometric surveys. We focus our analysis on simulations of the Dark Energy Survey (DES). In the first part of the paper, we derive the science requirements on star/galaxy separation, for measurement of the cosmological parameters with the Gravitational Weak Lensing and Large Scale Structure probes. These requirements are dictated by the need to control both the statistical and systematic errors on the cosmological parameters, and by Point Spread Function calibration. We formulate the requirements in terms of the completeness and purity provided by a given star/galaxy classifier. In order to achieve these requirements at faint magnitudes, we propose a new method for star/galaxy separation in the second part of the paper. We first use Principal Component Analysis to outline the correlations between the object parameters and extract from them the most relevant information. We then use the reduced set of parameters as input to an Artificial Neural Network. This multi-parameter approach improves upon purely morphometric classifiers (such as the classifier implemented in SExtractor), especially at faint magnitudes: it increases the purity by up to 20% for stars and by up to 12% for galaxies, at i-magnitude fainter than 23.
We discuss the statistical foundations of morphological star-galaxy separation. We show that many of the star-galaxy separation metrics in common use today (e.g. by SDSS or SExtractor) are closely related both to each other, and to the model odds ratio derived in a Bayesian framework by Sebok (1979). While the scaling of these algorithms with the noise properties of the sources varies, these differences do not strongly differentiate their performance. We construct a model of the performance of a star-galaxy separator in a realistic survey to understand the impact of observational signal-to-noise ratio (or equivalently, 5-sigma limiting depth) and seeing on classification performance. The model quantitatively demonstrates that, assuming realistic densities and angular sizes of stars and galaxies, 10% worse seeing can be compensated for by approximately 0.4 magnitudes deeper data to achieve the same star-galaxy classification performance. We discuss how to probabilistically combine multiple measurements, either of the same type (e.g., subsequent exposures), or differing types (e.g., multiple bandpasses), or differing methodologies (e.g., morphological and color-based classification). These methods are increasingly important for observations at faint magnitudes, where the rapidly rising number density of small galaxies makes star-galaxy classification a challenging problem. However, because of the significant role that the signal-to-noise ratio plays in resolving small galaxies, surveys with large-aperture telescopes, such as LSST, will continue to see improving star-galaxy separation as they push to these fainter magnitudes.
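One way to picture the probabilistic combination of several measurements mentioned above is the standard Bayesian odds update: if the measurements are assumed to be statistically independent, the posterior odds equal the prior odds multiplied by the likelihood ratio of each measurement. The sketch below illustrates only that textbook rule; the function names and the example numbers are hypothetical, and it is not claimed to be the scheme used in the paper.

```python
import math


def combine_odds(prior_odds, likelihood_ratios):
    """Posterior star-vs-galaxy odds from independent measurements.

    Assumes independence, so the likelihood ratios multiply.
    The sum is done in log space to avoid overflow or underflow
    when many measurements are combined.
    """
    log_odds = math.log(prior_odds)
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return math.exp(log_odds)


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Hypothetical example: even prior odds and three measurements
# (e.g. two exposures and one colour-based estimate) with
# likelihood ratios 2, 3 and 0.5 in favour of "star".
posterior_odds = combine_odds(1.0, [2.0, 3.0, 0.5])
print(posterior_odds)                       # approximately 3
print(odds_to_probability(posterior_odds))  # approximately 0.75
```

A measurement with a likelihood ratio below 1 (here 0.5) counts as evidence against the hypothesis, so a single discordant measurement lowers the combined odds rather than being ignored.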
Context: It is crucial to develop a method for classifying objects detected in deep surveys at infrared wavelengths. We specifically need a method to separate galaxies from stars using only the infrared information to study the properties of galaxies, e.g., to estimate the angular correlation function, without introducing any additional bias. Aims: We aim to separate stars and galaxies in the data from the AKARI North Ecliptic Pole (NEP) Deep survey collected in nine AKARI / IRC bands from 2 to 24 $\mu$m that cover the near- and mid-infrared wavelengths (hereafter NIR and MIR). We plan to estimate the correlation function for NIR and MIR galaxies from a sample selected according to our criteria in future research. Methods: We used support vector machines (SVM) to study the distribution of stars and galaxies in the AKARI multicolor space. We defined the training samples of these objects by calculating their infrared stellarity parameter (sgc). We created the most efficient classifier and then tested it on the whole sample. We confirmed the developed separation with auxiliary optical data obtained by the Subaru telescope and by creating Euclidean normalized number count plots. Results: We obtain a 90% accuracy in pinpointing galaxies and 98% accuracy for stars in infrared multicolor space with the infrared SVM classifier. The source counts and comparison with the optical data (with a consistency of 65% for selecting stars and 96% for galaxies) confirm that our star/galaxy separation methods are reliable. Conclusions: The infrared classifier derived with the SVM method based on infrared sgc-selected training samples proves to be very efficient and accurate in selecting stars and galaxies in deep surveys at infrared wavelengths carried out without any previous target object selection.
Foreground components in the Cosmic Microwave Background (CMB) are sparse in a needlet representation, due to their specific morphological features (anisotropy, non-Gaussianity). This leads to the possibility of applying needlet thresholding procedures as a component separation tool. In this work, we develop algorithms based on different needlet-thresholding schemes and use them as extensions of existing, well-known component separation techniques, namely ILC and template-fitting. We test soft- and hard-thresholding schemes, using different procedures to set the optimal threshold level. We find that thresholding can be useful as a denoising tool for internal templates in experiments with few frequency channels, in conditions of low signal-to-noise. We also compare our method with other denoising techniques, showing that thresholding achieves the best performance in terms of reconstruction accuracy and data compression while preserving the map resolution. In our tests, the best results are obtained when considering template-fitting in an LSPE-like experiment, especially for B-mode spectra.
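The difference between the hard- and soft-thresholding schemes named above can be shown on a plain list of coefficients. This is a generic sketch of the two textbook rules (hard: zero every coefficient whose magnitude does not exceed the threshold; soft: additionally shrink the surviving coefficients toward zero by the threshold). It does not compute needlet coefficients, and the function names and sample values are illustrative assumptions.

```python
def hard_threshold(coeffs, threshold):
    """Keep coefficients with |c| > threshold unchanged; set the rest to zero."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def soft_threshold(coeffs, threshold):
    """Zero small coefficients and shrink the others toward zero by `threshold`."""
    result = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            result.append(0.0)
        else:
            # Restore the original sign after shrinking the magnitude.
            result.append(magnitude if c > 0 else -magnitude)
    return result


# Made-up coefficients standing in for one scale of a sparse representation.
coefficients = [3.0, -0.5, 1.5, -2.0]

print(hard_threshold(coefficients, 1.0))  # [3.0, 0.0, 1.5, -2.0]
print(soft_threshold(coefficients, 1.0))  # [2.0, 0.0, 0.5, -1.0]
```

Hard thresholding preserves the amplitude of the retained coefficients, while soft thresholding biases them low in exchange for a continuous mapping; which trade-off is preferable depends on how the threshold level is chosen.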
We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of processed CMB maps that boost the cosmic string detectability. Our proposed classifiers, after training, give results improved over or similar to the claimed detectability levels of the existing methods for string tension, $G\mu$. They can make $3\sigma$ detection of strings with $G\mu \gtrsim 2.1 \times 10^{-10}$ for noise-free, $0.9'$-resolution CMB observations. The minimum detectable tension increases to $G\mu \gtrsim 3.0 \times 10^{-8}$ for a more realistic, CMB S4-like (II) strategy, still a significant improvement over the previous results.