Photometric Redshift Estimation on SDSS Data Using Random Forests


Published by: Samuel Carliles

Publication date: 2007

Research field: Physics

Paper language: English

Authors: Samuel Carliles - Tamas Budavari - Sebastien Heinis


Abstract

Given multiband photometric data from the SDSS DR6, we estimate galaxy redshifts. We employ a Random Forest trained on color features and spectroscopic redshifts from 80,000 randomly chosen primary galaxies, yielding a mapping from color to redshift such that the difference between the estimate and the spectroscopic redshift is small. Our methodology results in tight RMS scatter in the estimates, limited by photometric errors. Additionally, this approach yields an error distribution that is nearly Gaussian, with parameter estimates giving reliable confidence intervals unique to each galaxy photometric redshift.
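As a minimal sketch of the per-galaxy error estimate described above: the code below assumes that the color features are adjacent-band magnitude differences (u-g, g-r, r-i, i-z) and that the spread of the individual tree predictions serves as the Gaussian width. Both are illustrative assumptions, not details stated in the abstract, and the function names and sample numbers are hypothetical.

```python
import statistics

def colors_from_magnitudes(u, g, r, i, z):
    """Adjacent-band colors; this feature set is an assumption."""
    return [u - g, g - r, r - i, i - z]

def summarize_tree_predictions(tree_predictions, z_score=1.96):
    """Mean, sample standard deviation, and a Gaussian interval.

    z_score=1.96 corresponds to a 95% interval under a Gaussian
    error model (assumed here for illustration).
    """
    mean = statistics.fmean(tree_predictions)
    sigma = statistics.stdev(tree_predictions)
    return mean, sigma, (mean - z_score * sigma, mean + z_score * sigma)

# Hypothetical ugriz magnitudes for a single galaxy.
features = colors_from_magnitudes(19.5, 18.0, 17.25, 16.75, 16.5)

# Hypothetical redshift estimates from five trees for that galaxy.
tree_z = [0.10, 0.12, 0.11, 0.09, 0.13]
z_hat, sigma, (lo, hi) = summarize_tree_predictions(tree_z)
```

Because the interval is computed from that galaxy's own tree predictions, its width differs from one galaxy to the next, which is the sense in which the confidence interval is unique to each estimate.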


P. E. Freeman 2009

The development of fast and accurate methods of photometric redshift estimation is a vital step towards being able to fully utilize the data of next-generation surveys within precision cosmology. In this paper we apply a specific approach to spectral connectivity analysis (SCA; Lee & Wasserman 2009) called the diffusion map. SCA is a class of non-linear techniques for transforming observed data (e.g., photometric colours for each galaxy, where the data lie on a complex subset of p-dimensional space) to a simpler, more natural coordinate system wherein we apply regression to make redshift predictions. As SCA relies upon eigen-decomposition, our training set size is limited to ~10,000 galaxies; we use the Nyström extension to quickly estimate diffusion coordinates for objects not in the training set. We apply our method to 350,738 SDSS main sample galaxies, 29,816 SDSS luminous red galaxies, and 5,223 galaxies from DEEP2 with CFHTLS ugriz photometry. For all three datasets, we achieve prediction accuracies on par with previous analyses, and find that use of the Nyström extension leads to a negligible loss of prediction accuracy relative to that achieved with the training sets. As in some previous analyses (e.g., Collister & Lahav 2004, Ball et al. 2008), we observe that our predictions are generally too high (low) in the low (high) redshift regimes. We demonstrate that this is a manifestation of attenuation bias, wherein measurement error (i.e., uncertainty in diffusion coordinates due to uncertainty in the measured fluxes/magnitudes) reduces the slope of the best-fit regression line. Mitigation of this bias is necessary if we are to use photometric redshift estimates produced by computationally efficient empirical methods in precision cosmology.
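The attenuation bias mentioned above can be illustrated with a small simulation. This is a generic errors-in-variables sketch, not the paper's diffusion-map pipeline: a single predictor stands in for the diffusion coordinates, and the slope, noise level, and sample size are hypothetical. For a predictor with variance var(x) observed with independent noise of variance var(e), the fitted slope shrinks by the factor var(x) / (var(x) + var(e)).

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# True predictor with unit variance and an exact linear response (slope 2).
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * t for t in true_x]

# Observed predictor: true value plus unit-variance measurement noise.
noisy_x = [t + rng.gauss(0.0, 1.0) for t in true_x]

slope_clean = ols_slope(true_x, y)   # recovers the true slope
slope_noisy = ols_slope(noisy_x, y)  # shrunk by about 1 / (1 + 1)
```

With equal signal and noise variances, the fitted slope comes out near half the true slope, so predictions are pulled toward the mean: too high at the low end and too low at the high end.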

Cosmology and Nongalactic Astrophysics, Instrumentation and Methods for Astrophysics

(f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data

Taylor Pospisil, Ann B. Lee 2019

Random forests is a common non-parametric regression technique which performs well for mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not efficiently handle functional data and run into a curse of dimensionality when presented with high-resolution curves and surfaces. Furthermore, in settings with heteroskedasticity or multimodality, a regression point estimate with standard errors does not fully capture the uncertainty in our predictions. A more informative quantity is the conditional density p(y | x), which describes the full extent of the uncertainty in the response y given covariates x. In this paper we show how random forests can be efficiently leveraged for conditional density estimation, functional covariates, and multiple responses without increasing computational complexity. We provide open-source software for all procedures with R and Pyth
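One way to picture a forest-based estimate of p(y | x) is a weighted kernel density estimate, where each training response is weighted by how often it shares a leaf with the query point. The sketch below shows only that weighting idea under stated assumptions: the weights are hypothetical hand-picked counts rather than output from a fitted forest, the Gaussian kernel and bandwidth are illustrative choices, and nothing here is taken from the RFCDE package's actual interface.

```python
import math

def weighted_kde(y_grid, y_train, weights, bandwidth):
    """Gaussian kernel density estimate with per-sample weights."""
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for y in y_grid:
        acc = 0.0
        for yi, wi in zip(y_train, weights):
            u = (y - yi) / bandwidth
            acc += wi * norm * math.exp(-0.5 * u * u)
        density.append(acc / total)
    return density

# Hypothetical training responses forming two clusters.
y_train = [0.10, 0.12, 0.30, 0.32]
# Hypothetical counts of trees in which each sample shares a leaf with x.
weights = [3, 2, 2, 3]

step = 0.005
y_grid = [i * step for i in range(-40, 141)]  # grid from -0.2 to 0.7
density = weighted_kde(y_grid, y_train, weights, bandwidth=0.02)

# Riemann-sum check that the estimate integrates to roughly one.
area = sum(density) * step
```

The resulting density has two modes, one near 0.11 and one near 0.31, which a single point estimate with a standard error would not convey.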

Statistics: Methodology

Random Forests for Big Data

Robin Genuer, Jean-Michel Poggi (UPD5) 2015

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes online data and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), a simulated one as well as real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of bootstrap to Big Data or divide-and-conquer approaches. The fifth variant relies on online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
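The bootstrap and out-of-bag ideas referred to above can be sketched in a few lines. This shows the standard single-tree bootstrap only, not any of the Big Data variants compared in the paper; the function name and sample size are hypothetical.

```python
import random

def bootstrap_with_oob(n_samples, rng):
    """Draw one bootstrap sample and list the out-of-bag indices.

    The in-bag indices are drawn with replacement; the out-of-bag
    indices are the observations that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    in_bag_set = set(in_bag)
    oob = [i for i in range(n_samples) if i not in in_bag_set]
    return in_bag, oob

rng = random.Random(42)  # fixed seed so the run is repeatable
n_samples = 10000
in_bag, oob = bootstrap_with_oob(n_samples, rng)

# For large n the out-of-bag share approaches 1/e, about 0.368.
oob_fraction = len(oob) / n_samples
```

Each tree can be evaluated on its own out-of-bag observations, which gives an error estimate without a separate validation set.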

Machine Learning, Statistics Theory

Cooperative photometric redshift estimation

Stefano Cavuoti, Crescenzo Tortora, Massimo Brescia 2017

In modern galaxy surveys, photometric redshifts play a central role in a broad range of studies, from gravitational lensing and dark matter distribution to galaxy evolution. Using a dataset of about 25,000 galaxies from the second data release of the Kilo Degree Survey (KiDS), we obtain photometric redshifts with five different methods: (i) Random forest, (ii) Multi Layer Perceptron with Quasi Newton Algorithm, (iii) Multi Layer Perceptron with an optimization network based on the Levenberg-Marquardt learning rule, (iv) the Bayesian Photometric Redshift model (or BPZ) and (v) a classical SED template fitting procedure (Le Phare). We show how SED fitting techniques can provide useful information on the galaxy spectral type, which can be used to improve the capability of machine learning methods by constraining systematic errors and reducing the occurrence of catastrophic outliers. We use this classification to train specialized regression estimators, demonstrating that such a hybrid approach, involving SED fitting and machine learning in a single collaborative framework, can improve the overall prediction accuracy of photometric redshifts.
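Prediction accuracy and catastrophic outliers are usually summarized with a few statistics of the normalized residual (z_phot - z_spec) / (1 + z_spec). The sketch below assumes that normalization and an outlier threshold of 0.15, a common convention in the photometric redshift literature; the abstract does not state which definitions the paper uses, and the sample redshifts are hypothetical.

```python
def photoz_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Bias, RMS, and outlier fraction of normalized residuals.

    The 0.15 threshold is an assumed convention, not a value taken
    from the paper.
    """
    residuals = [(p - s) / (1.0 + s) for p, s in zip(z_phot, z_spec)]
    n = len(residuals)
    bias = sum(residuals) / n
    rms = (sum(r * r for r in residuals) / n) ** 0.5
    outliers = sum(abs(r) > outlier_threshold for r in residuals)
    return bias, rms, outliers / n

# Hypothetical spectroscopic and photometric redshifts; the last
# estimate is deliberately far off to act as a catastrophic outlier.
z_spec = [0.10, 0.20, 0.30, 0.40]
z_phot = [0.11, 0.19, 0.31, 0.90]
bias, rms, outlier_fraction = photoz_metrics(z_phot, z_spec)
```

Computing these statistics separately for each spectral type is one way to check whether type-specific estimators reduce the outlier fraction relative to a single global model.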

Instrumentation and Methods for Astrophysics

RFCDE: Random Forests for Conditional Density Estimation

Taylor Pospisil, Ann B. Lee 2018

Random forests is a common non-parametric regression technique which performs well for mixed-type data and irrelevant covariates, while being robust to monotonic variable transformations. Existing random forest implementations target regression or classification. We introduce the RFCDE package for fitting random forest models optimized for nonparametric conditional density estimation, including joint densities for multiple responses. This enables analysis of conditional probability distributions, which is useful for propagating uncertainty, and of joint distributions that describe relationships between multiple responses and covariates. RFCDE is released under the MIT open-source license and can be accessed at https://github.com/tpospisi/rfcde . Both R and Pyth

Machine Learning
