Do you want to publish a course? Click here

Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

212   0   0.0 ( 0 )
 Added by Fionn Murtagh
 Publication date 2016
  fields Physics
and research's language is English
 Authors Fionn Murtagh




Ask ChatGPT about the research

This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.



rate research

Read More

In the modern galaxy surveys photometric redshifts play a central role in a broad range of studies, from gravitational lensing and dark matter distribution to galaxy evolution. Using a dataset of about 25,000 galaxies from the second data release of the Kilo Degree Survey (KiDS) we obtain photometric redshifts with five different methods: (i) Random forest, (ii) Multi Layer Perceptron with Quasi Newton Algorithm, (iii) Multi Layer Perceptron with an optimization network based on the Levenberg-Marquardt learning rule, (iv) the Bayesian Photometric Redshift model (or BPZ) and (v) a classical SED template fitting procedure (Le Phare). We show how SED fitting techniques could provide useful information on the galaxy spectral type which can be used to improve the capability of machine learning methods constraining systematic errors and reduce the occurrence of catastrophic outliers. We use such classification to train specialized regression estimators, by demonstrating that such hybrid approach, involving SED fitting and machine learning in a single collaborative framework, is capable to improve the overall prediction accuracy of photometric redshifts.
We introduce an ordinal classification algorithm for photometric redshift estimation, which significantly improves the reconstruction of photometric redshift probability density functions (PDFs) for individual galaxies and galaxy samples. As a use case we apply our method to CFHTLS galaxies. The ordinal classification algorithm treats distinct redshift bins as ordered values, which improves the quality of photometric redshift PDFs, compared with non-ordinal classification architectures. We also propose a new single value point estimate of the galaxy redshift, that can be used to estimate the full redshift PDF of a galaxy sample. This method is competitive in terms of accuracy with contemporary algorithms, which stack the full redshift PDFs of all galaxies in the sample, but requires orders of magnitudes less storage space. The methods described in this paper greatly improve the log-likelihood of individual object redshift PDFs, when compared with a popular Neural Network code (ANNz). In our use case, this improvement reaches 50% for high redshift objects ($z geq 0.75$). We show that using these more accurate photometric redshift PDFs will lead to a reduction in the systematic biases by up to a factor of four, when compared with less accurate PDFs obtained from commonly used methods. The cosmological analyses we examine and find improvement upon are the following: gravitational lensing cluster mass estimates, modelling of angular correlation functions, and modelling of cosmic shear correlation functions.
Accurate photometric redshifts are a lynchpin for many future experiments to pin down the cosmological model and for studies of galaxy evolution. In this study, a novel sparse regression framework for photometric redshift estimation is presented. Simulated and real data from SDSS DR12 were used to train and test the proposed models. We show that approaches which include careful data preparation and model design offer a significant improvement in comparison with several competing machine learning algorithms. Standard implementations of most regression algorithms have as the objective the minimization of the sum of squared errors. For redshift inference, however, this induces a bias in the posterior mean of the output distribution, which can be problematic. In this paper we directly target minimizing $Delta z = (z_textrm{s} - z_textrm{p})/(1+z_textrm{s})$ and address the bias problem via a distribution-based weighting scheme, incorporated as part of the optimization objective. The results are compared with other machine learning algorithms in the field such as Artificial Neural Networks (ANN), Gaussian Processes (GPs) and sparse GPs. The proposed framework reaches a mean absolute $Delta z = 0.0026(1+z_textrm{s})$, over the redshift range of $0 le z_textrm{s} le 2$ on the simulated data, and $Delta z = 0.0178(1+z_textrm{s})$ over the entire redshift range on the SDSS DR12 survey, outperforming the standard ANNz used in the literature. We also investigate how the relative size of the training set affects the photometric redshift accuracy. We find that a training set of textgreater 30 per cent of total sample size, provides little additional constraint on the photometric redshifts, and note that our GP formalism strongly outperforms ANNz in the sparse data regime for the simulated data set.
Obtaining accurate photometric redshift estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce redshift estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine learning methods for photometric redshift estimation. Making use of the Sloan Digital Sky Survey (SDSS - DR12) dataset, we analysed a variety of the most used machine learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms performance and scalability for this task. Furthermore, by introducing a new optimisation method, time-considered optimisation, we were able to demonstrate how a small concession of error can allow for a great improvement in efficiency. From the algorithms tested we found that the Random Forest performed best in terms of error with a mean squared error, MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed incredibly similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe benchmarks such as this will become even more vital with upcoming surveys, such as LSST, which will capture billions of galaxies requiring photometric redshifts.
Photometric redshifts (photo-zs) are fundamental in galaxy surveys to address different topics, from gravitational lensing and dark matter distribution to galaxy evolution. The Kilo Degree Survey (KiDS), i.e. the ESO public survey on the VLT Survey Telescope (VST), provides the unprecedented opportunity to exploit a large galaxy dataset with an exceptional image quality and depth in the optical wavebands. Using a KiDS subset of about 25,000 galaxies with measured spectroscopic redshifts, we have derived photo-zs using i) three different empirical methods based on supervised machine learning, ii) the Bayesian Photometric Redshift model (or BPZ), and iii) a classical SED template fitting procedure (Le Phare). We confirm that, in the regions of the photometric parameter space properly sampled by the spectroscopic templates, machine learning methods provide better redshift estimates, with a lower scatter and a smaller fraction of outliers. SED fitting techniques, however, provide useful information on the galaxy spectral type which can be effectively used to constrain systematic errors and to better characterize potential catastrophic outliers. Such classification is then used to specialize the training of regression machine learning models, by demonstrating that a hybrid approach, involving SED fitting and machine learning in a single collaborative framework, can be effectively used to improve the accuracy of photo-z estimates.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا