Aims: We present a custom support vector machine classification package for photometric redshift estimation, including comparisons with other methods. We also explore the efficacy of including galaxy shape information in redshift estimation. Support vector machines, a type of machine learning method, use optimization theory and supervised learning algorithms to construct predictive models based on the information content of data in a way that can treat different input features symmetrically. Methods: The custom support vector machine package we have developed is designated SPIDERz and is made available to the community. As test data for evaluating performance and for comparison with other methods, we apply SPIDERz to four distinct data sets: 1) the publicly available portion of the PHAT-1 catalog based on the GOODS-N field, with spectroscopic redshifts in the range $z < 3.6$, 2) 14365 galaxies from the COSMOS bright survey with photometric band magnitudes, morphology, and spectroscopic redshifts in the range $z < 1.4$, 3) 3048 galaxies from the overlap of COSMOS photometry and morphology with 3D-HST spectroscopy, extending to $z < 3.9$, and 4) 2612 galaxies with five-band photometric magnitudes and morphology from the All-wavelength Extended Groth Strip International Survey (AEGIS), with $z < 1.57$. Results: We find that SPIDERz achieves results competitive with other empirical packages on the PHAT-1 data, and performs quite well in estimating redshifts with the COSMOS and AEGIS data, including in the cases of a large redshift range ($0 < z < 3.9$). We also determine from analyses with both the COSMOS and AEGIS data that the inclusion of morphological information does not have a statistically significant benefit for photometric redshift estimation with the techniques employed here.
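Treating redshift estimation as a classification problem requires mapping continuous redshifts to discrete class labels and back. The following is a minimal sketch of that mapping, assuming equal-width bins; the bin count, the redshift range, and the function names are hypothetical choices for illustration and are not taken from the SPIDERz package, whose binning scheme is not described in this abstract.

```python
def redshift_to_bin(z, z_min, z_max, n_bins):
    """Map a redshift to the index of an equal-width bin (assumed scheme)."""
    if not (z_min <= z <= z_max):
        raise ValueError("redshift outside the binned range")
    width = (z_max - z_min) / n_bins
    index = int((z - z_min) / width)
    # The upper edge z_max belongs to the last bin.
    return min(index, n_bins - 1)


def bin_to_redshift(index, z_min, z_max, n_bins):
    """Return the central redshift of a bin, used as the point estimate."""
    width = (z_max - z_min) / n_bins
    return z_min + (index + 0.5) * width


# Hypothetical setup: 40 bins covering 0 <= z <= 4.
label = redshift_to_bin(1.23, 0.0, 4.0, 40)
estimate = bin_to_redshift(label, 0.0, 4.0, 40)
print(label, estimate)
```

A classifier trained on such labels can only resolve redshift to within half a bin width, so the number of bins trades resolution against the number of training galaxies available per class.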
Obtaining accurate photometric redshift estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce redshift estimations, there has been a shift towards using machine learning techniques. However, there has been less focus on how well different machine learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine learning methods for photometric redshift estimation. Making use of the Sloan Digital Sky Survey (SDSS-DR12) dataset, we analysed a variety of the most used machine learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms' performance and scalability for this task. Furthermore, by introducing a new optimisation method, time-considered optimisation, we were able to demonstrate how a small concession of error can allow for a great improvement in efficiency. From the algorithms tested we found that the Random Forest performed best in terms of error, with a mean squared error of MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed very similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe benchmarks such as this will become even more vital with upcoming surveys, such as LSST, which will capture billions of galaxies requiring photometric redshifts.
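The trade-off between error and run time can be illustrated with a short sketch. The abstract does not define time-considered optimisation, so the selection rule below (choose the fastest configuration whose MSE lies within a fixed tolerance of the lowest MSE) is only one possible reading and is an assumption, as are the function names and every number in the example.

```python
def mean_squared_error(z_true, z_pred):
    """Mean of the squared differences between true and predicted redshifts."""
    if len(z_true) != len(z_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(z_true, z_pred)) / len(z_true)


def pick_fastest_within_tolerance(results, tolerance):
    """Assumed rule: among (name, mse, seconds) entries, return the fastest
    one whose MSE is at most `tolerance` above the lowest MSE."""
    best_mse = min(mse for _, mse, _ in results)
    eligible = [r for r in results if r[1] <= best_mse + tolerance]
    return min(eligible, key=lambda r: r[2])


# Hypothetical benchmark results, not values from the paper.
results = [
    ("config_a", 0.0040, 120.0),
    ("config_b", 0.0043, 15.0),
    ("config_c", 0.0080, 2.0),
]
choice = pick_fastest_within_tolerance(results, tolerance=0.0005)
print(choice)
```

In this made-up example the second configuration is selected: it gives up a small amount of accuracy relative to the first but runs several times faster, while the third is excluded because its error exceeds the tolerance.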
We present a determination of the effects of including galaxy morphological parameters in photometric redshift estimation with an artificial neural network method. Neural networks, which recognize patterns in the information content of data in an unbiased way, can be a useful estimator of the additional information contained in extra parameters, such as those describing morphology, if the input data are treated on an equal footing. We use imaging and five band photometric magnitudes from the All-wavelength Extended Groth Strip International Survey. It is shown that certain principal components of the morphology information are correlated with galaxy type. However, we find that for the data used the inclusion of morphological information does not have a statistically significant benefit for photometric redshift estimation with the techniques employed here. The inclusion of these parameters may result in a trade-off between extra information and additional noise, with the additional noise becoming more dominant as more parameters are added.
The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks in order to achieve both these aims. We present and discuss a Support Vector Machine (SVM) algorithm which makes use of a Gabor filterbank in order to provide learning criteria for separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey.
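For readers unfamiliar with Gabor filters, the sketch below builds a single real-valued Gabor kernel using the standard textbook form: a Gaussian envelope multiplied by a cosine carrier along a rotated axis. It is not the filterbank used in this work; the function name and all parameter values are hypothetical, and a filterbank would combine many such kernels over a range of orientations and wavelengths.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    wavelength: period of the cosine carrier in pixels
    theta:      orientation of the carrier in radians
    sigma:      width of the Gaussian envelope in pixels
    gamma:      aspect ratio of the envelope
    psi:        phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate pixel coordinates into the filter frame.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical parameters: 5x5 kernel, 4-pixel wavelength, horizontal carrier.
kernel = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
print(kernel[2][2])
```

Convolving an image cutout with kernels at several orientations and wavelengths gives a vector of filter responses that a classifier such as an SVM can use as input features.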
The robust estimation of the tiny distortions (shears) of galaxy shapes caused by weak gravitational lensing in the presence of much larger shape distortions due to the point-spread function (PSF) has been widely investigated. One major problem is that most galaxy shape measurement methods are subject to bias due to pixel noise in the images (noise bias). Noise bias is usually characterized using uncorrelated noise fields; however, real images typically have low-level noise correlations due to galaxies below the detection threshold, and some types of image processing can induce further noise correlations. We investigate the effective detection significance and its impact on noise bias in the presence of correlated noise for one method of galaxy shape estimation. For a fixed noise variance, the biases in galaxy shape estimates can differ substantially for uncorrelated versus correlated noise. However, use of an estimate of detection significance that accounts for the noise correlations can almost entirely remove these differences, leading to consistent values of noise bias as a function of detection significance for correlated and uncorrelated noise. We confirm the robustness of this finding to properties of the galaxy, the PSF, and the noise field, and quantify the impact of anisotropy in the noise correlations. Our results highlight the importance of understanding the pixel noise model and its impact on detection significances when correcting for noise bias on weak lensing.
Accurate photometric redshifts are a lynchpin for many future experiments to pin down the cosmological model and for studies of galaxy evolution. In this study, a novel sparse regression framework for photometric redshift estimation is presented. Simulated and real data from SDSS DR12 were used to train and test the proposed models. We show that approaches which include careful data preparation and model design offer a significant improvement in comparison with several competing machine learning algorithms. Standard implementations of most regression algorithms have as the objective the minimization of the sum of squared errors. For redshift inference, however, this induces a bias in the posterior mean of the output distribution, which can be problematic. In this paper we directly target minimizing $\Delta z = (z_\textrm{s} - z_\textrm{p})/(1+z_\textrm{s})$ and address the bias problem via a distribution-based weighting scheme, incorporated as part of the optimization objective. The results are compared with other machine learning algorithms in the field such as Artificial Neural Networks (ANN), Gaussian Processes (GPs) and sparse GPs. The proposed framework reaches a mean absolute $\Delta z = 0.0026(1+z_\textrm{s})$, over the redshift range of $0 \le z_\textrm{s} \le 2$ on the simulated data, and $\Delta z = 0.0178(1+z_\textrm{s})$ over the entire redshift range on the SDSS DR12 survey, outperforming the standard ANNz used in the literature. We also investigate how the relative size of the training set affects the photometric redshift accuracy. We find that increasing the training set beyond 30 per cent of the total sample size provides little additional constraint on the photometric redshifts, and note that our GP formalism strongly outperforms ANNz in the sparse data regime for the simulated data set.
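The normalised residual quoted above, $\Delta z = (z_\textrm{s} - z_\textrm{p})/(1+z_\textrm{s})$, can be evaluated as in the following minimal sketch, where $z_\textrm{s}$ is the spectroscopic and $z_\textrm{p}$ the photometric redshift. The function names and the sample values are hypothetical and serve only to illustrate the arithmetic; the weighting scheme described in the abstract is not reproduced here.

```python
from statistics import fmean


def normalized_residuals(z_spec, z_phot):
    """Return (z_s - z_p) / (1 + z_s) for each galaxy."""
    if len(z_spec) != len(z_phot):
        raise ValueError("z_spec and z_phot must have the same length")
    return [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]


def mean_absolute_residual(z_spec, z_phot):
    """Mean of |delta z| over the sample."""
    return fmean([abs(d) for d in normalized_residuals(z_spec, z_phot)])


# Made-up redshifts for illustration only.
z_spec = [0.10, 0.50, 1.00]
z_phot = [0.12, 0.47, 1.10]
print(mean_absolute_residual(z_spec, z_phot))
```

Dividing by $1+z_\textrm{s}$ means that a given absolute error counts for less at high redshift, which is why the same photometric error of 0.1 contributes 0.05 at $z_\textrm{s} = 1$ in the example.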