We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyman-alpha-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW) greater than 20 angstrom. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and equivalent width distributions for the galaxy populations, and returns the probability that a given object is an LAE given its observed characteristics. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ~10^6 emission-line galaxies into LAEs and low-redshift [O II] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional EW > 20 angstrom cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method's ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.
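The two-class posterior described in this abstract can be illustrated with a minimal Python sketch of Bayes' theorem. It is not the authors' implementation: the function names, the likelihood inputs, and the default threshold are hypothetical. In the abstract's setting the likelihoods would be derived from the luminosity functions and equivalent width distributions of each population.

```python
def lae_posterior(like_lae, like_oii, prior_lae):
    """Return P(LAE | data) for a two-hypothesis problem (LAE vs [O II]).

    like_lae, like_oii: likelihood of the observed line properties under
        each hypothesis (hypothetical inputs for this sketch).
    prior_lae: prior probability that a detected line is Lyman-alpha.
    """
    numerator = like_lae * prior_lae
    evidence = numerator + like_oii * (1.0 - prior_lae)
    if evidence == 0.0:
        raise ValueError("both hypotheses have zero probability")
    return numerator / evidence


def classify(p_lae, threshold=0.5):
    """Label an object as an LAE if its posterior meets the threshold.

    Raising the threshold lowers contamination at the cost of
    completeness; lowering it does the opposite.
    """
    return "LAE" if p_lae >= threshold else "[O II]"
```

Adjusting `threshold` is the knob that trades one type of classification error against the other, and the posterior itself can be carried forward in place of a binary label.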
We describe the redmonster automated redshift measurement and spectral classification software designed for the extended Baryon Oscillation Spectroscopic Survey (eBOSS) of the Sloan Digital Sky Survey IV (SDSS-IV). We describe the algorithms, the template standard and requirements, and the newly developed galaxy templates to be used on eBOSS spectra. We present results from testing on early data from eBOSS, where we have found a 90.5% automated redshift and spectral classification success rate for the luminous red galaxy sample (redshifts $0.6 \lesssim z \lesssim 1.0$). The redmonster performance meets the eBOSS cosmology requirements for redshift classification and catastrophic failures, and represents a significant improvement over the previous pipeline. We describe the empirical processes used to determine the optimum number of additive polynomial terms in our models and an acceptable $\Delta\chi_r^2$ threshold for declaring statistical confidence. Statistical errors on redshift measurement due to photon shot noise are assessed, and we find typical values of a few tens of km s$^{-1}$. An investigation of redshift differences in repeat observations scaled by error estimates yields a distribution with a Gaussian mean and standard deviation of $\mu\sim$0.01 and $\sigma\sim$0.65, respectively, suggesting the reported statistical redshift uncertainties are over-estimated by $\sim$54%. We assess the effects of object magnitude, signal-to-noise ratio, fiber number, and fiber head location on the pipeline's redshift success rate. Finally, we describe directions of ongoing development.
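The link between a scaled-difference width of about 0.65 and a 54% over-estimate can be made explicit with a short Python sketch. It assumes that each redshift difference is divided by the quadrature sum of the two reported errors, which the abstract does not state, and the function names are hypothetical.

```python
import math


def scaled_differences(z_first, z_second, err_first, err_second):
    """Redshift differences between repeat observations, in units of the
    combined reported error (quadrature sum is an assumption here)."""
    return [
        (z1 - z2) / math.sqrt(e1 ** 2 + e2 ** 2)
        for z1, z2, e1, e2 in zip(z_first, z_second, err_first, err_second)
    ]


def overestimate_fraction(scaled_sigma):
    """Fraction by which reported errors exceed the true scatter.

    Accurate errors would give scaled differences with unit standard
    deviation; a width below 1 means the reported errors are too large
    by a factor of 1 / scaled_sigma.
    """
    if scaled_sigma <= 0:
        raise ValueError("scaled_sigma must be positive")
    return 1.0 / scaled_sigma - 1.0


print(round(100 * overestimate_fraction(0.65)))  # 54
```

With a width of 0.65 the factor is 1/0.65, or about 1.54, which matches the quoted figure.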
In this paper we discuss an application of machine learning based methods to the identification of candidate AGN from optical survey data and to the automatic classification of AGNs into broad classes. We applied four different machine learning algorithms, namely the Multi Layer Perceptron (MLP) trained with the Conjugate Gradient, Scaled Conjugate Gradient and Quasi Newton learning rules, respectively, and Support Vector Machines (SVM), to the classification of emission line galaxies into different classes, mainly AGNs vs non-AGNs, using optical photometry in place of the diagnostics based on line intensity ratios that are classically used in the literature. Using the same photometric features, we also discuss the behavior of the classifiers on finer AGN classification tasks, namely Seyfert I vs Seyfert II and Seyfert vs LINER. Furthermore, we describe the algorithms employed, the samples of spectroscopically classified galaxies used to train them, the procedure followed to select the photometric parameters, and the performance of our methods in terms of multiple statistical indicators. The results of the experiments show that self-adaptive data mining algorithms, trained on spectroscopic data sets and applied to carefully chosen photometric parameters, represent a viable alternative to the classical methods that employ time-consuming spectroscopic observations.
Obtaining accurate photometric redshift estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce redshift estimations, there has been a shift towards using machine learning techniques. However, there has been less focus on how well different machine learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine learning methods for photometric redshift estimation. Making use of the Sloan Digital Sky Survey (SDSS - DR12) dataset, we analysed a variety of the most used machine learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms' performance and scalability for this task. Furthermore, by introducing a new optimisation method, time-considered optimisation, we were able to demonstrate how a small concession of error can allow for a great improvement in efficiency. From the algorithms tested we found that the Random Forest performed best in terms of error with a mean squared error, MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed very similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe benchmarks such as this will become even more vital with upcoming surveys, such as LSST, which will capture billions of galaxies requiring photometric redshifts.
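The trade between error and run time mentioned in this abstract can be illustrated with a small Python sketch. The abstract does not define time-considered optimisation, so the selection rule below (take the fastest configuration whose MSE lies within a fractional tolerance of the best MSE) is purely an assumption for illustration, as are the function names and the `tolerance` parameter.

```python
def mean_squared_error(z_true, z_pred):
    """Mean squared error between true and predicted redshifts."""
    if len(z_true) != len(z_pred) or not z_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(z_true, z_pred)) / len(z_true)


def pick_time_considered(results, tolerance=0.05):
    """Choose a configuration by error and run time together.

    results: list of (name, mse, seconds) tuples from a benchmark run.
    tolerance: fractional increase over the best MSE that is accepted
        in exchange for speed (hypothetical rule, not from the paper).
    Returns the fastest tuple whose MSE is within the tolerance.
    """
    best_mse = min(mse for _, mse, _ in results)
    eligible = [r for r in results if r[1] <= best_mse * (1.0 + tolerance)]
    return min(eligible, key=lambda r: r[2])
```

Setting `tolerance=0.0` reduces the rule to plain error minimisation, so the parameter expresses how much accuracy one is willing to concede for efficiency.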
Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it is impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly from images, comparing these with traditional machine learning algorithms which make use of magnitudes retrieved through photometry. As well as testing a convolutional neural network (CNN) and inception-module CNN, we introduce a novel mixed-input model which allows for both images and magnitude data to be used in the same model as a way of further improving the estimated redshifts. We also perform benchmarking as a way of demonstrating the performance and scalability of the different algorithms. The data used in the study comes entirely from the Sloan Digital Sky Survey (SDSS) from which 1 million galaxies were used, each having 5-filter (ugriz) images with complete photometry and a spectroscopic redshift which was taken as the ground truth. The mixed-input inception CNN achieved a mean squared error (MSE)=0.009, which was a significant improvement (30%) over the traditional Random Forest (RF), and the model performed even better at lower redshifts achieving an MSE=0.0007 (a 50% improvement over the RF) in the range of z<0.3. This method could be hugely beneficial to upcoming surveys such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) which will require vast numbers of photo-z estimates produced as quickly and accurately as possible.
We develop a prescription for estimating the interstellar medium oxygen abundances of distant star-forming galaxies using the ratio EWR_{23} formed from the equivalent widths of the [O II] 3727, [O III] 4959,5007 and Hbeta nebular emission lines. This EWR_{23} approach is essentially identical to the widely-used R_{23} method of Pagel et al. (1979). Using data from three spectroscopic surveys of nearby galaxies, we conclude that the emission line equivalent width ratios are a good substitute for emission line flux ratios in galaxies with active star formation. The RMS dispersion between EWR_{23} and the reddening-corrected R_{23} values is sigma(log(R_{23})) < 0.08 dex. This dispersion is comparable to the emission-line measurement uncertainties for distant galaxies in many spectroscopic galaxy surveys, and is smaller than the uncertainties of sigma(O/H) ~ 0.15 dex inherent in strong-line metallicity calibrations. Because equivalent width ratios are, to first order, insensitive to interstellar reddening, emission line equivalent width ratios may actually be superior to flux ratios when reddening corrections are not available. The EWR_{23} method presented here is likely to be most useful for statistically estimating the mean metallicities for large samples of galaxies to trace their chemical properties as a function of redshift or environment. The approach developed here is used in a companion paper (Kobulnicky et al. 2003) to measure the metallicities of star-forming galaxies at z=0.2-0.8 in the Deep Extragalactic Evolutionary Probe spectroscopic survey of the Groth Strip.
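The ratio named in this abstract can be written as a short Python sketch. It uses the standard R_{23} form, the sum of the [O II] 3727 and [O III] 4959,5007 line strengths divided by Hbeta, and assumes EWR_{23} is the same expression with equivalent widths in place of fluxes. The function names are hypothetical and no calibration to O/H is attempted.

```python
import math


def r23_ratio(oii_3727, oiii_4959, oiii_5007, hbeta):
    """Return ([O II] 3727 + [O III] 4959 + [O III] 5007) / Hbeta.

    Pass reddening-corrected fluxes to obtain R23, or equivalent widths
    (all in the same units) to obtain the EWR23 analogue.
    """
    if hbeta <= 0:
        raise ValueError("Hbeta strength must be positive")
    return (oii_3727 + oiii_4959 + oiii_5007) / hbeta


def log_r23(oii_3727, oiii_4959, oiii_5007, hbeta):
    """Base-10 logarithm of the ratio, the quantity whose dispersion
    is quoted in dex."""
    return math.log10(r23_ratio(oii_3727, oiii_4959, oiii_5007, hbeta))
```

Because the same function serves both cases, comparing `log_r23` computed from equivalent widths against the value from corrected fluxes gives the kind of dispersion the abstract reports.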