Photometric redshift estimation is an indispensable tool of precision cosmology. One problem that plagues the use of this tool in the era of large-scale sky surveys is that the bright galaxies that are selected for spectroscopic observation do not have properties that match those of (far more numerous) dimmer galaxies; thus, ill-designed empirical methods that produce accurate and precise redshift estimates for the former generally will not produce good estimates for the latter. In this paper, we provide a principled framework for generating conditional density estimates (i.e. photometric redshift PDFs) that takes into account selection bias and the covariate shift that this bias induces. We base our approach on the assumption that the probability that astronomers label a galaxy (i.e. determine its spectroscopic redshift) depends only on its measured (photometric and perhaps other) properties x and not on its true redshift. With this assumption, we can explicitly write down risk functions that allow us to both tune and compare methods for estimating importance weights (i.e. the ratio of densities of unlabeled and labeled galaxies for different values of x) and conditional densities. We also provide a method for combining multiple conditional density estimates for the same galaxy into a single estimate with better properties. We apply our risk functions to an analysis of approximately one million galaxies, mostly observed by SDSS, and demonstrate through multiple diagnostic tests that our method achieves good conditional density estimates for the unlabeled galaxies.
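The importance weights described above, the ratio of the unlabeled to the labeled density at a given x, can be illustrated with a minimal sketch. It assumes a single photometric covariate and uses a shared-histogram density ratio as a simple stand-in; this is not the estimator used in the paper, and the function name, bin count, and simulated magnitudes are all hypothetical.

```python
import numpy as np

def histogram_importance_weights(x_labeled, x_unlabeled, bins=20):
    """Estimate beta(x) = f_U(x) / f_L(x) at each labeled point.

    Illustrative stand-in only: both samples are binned on a shared grid
    and the ratio of the two histogram densities is read off per bin.
    """
    x_labeled = np.asarray(x_labeled, dtype=float)
    x_unlabeled = np.asarray(x_unlabeled, dtype=float)
    # Shared bin edges spanning both samples.
    edges = np.histogram_bin_edges(
        np.concatenate([x_labeled, x_unlabeled]), bins=bins
    )
    dens_l, _ = np.histogram(x_labeled, bins=edges, density=True)
    dens_u, _ = np.histogram(x_unlabeled, bins=edges, density=True)
    # Bin index of each labeled point; the right-most edge belongs to the last bin.
    idx = np.clip(np.digitize(x_labeled, edges) - 1, 0, len(dens_l) - 1)
    # Every labeled point sits in a bin with dens_l > 0, so the ratio is finite.
    return dens_u[idx] / dens_l[idx]

# Hypothetical example: labeled galaxies are brighter (smaller magnitude).
rng = np.random.default_rng(0)
mag_labeled = rng.normal(18.0, 1.0, size=5000)
mag_unlabeled = rng.normal(19.0, 1.0, size=5000)
weights = histogram_importance_weights(mag_labeled, mag_unlabeled)
reweighted_mean = np.sum(weights * mag_labeled) / np.sum(weights)
```

Reweighting the labeled sample with these weights should pull its summary statistics toward those of the unlabeled sample, which is the purpose of the correction. In more than a few dimensions a histogram ratio breaks down, so a different density-ratio estimator would be needed.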
Accurate photometric redshifts are a lynchpin for many future experiments to pin down the cosmological model and for studies of galaxy evolution. In this study, a novel sparse regression framework for photometric redshift estimation is presented. Simulated and real data from SDSS DR12 were used to train and test the proposed models. We show that approaches which include careful data preparation and model design offer a significant improvement in comparison with several competing machine learning algorithms. Standard implementations of most regression algorithms have as the objective the minimization of the sum of squared errors. For redshift inference, however, this induces a bias in the posterior mean of the output distribution, which can be problematic. In this paper we directly target minimizing $\Delta z = (z_\textrm{s} - z_\textrm{p})/(1+z_\textrm{s})$ and address the bias problem via a distribution-based weighting scheme, incorporated as part of the optimization objective. The results are compared with other machine learning algorithms in the field such as Artificial Neural Networks (ANN), Gaussian Processes (GPs) and sparse GPs. The proposed framework reaches a mean absolute $\Delta z = 0.0026(1+z_\textrm{s})$, over the redshift range of $0 \le z_\textrm{s} \le 2$ on the simulated data, and $\Delta z = 0.0178(1+z_\textrm{s})$ over the entire redshift range on the SDSS DR12 survey, outperforming the standard ANNz used in the literature. We also investigate how the relative size of the training set affects the photometric redshift accuracy. We find that a training set of $>$30 per cent of the total sample size provides little additional constraint on the photometric redshifts, and note that our GP formalism strongly outperforms ANNz in the sparse data regime for the simulated data set.
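As a minimal sketch of the error measure targeted above, the normalized residual (z_s - z_p)/(1 + z_s) and its mean absolute value can be computed as follows. The function names are hypothetical, and the sketch covers only the evaluation metric, not the weighting scheme or the regression model itself.

```python
import numpy as np

def normalized_residuals(z_spec, z_phot):
    """Return (z_spec - z_phot) / (1 + z_spec) element-wise."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    return (z_spec - z_phot) / (1.0 + z_spec)

def mean_absolute_delta_z(z_spec, z_phot):
    """Mean absolute normalized residual over a test sample."""
    return float(np.mean(np.abs(normalized_residuals(z_spec, z_phot))))

# Hypothetical spectroscopic and photometric redshifts for two galaxies.
score = mean_absolute_delta_z([0.0, 1.0], [0.1, 0.8])
```

Dividing by 1 + z_s puts errors at different redshifts on a comparable footing, so the same absolute error counts for less at high redshift than at low redshift.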
We present a rigorous mathematical solution to photometric redshift estimation and the more general inversion problem. The challenge we address is to meaningfully constrain unknown properties of astronomical sources based on given observables, usually multicolor photometry, with the help of a training set that provides an empirical relation between the measurements and the desired quantities. We establish a formalism that blurs the boundary between the traditional empirical and template-fitting algorithms, as both are just special cases that are discussed in detail to put them in context. The new approach enables the development of more sophisticated methods that go beyond the classic techniques to combine their advantages. We look at the directions for further improvement in the methodology, and examine the technical aspects of practical implementations. We show how training sets are to be constructed and used consistently for reliable estimation.
Calibration precision is currently a limiting systematic in 21 cm cosmology experiments. While there are innumerable calibration approaches, most can be categorized as either 'sky-based', relying on an extremely accurate model of astronomical foreground emission, or 'redundant', requiring a precisely regular array with near-identical antenna response patterns. Both of these classes of calibration are inflexible to the realities of interferometric measurement. In practice, errors in the foreground model, antenna position offsets, and beam response inhomogeneities degrade calibration performance and contaminate the cosmological signal. Here we show that sky-based and redundant calibration can be unified into a highly general and physically motivated calibration framework based on a Bayesian statistical formalism. Our new framework includes sky and redundant calibration as special cases but can additionally support relaxing the rigid assumptions implicit in those approaches. Furthermore, we present novel calibration techniques such as redundant calibration for arrays with no redundant baselines, representing an alternative calibration method for imaging arrays such as the MWA Phase I. These new calibration approaches could mitigate systematics and reduce calibration error, thereby improving the precision of cosmological measurements.
Future radio surveys will generate catalogues of tens of millions of radio sources, for which redshift estimates will be essential to achieve many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to be at higher redshift than most optical sources, and so a significant fraction of radio source hosts differ from those for which most photometric redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point in this process, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how the performance is affected by the incompleteness of the training sample and by sparseness of the parameter space or by limited availability of ancillary multi-wavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited quality of photometry available for all-sky surveys, useful redshift information is available for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best with high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitting are treated separately. When we reduced the quality of photometry to match that available for the EMU all-sky radio survey, the quality of the template-fitting degraded and became comparable to some of the machine-learning methods. Machine-learning techniques currently perform better at low redshift than at high redshift, because of incompleteness of the currently available training data at high redshifts.
We show that mid-infrared data from the all-sky WISE survey can be used as a robust photometric redshift indicator for powerful radio AGN, in the absence of other spectroscopic or multi-band photometric information. Our work is motivated by a desire to extend the well-known K-z relation for radio galaxies to the wavelength range covered by the all-sky WISE mid-infrared survey. Using the LARGESS radio spectroscopic sample as a training set, and the mid-infrared colour information to classify radio sources, we generate a set of redshift probability distributions for the hosts of high-excitation and low-excitation radio AGN. We test the method using spectroscopic data from several other radio AGN studies, and find good agreement between our WISE-based redshift estimates and published spectroscopic redshifts out to z ~ 1 for galaxies and z ~ 3-4 for radio-loud QSOs. Our chosen method is also compared against other classification methods and found to perform reliably. This technique is likely to be particularly useful in the analysis of upcoming large-area radio surveys with SKA pathfinder telescopes, and our code is publicly available. As a consistency check, we show that our WISE-based redshift estimates for sources in the 843 MHz SUMSS survey reproduce the redshift distribution seen in the CENSORS study up to z ~ 2. We also discuss two specific applications of our technique for current and upcoming radio surveys: an interpretation of large-scale HI absorption surveys, and a determination of whether low-frequency peaked-spectrum sources lie at high redshift.