Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of Boosted Decision Trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single best estimate and error, and also provides a photo-z quality figure-of-merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.
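The stacking step mentioned above can be illustrated with a minimal sketch. It assumes each galaxy's PDF is given as a list of probabilities over a shared set of redshift bins; the function names `stack_pdfs` and `point_estimate_histogram` are hypothetical and are not taken from ArborZ. The second function shows the single-best-estimate alternative for comparison.

```python
def stack_pdfs(pdfs):
    """Sum per-galaxy redshift PDFs (lists over the same z bins) into N(z).

    Each PDF is normalised to unit sum first, so every galaxy
    contributes exactly one count to the stacked distribution.
    """
    if not pdfs:
        raise ValueError("need at least one PDF")
    n_bins = len(pdfs[0])
    n_of_z = [0.0] * n_bins
    for pdf in pdfs:
        if len(pdf) != n_bins:
            raise ValueError("all PDFs must share the same redshift bins")
        total = sum(pdf)
        if total <= 0:
            raise ValueError("PDF must have positive total probability")
        for k, p in enumerate(pdf):
            n_of_z[k] += p / total
    return n_of_z


def point_estimate_histogram(pdfs):
    """N(z) built from best estimates only: one count in each galaxy's peak bin."""
    n_bins = len(pdfs[0])
    counts = [0] * n_bins
    for pdf in pdfs:
        counts[pdf.index(max(pdf))] += 1
    return counts


# Two toy galaxies on three redshift bins (made-up numbers for illustration).
toy_pdfs = [[0.6, 0.4, 0.0], [0.0, 0.3, 0.7]]
stacked = stack_pdfs(toy_pdfs)
peaks = point_estimate_histogram(toy_pdfs)
```

In this toy case the stacked estimate places weight in the middle bin, which the peak-only histogram leaves empty.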
Gradient boosted decision trees (GBDTs) are widely used in machine learning, and the output of current GBDT implementations is a single variable. When there are multiple outputs, GBDT constructs multiple trees, one set per output variable. This strategy ignores the correlations between variables, causing redundancy in the learned tree structures. In this paper, we propose a general method to learn GBDT for multiple outputs, called GBDT-MO. Each leaf of GBDT-MO constructs predictions of all variables or of a subset of automatically selected variables. This is achieved by considering the summation of objective gains over all output variables. Moreover, we extend histogram approximation to the multiple-output case to speed up the training process. Various experiments on synthetic and real-world datasets verify that GBDT-MO achieves outstanding performance in terms of both accuracy and training speed. Our code is available online.
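The idea of summing objective gains over all output variables can be sketched as follows. This is an illustration only: it assumes the familiar second-order (gradient and Hessian) split gain with an L2 penalty `lam`, which the abstract does not spell out, and the function names are hypothetical rather than taken from the GBDT-MO code.

```python
def leaf_score(grad_sum, hess_sum, lam):
    """Second-order score of a leaf for one output: G^2 / (H + lambda)."""
    return grad_sum * grad_sum / (hess_sum + lam)


def multi_output_split_gain(grads, hess, left_mask, lam=1.0):
    """Gain of one candidate split, summed over all output variables.

    grads, hess: one row per sample, one entry per output variable.
    left_mask:   True where the sample goes to the left child.
    """
    n_out = len(grads[0])
    gain = 0.0
    for j in range(n_out):
        g_left = sum(g[j] for g, m in zip(grads, left_mask) if m)
        h_left = sum(h[j] for h, m in zip(hess, left_mask) if m)
        g_right = sum(g[j] for g, m in zip(grads, left_mask) if not m)
        h_right = sum(h[j] for h, m in zip(hess, left_mask) if not m)
        # Per-output gain: children scores minus the unsplit parent score.
        gain += 0.5 * (
            leaf_score(g_left, h_left, lam)
            + leaf_score(g_right, h_right, lam)
            - leaf_score(g_left + g_right, h_left + h_right, lam)
        )
    return gain


# Toy data: four samples, two outputs with opposite-sign gradients.
toy_grads = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]]
toy_hess = [[1.0, 1.0]] * 4
good_gain = multi_output_split_gain(toy_grads, toy_hess, [True, True, False, False], lam=0.0)
poor_gain = multi_output_split_gain(toy_grads, toy_hess, [True, False, True, False], lam=0.0)
```

Because a single split is scored against every output at once, one tree structure can serve all outputs instead of a separate tree being grown for each.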
The CALICE Semi-Digital Hadronic CALorimeter (SDHCAL) prototype using Glass Resistive Plate Chambers as a sensitive medium is the first technological prototype of a family of high-granularity calorimeters developed by the CALICE collaboration to equip the experiments of future leptonic colliders. It was exposed to beams of hadrons, electrons and muons several times in the CERN PS and SPS beamlines between 2012 and 2018. We present here a new method of particle identification within the SDHCAL using the Boosted Decision Trees (BDT) method applied to the data collected in 2015. The performance of the method is tested first with Geant4-based simulated events and then on the data collected by the SDHCAL in the energy range between 10 and 80~GeV with 10~GeV energy steps. The BDT method is then used to reject the electrons and muons that contaminate the SPS hadron beams.
We study the performance of the hybrid template-machine-learning photometric redshift (photo-$z$) algorithm Delight, which uses Gaussian processes, on a subset of the early data release of the Physics of the Accelerating Universe Survey (PAUS). We calibrate the fluxes of the $40$ PAUS narrow bands with $6$ broadband fluxes ($uBVriz$) in the COSMOS field using three different methods, including a new method which utilises the correlation between the apparent size and overall flux of the galaxy. We use a rich set of empirically derived galaxy spectral templates as guides to train the Gaussian process, and we show that our results are competitive with other standard photometric redshift algorithms. Delight achieves a photo-$z$ $68$th percentile error of $\sigma_{68}=0.0081(1+z)$ without any quality cut for galaxies with $i_\mathrm{auto}<22.5$, as compared to $0.0089(1+z)$ and $0.0202(1+z)$ for the BPz and ANNz2 codes, respectively. Delight is also shown to produce more accurate probability distribution functions for individual redshift estimates than BPz and ANNz2. Common photo-$z$ outliers of Delight and BCNz2 (previously applied to PAUS) are found to be primarily caused by outliers in the narrowband fluxes, with a small number of cases potentially indicating spectroscopic redshift failures in the reference sample. In the process, we introduce performance metrics derived from the results of BCNz2 and Delight, allowing us to achieve a photo-$z$ quality of $\sigma_{68}<0.0035(1+z)$ at a magnitude of $i_\mathrm{auto}<22.5$ while keeping $50$ per cent of the objects of the galaxy sample.
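For readers unfamiliar with the $\sigma_{68}$ statistic quoted above, here is a minimal sketch of one common definition: half the width of the central 68.2 per cent interval of the scaled residuals $(z_\mathrm{phot}-z_\mathrm{spec})/(1+z_\mathrm{spec})$. The abstract does not state the exact definition used, so the percentile choice (15.9 and 84.1) and the function names are assumptions.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def sigma_68(z_phot, z_spec):
    """Half the 15.9-to-84.1 percentile width of (z_phot - z_spec) / (1 + z_spec)."""
    scaled = sorted((zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec))
    return 0.5 * (percentile(scaled, 84.1) - percentile(scaled, 15.9))


# Toy check with made-up values: residuals spread uniformly over [0, 1].
toy_spec = [0.0] * 101
toy_phot = [i / 100.0 for i in range(101)]
toy_sigma = sigma_68(toy_phot, toy_spec)
```

Unlike a standard deviation, this percentile-based width is insensitive to a small number of catastrophic outliers, which is why it is often quoted alongside a separate outlier fraction.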
The scientific value of the next generation of large continuum surveys would be greatly increased if the redshifts of the newly detected sources could be rapidly and reliably estimated. Given the observational expense of obtaining spectroscopic redshifts for the large number of new detections expected, there has been substantial recent work on using machine learning techniques to obtain photometric redshifts. Here we compare the accuracy of the predicted photometric redshifts obtained from Deep Learning (DL) with the k-Nearest Neighbour (kNN) and the Decision Tree Regression (DTR) algorithms. We find, using a combination of near-infrared, visible and ultraviolet magnitudes, trained upon a sample of SDSS QSOs, that the kNN and DL algorithms produce the best self-validation result with a standard deviation of $\sigma = 0.24$. Testing on various sub-samples, we find that the DL algorithm generally has lower values of $\sigma$, in addition to exhibiting a better performance in other measures. Our DL method, which uses an easy-to-implement off-the-shelf algorithm with no filtering or removal of outliers, performs similarly to other, more complex, algorithms, resulting in an accuracy of $\Delta z < 0.1$ up to $z \sim 2.5$. Applying the DL algorithm trained on our 70,000-strong sample to other independent (radio-selected) datasets, we find $\sigma < 0.36$ over a wide range of radio flux densities. This indicates much potential in using this method to determine photometric redshifts of quasars detected with the Square Kilometre Array.
We have revised the SWIRE Photometric Redshift Catalogue to take account of new optical photometry in several of the SWIRE areas and to incorporate 2MASS and UKIDSS near-infrared data. Aperture matching is an important issue when combining near-infrared and optical data, and we have explored a number of methods of doing this. The increased number of photometric bands available for the redshift solution results in improvements both in the rms error and, especially, in the outlier rate. We have also found that incorporating the dust torus emission into the QSO templates improves the performance for QSO redshift estimation. Our revised redshift catalogue contains over 1 million extragalactic objects, of which 26288 are QSOs.