We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and then derive a membership classification. After testing several machine learning techniques (such as Stochastic Gradient Boosting, Model Averaged Neural Networks and k-Nearest Neighbors), we found that the Support Vector Machine (SVM) algorithm performs best when applied to our data. Our training and validation data are drawn from the Sloan Digital Sky Survey (SDSS) main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\text{phot-cl}} \le 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times10^{14}\,M_{\odot}$ (most above $3\times10^{14}\,M_{\odot}$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions about the number density or luminosity profiles of galaxies in clusters. Our approach delivers highly accurate results (completeness C $\sim 92\%$ and purity P $\sim 87\%$) within $R_{200}$, which is why we named our code {\bf RPM}. We discuss possible dependencies on magnitude, colour and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys such as eROSITA, Euclid and LSST.
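The core pipeline described above (photometric parameters in, membership probability out, then a classification) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' RPM code: the feature choices (a magnitude and a single colour) and all numbers are hypothetical stand-ins.

```python
# Hypothetical sketch of SVM-based cluster membership classification.
# Features and distributions are invented for illustration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy photometric features (r-band magnitude, g-r colour):
# cluster members sit on a tight red sequence, field galaxies are broader.
members = rng.normal([17.0, 0.9], [1.0, 0.1], size=(200, 2))
field = rng.normal([19.0, 0.5], [1.5, 0.3], size=(200, 2))
X = np.vstack([members, field])
y = np.r_[np.ones(200), np.zeros(200)]  # 1 = member, 0 = interloper

# probability=True enables Platt-scaled membership probabilities.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", probability=True, random_state=0))
clf.fit(X, y)

p_member = clf.predict_proba(X)[:, 1]   # membership probability per galaxy
labels = clf.predict(X)                 # final membership classification
```

In practice one would tune the kernel hyperparameters by cross-validation and evaluate completeness and purity on a held-out spectroscopic sample.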
A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are $> 10^9$ photometrically cataloged sources, yet modern spectroscopic surveys are limited to $\sim$ a few $\times 10^6$ targets. As we approach the Large Synoptic Survey Telescope (LSST) era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring the fundamental stellar parameters $T_{\rm eff}$, $\log g$, and [Fe/H] using photometric brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/MMT. In sum, the training set includes $\sim 9000$ spectra, for which stellar parameters are measured using the SEGUE Stellar Parameter Pipeline (SSPP). We employ the random forest algorithm to perform a non-parametric regression that predicts $T_{\rm eff}$, $\log g$, and [Fe/H] from photometric time-domain observations. Our final, optimized model produces a cross-validated root-mean-square error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for $T_{\rm eff}$, $\log g$, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a $\sim$12-20\% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for $\sim 54{,}000$ known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.
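The regression step above (random forest mapping time-domain features to stellar parameters, scored by cross-validated RMSE) can be sketched as follows. The features and the synthetic $T_{\rm eff}$ relation are invented for illustration; the real pipeline uses measured variability features and SSPP labels.

```python
# Hypothetical sketch: non-parametric random forest regression with
# cross-validated RMSE, on invented "time-domain" features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 500
# Toy features: colour, variability amplitude, period proxy (all invented).
X = rng.uniform(size=(n, 3))
# Synthetic Teff: driven mostly by colour, plus 100 K scatter (illustrative).
teff = 6500.0 - 2000.0 * X[:, 0] + 100.0 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
# Out-of-fold predictions give an honest, cross-validated error estimate.
pred = cross_val_predict(model, X, teff, cv=5)
rmse = float(np.sqrt(np.mean((pred - teff) ** 2)))
```

The same pattern extends to $\log g$ and [Fe/H] by swapping the target array, or by using a multi-output forest.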
We test how well available stellar population models can reproduce the observed $u,g,r,i,z$-band photometry of the local galaxy population ($0.02 \le z \le 0.03$) as probed by the SDSS. Our study is conducted from the perspective of a user of the models, who has observational data in hand and seeks to convert them into physical quantities. Stellar population models for galaxies are created by synthesizing star formation histories and chemical enrichment using single stellar populations from several groups (Starburst99, GALAXEV, Maraston2005, GALEV). The role of dust is addressed through a simplistic, but observationally motivated, dust model that couples the amplitude of the extinction to the star formation history, metallicity and the viewing angle. Moreover, the influence of emission lines is considered (for the subset of models that include this component). The performance of the models is investigated by: 1) comparing their predictions with the observed galaxy population in the SDSS using the $(u-g)$-$(r-i)$ and $(g-r)$-$(i-z)$ color planes, and 2) comparing predicted stellar mass- and luminosity-weighted ages and metallicities, specific star formation rates, mass-to-light ratios and total extinctions with literature values from studies based on spectroscopy. Strong differences between the various models are seen, with several models occupying regions of the color-color diagrams where no galaxies are observed. We therefore emphasize the importance of the choice of model. Using our preferred model, we find that the star formation history, metallicity and dust content can be constrained over a large part of the parameter space through $u,g,r,i,z$-band photometry alone. However, strong local degeneracies are present where models with high and low extinction overlap in certain parts of color space.
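The color-plane comparison in point 1) amounts to asking whether a model's predicted colors fall inside the locus occupied by observed galaxies. One simple, hedged way to operationalize that test is a convex-hull membership check; the data below are synthetic stand-ins, not SDSS photometry or any actual model grid.

```python
# Hypothetical sketch: flag model predictions that land in regions of the
# (u-g)-(r-i) colour plane where no observed galaxies lie.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(4)
# Toy "observed" galaxy colours clustered around a red-sequence-like locus.
obs = rng.normal([1.4, 0.4], [0.3, 0.1], size=(500, 2))
# Toy model predictions; the last two are deliberately far off the locus.
models = np.array([[1.3, 0.4], [1.6, 0.5], [3.0, 1.5], [0.0, -1.0]])

tri = Delaunay(obs)
# find_simplex returns -1 for points outside the convex hull of the data.
inside = tri.find_simplex(models) >= 0
```

A convex hull is a coarse criterion (it ignores density inside the locus); a kernel density estimate with a threshold would be a natural refinement of the same idea.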
We infer the UV luminosities of Local Group galaxies at early cosmic times ($z \sim 2$ and $z \sim 7$) by combining stellar population synthesis modeling with star formation histories derived from deep color-magnitude diagrams constructed from Hubble Space Telescope (HST) observations. Our analysis provides a basis for understanding high-$z$ galaxies - including those that may be unobservable even with the James Webb Space Telescope (JWST) - in the context of familiar, well-studied objects in the very low-$z$ Universe. We find that, at the epoch of reionization, all Local Group dwarfs were less luminous than the faintest galaxies detectable in deep HST observations of blank fields. We predict that JWST will observe $z \sim 7$ progenitors of galaxies similar to the Large Magellanic Cloud today; however, the HST Frontier Fields initiative may already be observing such galaxies, highlighting the power of gravitational lensing. Consensus reionization models require an extrapolation of the observed blank-field luminosity function at $z \approx 7$ by at least two orders of magnitude in order to maintain reionization. This scenario requires the progenitors of the Fornax and Sagittarius dwarf spheroidal galaxies to be contributors to the ionizing background at $z \sim 7$. Combined with numerical simulations, our results argue for a break in the UV luminosity function, from a faint-end slope of $\alpha \sim -2$ at $M_{\rm UV} < -13$ to $\alpha \sim -1.2$ at lower luminosities. Applied to photometric samples at lower redshifts, our analysis suggests that HST observations in lensing fields at $z \sim 2$ are capable of probing galaxies with luminosities comparable to that expected for the progenitor of Fornax.
The efficient classification of different types of supernovae is one of the most important problems in observational cosmology. However, spectroscopic confirmation of most objects in upcoming photometric surveys, such as the Rubin Observatory Legacy Survey of Space and Time (LSST), will be unfeasible. The development of automated classification processes based on photometry has thus become crucial. In this paper we investigate the impact of machine learning (ML) classification on the final cosmological constraints, using simulated lightcurves from the Supernova Photometric Classification Challenge, released in 2010. We study the use of different feature sets for the lightcurves and many different ML pipelines based on either decision tree ensembles or automated search processes. To construct the final catalogs we propose a threshold selection method that employs a \emph{bias-variance tradeoff}. This is a robust and efficient way to minimize the mean squared error. With this method we were able to obtain very strong cosmological constraints, allowing us to retain $\sim 75\%$ of the total information in the Type Ia SNe when using the SALT2 feature set and $\sim 33\%$ for the other cases (based on either the Newling model or a standard wavelet decomposition).
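The bias-variance threshold selection can be illustrated on a toy problem: raising the classifier-score cut purifies the Ia sample (shrinking the contamination bias) but shrinks the sample (inflating the variance), and one picks the threshold minimizing $\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance}$. All distributions and the contamination offset below are invented for illustration; this is not the paper's pipeline.

```python
# Hypothetical sketch: choose a classification-probability threshold by
# minimizing MSE = bias^2 + variance of a toy sample statistic.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
is_ia = rng.random(n) < 0.5
# Toy classifier scores: true Ia tend to score high, contaminants low.
score = np.where(is_ia, rng.beta(5, 2, n), rng.beta(2, 5, n))
# Toy observable: zero-mean for Ia, contaminants offset by +0.5 (invented).
obs = np.where(is_ia, 0.0, 0.5) + 0.2 * rng.normal(size=n)

best_t, best_mse = None, np.inf
for t in np.linspace(0.1, 0.9, 17):
    sel = score >= t
    if sel.sum() < 10:
        continue
    bias = obs[sel].mean()             # contamination shifts the estimate
    var = obs[sel].var() / sel.sum()   # statistical noise grows as N shrinks
    mse = bias ** 2 + var
    if mse < best_mse:
        best_t, best_mse = float(t), float(mse)
```

Low thresholds are penalized by contamination bias, very high ones by small-sample variance, so the optimum sits in between, skewed toward purity here because the toy contamination offset is large.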
We present a new shear calibration method based on machine learning. The method estimates the individual shear responses of objects from a combination of several properties measured on the images, using supervised learning. The supervised learning uses the true individual shear responses obtained from copies of the image simulations with different shear values. On simulated GREAT3 data, we obtain a residual bias after calibration that is compatible with zero and surpasses the Euclid requirements for a signal-to-noise ratio $> 20$, within $\sim$15 CPU hours of training using only $\sim 10^5$ objects. This efficient machine-learning approach can use a smaller data set because the method avoids the contribution from shape noise. The low dimensionality of the input data also leads to simple neural network architectures. We compare our method to the recently described Metacalibration method, which shows similar performance. Because the two approaches rely on different methodologies and are subject to different systematics, they are highly complementary. Our method can therefore be applied without much effort to any survey, such as Euclid or the Vera C. Rubin Observatory, with fewer than a million images to simulate in order to learn the calibration function.
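The supervised step above - a small neural network regressing per-object shear response from a few measured properties - can be sketched as below. The two input properties, the functional form of the "true" response, and the network size are all assumptions made for this toy; the paper's actual responses come from sheared copies of image simulations.

```python
# Hypothetical sketch: small neural network regressing shear response
# from low-dimensional measured properties (toy data, invented response).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 2000
# Toy measured properties: e.g. a signal-to-noise proxy and a size proxy.
X = rng.uniform(0.5, 2.0, size=(n, 2))
# Toy "true" shear response: a smooth function of the two properties.
R = 1.0 - 0.2 / X[:, 0] + 0.1 * X[:, 1]

# Low input dimensionality permits a very small architecture.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
net.fit(X, R)
resid = net.predict(X) - R
mae = float(np.abs(resid).mean())
```

Because the training targets are noiseless responses rather than noisy shapes, a comparatively small training set suffices, which mirrors the efficiency argument in the abstract.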