Upcoming 21cm surveys will map the spatial distribution of cosmic neutral hydrogen (HI) over unprecedented volumes. Mock catalogues are needed to fully exploit the potential of these surveys. Standard techniques employed to create these mock catalogues, like Halo Occupation Distribution (HOD), rely on assumptions such as that the baryonic properties of dark matter halos depend only on their masses. In this work, we use the state-of-the-art magneto-hydrodynamic simulation IllustrisTNG to show that the HI content of halos exhibits a strong dependence on their local environment. We then use machine learning techniques to show that this effect can be 1) modeled by these algorithms and 2) parametrized in the form of novel analytic equations. We provide physical explanations for this environmental effect and show that ignoring it leads to underprediction of the real-space 21-cm power spectrum at $k \gtrsim 0.05$ h/Mpc by $\gtrsim$10%, which is larger than the expected precision from upcoming surveys on such large scales. Our methodology of combining numerical simulations with machine learning techniques is general, and opens a new direction in modeling and parametrizing the complex physics of assembly bias needed to generate accurate mocks for galaxy and line intensity mapping surveys.
Understanding the impact of halo properties beyond halo mass on the clustering of galaxies (namely galaxy assembly bias) remains a challenge for contemporary models of galaxy clustering. We explore the use of machine learning to predict the halo occupations and recover galaxy clustering and assembly bias in a semi-analytic galaxy formation model. For stellar-mass selected samples, we train a Random Forest algorithm on the number of central and satellite galaxies in each dark matter halo. With the predicted occupations, we create mock galaxy catalogues and measure the clustering and assembly bias. Using a range of halo and environment properties, we find that the machine learning predictions of the occupancy variations with secondary properties, galaxy clustering and assembly bias are all in excellent agreement with those of our target galaxy formation model. Internal halo properties are most important for predicting the central galaxies, while environment plays a critical role for the satellites. Our machine learning models are all provided in a usable format. We demonstrate that machine learning is a powerful tool for modelling the galaxy-halo connection, and can be used to create realistic mock galaxy catalogues that accurately recover the expected occupancy variations, galaxy clustering and galaxy assembly bias, which is imperative for cosmological analyses of upcoming surveys.
Empirical methods for connecting galaxies to their dark matter halos have become essential for interpreting measurements of the spatial statistics of galaxies. In this work, we present a novel approach for parameterizing the degree of concentration dependence in the abundance matching method. This new parameterization provides a smooth interpolation between two commonly used matching proxies: the peak halo mass and the peak halo maximal circular velocity. This parameterization controls the amount of dependence of galaxy luminosity on halo concentration at a fixed halo mass. Effectively this interpolation scheme enables abundance matching models to have adjustable assembly bias in the resulting galaxy catalogs. With the new 400 Mpc/h DarkSky Simulation, whose larger volume provides lower sample variance, we further show that low-redshift two-point clustering and satellite fraction measurements from SDSS can already provide a joint constraint on this concentration dependence and the scatter within the abundance matching framework.
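As a rough illustration of the interpolation idea described above, the following minimal sketch builds a one-parameter matching proxy and rank-orders halos against a luminosity list. The functional form `v_vir * (v_max / v_vir)**alpha`, the use of virial velocity as a stand-in for halo mass, and the names `interpolated_proxy` and `abundance_match` are all assumptions made for this example; the abstract does not specify them, and scatter is ignored entirely.

```python
import numpy as np


def interpolated_proxy(v_vir, v_max, alpha):
    """Hypothetical one-parameter proxy (an assumption, not taken from the abstract).

    alpha = 0 returns v_vir, which ranks halos the same way as mass at a fixed epoch;
    alpha = 1 returns v_max; values in between interpolate smoothly.
    """
    v_vir = np.asarray(v_vir, dtype=float)
    v_max = np.asarray(v_max, dtype=float)
    return v_vir * (v_max / v_vir) ** alpha


def abundance_match(proxy, luminosities):
    """Scatter-free rank matching: the halo with the largest proxy gets the brightest galaxy.

    Assumes len(proxy) == len(luminosities).
    """
    proxy = np.asarray(proxy, dtype=float)
    lum_sorted = np.sort(np.asarray(luminosities, dtype=float))[::-1]  # brightest first
    order = np.argsort(-proxy, kind="stable")  # halo indices, largest proxy first
    assigned = np.empty_like(lum_sorted)
    assigned[order] = lum_sorted
    return assigned


# Toy halos: halo 0 is low-mass but concentrated (high v_max relative to v_vir).
v_vir = [100.0, 120.0, 200.0]
v_max = [180.0, 130.0, 210.0]
lum = [1.0, 2.0, 3.0]

mass_like = abundance_match(interpolated_proxy(v_vir, v_max, 0.0), lum)
vmax_like = abundance_match(interpolated_proxy(v_vir, v_max, 1.0), lum)
```

In this toy case, raising `alpha` from 0 to 1 moves the second-brightest galaxy from halo 1 to the more concentrated halo 0, which is the kind of concentration dependence at fixed mass that the parameterization is meant to control.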
We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of processed CMB maps that boost the cosmic string detectability. After training, our proposed classifiers give results that improve on or match the claimed detectability levels of the existing methods for string tension, $G\mu$. They can make a $3\sigma$ detection of strings with $G\mu \gtrsim 2.1\times 10^{-10}$ for noise-free, $0.9$-resolution CMB observations. The minimum detectable tension increases to $G\mu \gtrsim 3.0\times 10^{-8}$ for a more realistic, CMB S4-like (II) strategy, still a significant improvement over the previous results.
Symbolic regression is a powerful technique that can discover analytical equations that describe data, which can lead to explainable models and generalizability outside of the training data set. In contrast, neural networks have achieved amazing levels of accuracy on image recognition and natural language processing tasks, but are often seen as black-box models that are difficult to interpret and typically extrapolate poorly. Here we use a neural network-based architecture for symbolic regression called the Equation Learner (EQL) network and integrate it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation. To demonstrate the power of such systems, we study their performance on several substantially different tasks. First, we show that the neural network can perform symbolic regression and learn the form of several functions. Next, we present an MNIST arithmetic task where a separate part of the neural network extracts the digits. Finally, we demonstrate prediction of dynamical systems where an unknown parameter is extracted through an encoder. We find that the EQL-based architecture can extrapolate quite well outside of the training data set compared to a standard neural network-based architecture, paving the way for deep learning to be applied in scientific exploration and discovery.
One of the main predictions of excursion set theory is that the clustering of dark matter haloes only depends on halo mass. However, it has been long established that the clustering of haloes also depends on other properties, including formation time, concentration, and spin; this effect is commonly known as halo assembly bias. We use a suite of gravity-only simulations to study the dependence of halo assembly bias on cosmology; these simulations cover cosmological parameters spanning 10$\sigma$ around state-of-the-art best-fitting values, including standard extensions of the $\Lambda$CDM paradigm such as neutrino mass and dynamical dark energy. We find that the strength of halo assembly bias presents variations smaller than 0.05 dex across all cosmologies studied for concentration and spin selected haloes, letting us conclude that the dependence of halo assembly bias upon cosmology is negligible. We then study the dependence of galaxy assembly bias (i.e. the manifestation of halo assembly bias in galaxy clustering) on cosmology using subhalo abundance matching. We find that galaxy assembly bias also presents very small dependence upon cosmology ($\sim$ 2$\%$-4$\%$ of the total clustering); on the other hand, we find that the dependence of this signal on the galaxy formation parameters of our galaxy model is much stronger. Taken together, these results let us conclude that the dependence of halo and galaxy assembly bias on cosmology is practically negligible.