We investigate the application of Hybrid Effective Field Theory (HEFT) -- which combines a Lagrangian bias expansion with subsequent particle dynamics from $N$-body simulations -- to the modeling of $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$s) of biased tracers of the cosmological matter field. The $k{\rm NN}$-${\rm CDF}$s are sensitive to all higher order connected $N$-point functions in the data, but are cheap to compute. We develop the formalism to predict the $k{\rm NN}$-${\rm CDF}$s of discrete tracers of a continuous field from the statistics of the continuous field itself. Using this formalism, we demonstrate how $k{\rm NN}$-${\rm CDF}$ statistics of a set of biased tracers, such as halos or galaxies, of the cosmological matter field can be modeled given a set of low-redshift HEFT component fields and bias parameter values. These are the same ingredients needed to predict the two-point clustering. For a specific sample of halos, we show that both the two-point clustering \textit{and} the $k{\rm NN}$-${\rm CDF}$s can be well-fit on quasi-linear scales ($\gtrsim 20 h^{-1}{\rm Mpc}$) by the second-order HEFT formalism with the \textit{same values} of the bias parameters, implying that joint modeling of the two is possible. Finally, using a Fisher matrix analysis, we show that including $k{\rm NN}$-${\rm CDF}$ measurements over the range of allowed scales in the HEFT framework can improve the constraints on $\sigma_8$ by roughly a factor of $3$, compared to the case where only two-point measurements are considered. Combining the statistical power of $k{\rm NN}$ measurements with the modeling power of HEFT, therefore, represents an exciting prospect for extracting greater information from small-scale cosmological clustering.
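The Fisher matrix comparison mentioned above can be illustrated with a minimal sketch. It assumes the common textbook form for a Gaussian likelihood with a parameter-independent covariance, $F_{ij} = \partial_i D^{T} C^{-1} \partial_j D$, which may differ in detail from the analysis in the paper. The function names `fisher_matrix` and `marginalized_errors` are hypothetical, chosen here for illustration only.

```python
import numpy as np


def fisher_matrix(derivs, cov):
    """Fisher matrix for a Gaussian likelihood with fixed covariance.

    derivs : (n_params, n_data) array, derivative of the data vector
             with respect to each parameter (assumed precomputed).
    cov    : (n_data, n_data) covariance matrix of the data vector.
    """
    derivs = np.asarray(derivs, dtype=float)
    # Solve C x = dD/dtheta instead of forming C^{-1} explicitly.
    cinv_derivs = np.linalg.solve(cov, derivs.T)  # (n_data, n_params)
    return derivs @ cinv_derivs                   # (n_params, n_params)


def marginalized_errors(fisher):
    """Forecast 1-sigma errors, marginalized over all other parameters."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))
```

In such a forecast, the improvement from adding a second statistic would be read off by comparing `marginalized_errors` for the two-point data vector alone with the result for the combined data vector, using a covariance that includes the cross-covariance between the two statistics.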
In this paper we test the perturbative halo bias model at the field level. The advantage of this approach is that any analysis can be done without sample variance if the same initial conditions are used in simulations and perturbation theory calculations. We write the bias expansion in terms of modified bias operators in Eulerian space, designed such that the large bulk flows are automatically resummed and not treated perturbatively. Using these operators, the bias model accurately matches the Eulerian density of halos in N-body simulations. The mean-square model error is close to the Poisson shot noise for a wide range of halo masses and it is rather scale-independent, with scale-dependent corrections becoming relevant at the nonlinear scale. In contrast, for linear bias the mean-square model error can be higher than the Poisson prediction by factors of up to a few on large scales, and it becomes scale dependent already in the linear regime. We show that by weighting simulated halos by their mass, the mean-square error of the model can be further reduced by up to an order of magnitude, or by a factor of two when including $60\%$ mass scatter. We also test the Standard Eulerian bias model using the nonlinear matter field measured from simulations and show that it leads to a larger and more scale-dependent model error than the bias expansion based on perturbation theory. These results may be of particular relevance for cosmological inference methods that use a likelihood of the biased tracer at the field level, or for initial condition and BAO reconstruction that requires a precise estimate of the large-scale potential from the biased tracer density.
With the completion of the Planck mission, in order to continue to gather cosmological information it has become crucial to understand the Large Scale Structures (LSS) of the universe to percent accuracy. The Effective Field Theory of LSS (EFTofLSS) is a novel theoretical framework that aims to develop an analytic understanding of LSS at long distances, where inhomogeneities are small. We further develop the description of biased tracers in the EFTofLSS to account for the effect of baryonic physics and primordial non-Gaussianities, finding that new bias coefficients are required. Then, restricting to dark matter with Gaussian initial conditions, we describe the prediction of the EFTofLSS for the one-loop halo-halo and halo-matter two-point functions, and for the tree-level halo-halo-halo, matter-halo-halo and matter-matter-halo three-point functions. Several new bias coefficients are needed in the EFTofLSS, even though their contribution at a given order can be degenerate and the same parameters contribute to multiple observables. We develop a method to reduce the number of biases to an irreducible basis, and find that, at the order at which we work, seven bias parameters are enough to describe this extremely rich set of statistics. We then compare with the output of $N$-body simulations. For the lowest mass bin, we find percent level agreement up to $k\simeq 0.3\,h\,{\rm Mpc}^{-1}$ for the one-loop two-point functions, and up to $k\simeq 0.15\,h\,{\rm Mpc}^{-1}$ for the tree-level three-point functions, with the $k$-reach decreasing with higher mass bins. This is consistent with the theoretical estimates, and suggests that the cosmological information in LSS amenable to analytical control is much more than previously believed.
Cross-correlations between datasets are used in many different contexts in cosmological analyses. Recently, $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$) were shown to be sensitive probes of cosmological (auto) clustering. In this paper, we extend the framework of nearest neighbor measurements to describe joint distributions of, and correlations between, two datasets. We describe the measurement of joint $k{\rm NN}$-${\rm CDF}$s, and show that these measurements are sensitive to all possible connected $N$-point functions that can be defined in terms of the two datasets. We describe how the cross-correlations can be isolated by combining measurements of the joint $k{\rm NN}$-${\rm CDF}$s and those measured from individual datasets. We demonstrate the application of these measurements in the context of Gaussian density fields, as well as for fully nonlinear cosmological datasets. Using a Fisher analysis, we show that the halo-matter cross-correlations, as measured through nearest neighbor measurements, are more sensitive to the underlying cosmological parameters than traditional two-point cross-correlation measurements over the same range of scales. Finally, we demonstrate how the nearest neighbor cross-correlations can robustly detect cross correlations between sparse samples -- the same regime where the two-point cross-correlation measurements are dominated by noise.
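A joint nearest neighbor measurement of this kind can be sketched as follows. The sketch assumes the joint CDF is the fraction of query points whose sphere of radius $r$ contains at least $k_a$ points of one dataset and at least $k_b$ points of the other, and it returns the individual CDFs alongside so that the joint measurement can be compared with their product, which is what statistically independent datasets would give. The function names, the use of `scipy.spatial.cKDTree`, and the periodic `boxsize` option are choices made for this illustration, not details taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree


def kth_distance(data, queries, k, boxsize=None):
    """Distance from each query point to its k-th nearest data point."""
    tree = cKDTree(data, boxsize=boxsize)  # boxsize enables periodic wrapping
    dists = tree.query(queries, k=k)[0]
    # query returns shape (n,) for k == 1 and (n, k) otherwise; keep the k-th column.
    return dists.reshape(len(queries), -1)[:, -1]


def empirical_cdf(distances, radii):
    """Fraction of distances that are <= each radius."""
    ordered = np.sort(distances)
    return np.searchsorted(ordered, radii, side="right") / len(ordered)


def joint_knn_cdf(data_a, data_b, queries, k_a, k_b, radii, boxsize=None):
    """Return (joint CDF, CDF of A alone, CDF of B alone) at the given radii."""
    d_a = kth_distance(data_a, queries, k_a, boxsize)
    d_b = kth_distance(data_b, queries, k_b, boxsize)
    # A sphere of radius r holds >= k_a points of A and >= k_b points of B
    # exactly when both k-th neighbor distances are <= r.
    d_joint = np.maximum(d_a, d_b)
    return (empirical_cdf(d_joint, radii),
            empirical_cdf(d_a, radii),
            empirical_cdf(d_b, radii))
```

A departure of the joint CDF from the product of the two individual CDFs then signals a correlation between the datasets; for two independent Poisson samples the two should agree up to noise.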
The use of summary statistics beyond the two-point correlation function to analyze the non-Gaussian clustering on small scales is an active field of research in cosmology. In this paper, we explore a set of new summary statistics -- the $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$). This is the empirical cumulative distribution function of distances from a set of volume-filling, Poisson distributed random points to the $k$-nearest data points, and is sensitive to all connected $N$-point correlations in the data. The $k{\rm NN}$-${\rm CDF}$ can be used to measure counts in cell, void probability distributions and higher $N$-point correlation functions, all using the same formalism exploiting fast searches with spatial tree data structures. We demonstrate how it can be computed efficiently from various data sets - both discrete points, and the generalization for continuous fields. We use data from a large suite of $N$-body simulations to explore the sensitivity of this new statistic to various cosmological parameters, compared to the two-point correlation function, while using the same range of scales. We demonstrate that the use of $k{\rm NN}$-${\rm CDF}$ improves the constraints on the cosmological parameters by more than a factor of $2$ when applied to the clustering of dark matter in the range of scales between $10h^{-1}{\rm Mpc}$ and $40h^{-1}{\rm Mpc}$. We also show that relative improvement is even greater when applied on the same scales to the clustering of halos in the simulations at a fixed number density, both in real space, as well as in redshift space. Since the $k{\rm NN}$-${\rm CDF}$ are sensitive to all higher order connected correlation functions in the data, the gains over traditional two-point analyses are expected to grow as progressively smaller scales are included in the analysis of cosmological data.
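The definition above translates into a short tree-based computation for discrete points. The following is a minimal sketch, assuming a periodic cubic box and using `scipy.spatial.cKDTree` as the spatial tree; the function name `knn_cdf` and its arguments are hypothetical and are not taken from the paper's code.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_cdf(data, queries, k, radii, boxsize=None):
    """Empirical CDF of the distance from query points to their k-th nearest data point.

    data    : (n_data, 3) positions of the tracers.
    queries : (n_query, 3) volume-filling random points.
    k       : which nearest neighbor to use (1 = nearest).
    radii   : radii at which to evaluate the CDF.
    boxsize : side length for periodic wrapping, or None for no wrapping.
    """
    tree = cKDTree(data, boxsize=boxsize)
    dists = tree.query(queries, k=k)[0]
    # query returns shape (n,) for k == 1 and (n, k) otherwise; keep the k-th column.
    kth = np.sort(dists.reshape(len(queries), -1)[:, -1])
    # Fraction of query points whose k-th neighbor lies within each radius.
    return np.searchsorted(kth, radii, side="right") / len(kth)
```

For an unclustered Poisson sample of mean number density $\bar{n}$, the $k=1$ curve should approach $1 - \exp(-\bar{n}\,4\pi r^3/3)$, which gives a simple sanity check before applying the function to clustered data.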
We use the $k$-nearest neighbor probability distribution function ($k$NN-PDF, Banerjee & Abel 2021) to assess convergence in a scale-free $N$-body simulation. Compared to our previous two-point analysis, the $k$NN-PDF allows us to quantify our results in the language of halos and numbers of particles, while also incorporating non-Gaussian information. We find good convergence for 32 particles and greater at densities typical of halos, while 16 particles and fewer appear unconverged. Halving the softening length extends convergence to higher densities, but not to fewer particles. Our analysis is less sensitive to voids, but we analyze a limited range of underdensities and find evidence for convergence at 16 particles and greater even in sparse voids.