All estimators of the two-point correlation function are based on a random catalogue, a set of points with no intrinsic clustering following the selection function of a survey. High-accuracy estimates require the use of large random catalogues, which imply a high computational cost. We propose to replace the standard random catalogues by glass-like point distributions or glass catalogues, which are characterized by a power spectrum $P(k)\propto k^4$ and exhibit significantly less power than a Poisson distribution with the same number of points on scales larger than the mean inter-particle separation. We show that these distributions can be obtained by iteratively applying the technique of Zeldovich reconstruction commonly used in studies of baryon acoustic oscillations (BAO). We provide a modified version of the widely used Landy-Szalay estimator of the correlation function adapted to the use of glass catalogues and compare its performance with the results obtained using random samples. Our results show that glass-like samples do not add any bias with respect to the results obtained using Poisson distributions. On scales larger than the mean inter-particle separation of the glass catalogues, the modified estimator leads to a significant reduction of the variance of the Legendre multipoles $\xi_\ell(s)$ with respect to the standard Landy-Szalay results with the same number of points. The size of the glass catalogue required to achieve a given accuracy in the correlation function is significantly smaller than when using random samples. Even considering the small additional cost of constructing the glass catalogues, their use could help to drastically reduce the computational cost of configuration-space clustering analysis of future surveys while maintaining high-accuracy requirements.
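For orientation, the standard Landy-Szalay estimator referred to above combines normalized pair counts as $\xi = (DD - 2DR + RR)/RR$. The sketch below illustrates that standard form only, in pure Python with brute-force pair counting. It does not implement the glass-adapted estimator, whose form is not given in this abstract. The function names `pair_counts` and `landy_szalay`, the bin edges, and the toy catalogue sizes are illustrative assumptions.

```python
import math
import random
from itertools import combinations, product


def pair_counts(a, b=None, edges=(0.0, 0.1, 0.2, 0.3)):
    """Histogram of pair separations.

    Counts unique auto-pairs within `a` when `b` is None,
    otherwise all cross-pairs between `a` and `b`.
    """
    pairs = combinations(a, 2) if b is None else product(a, b)
    counts = [0] * (len(edges) - 1)
    for p, q in pairs:
        r = math.dist(p, q)
        for i in range(len(counts)):
            if edges[i] <= r < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def landy_szalay(dd, dr, rr, n_d, n_r):
    """Standard Landy-Szalay estimate per bin from raw pair counts."""
    norm_dd = n_d * (n_d - 1) / 2  # number of unique data-data pairs
    norm_dr = n_d * n_r            # number of data-random pairs
    norm_rr = n_r * (n_r - 1) / 2  # number of unique random-random pairs
    xi = []
    for c_dd, c_dr, c_rr in zip(dd, dr, rr):
        if c_rr == 0:
            xi.append(float("nan"))  # bin not sampled by the randoms
            continue
        d, x, r = c_dd / norm_dd, c_dr / norm_dr, c_rr / norm_rr
        xi.append((d - 2 * x + r) / r)
    return xi


# Toy usage (assumed setup): unclustered "data" and randoms in a unit cube.
rng = random.Random(1)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(50)]
rand = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
xi = landy_szalay(pair_counts(data), pair_counts(data, rand),
                  pair_counts(rand), len(data), len(rand))
```

Because the toy data are themselves unclustered, `xi` should scatter around zero, with the scatter set by the small catalogue sizes.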
The two-point correlation function of the galaxy distribution is a key cosmological observable that allows us to constrain the dynamical and geometrical state of our Universe. To measure the correlation function we need to know both the galaxy positions and the expected galaxy density field. The expected field is commonly specified using a Monte-Carlo sampling of the volume covered by the survey and, to minimize additional sampling errors, this random catalog has to be much larger than the data catalog. Correlation function estimators compare data-data pair counts to data-random and random-random pair counts, where random-random pairs usually dominate the computational cost. Future redshift surveys will deliver spectroscopic catalogs of tens of millions of galaxies. Given the large number of random objects required to guarantee sub-percent accuracy, it is of paramount importance to improve the efficiency of the algorithm without degrading its precision. We show both analytically and numerically that splitting the random catalog into a number of subcatalogs of the same size as the data catalog when calculating random-random pairs, and excluding pairs across different subcatalogs provides the optimal error at fixed computational cost. For a random catalog fifty times larger than the data catalog, this reduces the computation time by a factor of more than ten without affecting estimator variance or bias.
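The splitting scheme described above can be sketched as follows: the random catalogue is cut into equal-size sub-catalogues, random-random pairs are counted only within each sub-catalogue, and the summed counts are normalized by the total number of pairs actually counted. This is a minimal pure-Python illustration under assumed names (`auto_pair_counts`, `split_rr`) and toy sizes, not the authors' implementation.

```python
import math
import random
from itertools import combinations


def auto_pair_counts(points, edges):
    """Brute-force histogram of separations of unique pairs within `points`."""
    counts = [0] * (len(edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        for i in range(len(counts)):
            if edges[i] <= r < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def split_rr(randoms, n_sub, edges):
    """Normalized RR counts using only pairs inside each of n_sub sub-catalogues.

    Pairs across different sub-catalogues are never evaluated.
    Points beyond n_sub * (len(randoms) // n_sub) are ignored.
    """
    size = len(randoms) // n_sub
    total = [0] * (len(edges) - 1)
    for k in range(n_sub):
        sub = randoms[k * size:(k + 1) * size]
        for i, c in enumerate(auto_pair_counts(sub, edges)):
            total[i] += c
    n_pairs = n_sub * size * (size - 1) / 2  # pairs actually counted
    return [c / n_pairs for c in total]


# Toy usage (assumed setup): 400 randoms in a unit cube, 4 sub-catalogues.
rng = random.Random(0)
randoms = [(rng.random(), rng.random(), rng.random()) for _ in range(400)]
edges = [0.0, 0.1, 0.2, 0.3]
rr_split = split_rr(randoms, n_sub=4, edges=edges)  # 19800 pair evaluations
rr_full = split_rr(randoms, n_sub=1, edges=edges)   # 79800 pair evaluations
```

With `n_sub` sub-catalogues, the number of RR pair evaluations drops by roughly a factor of `n_sub`, while the normalized counts in `rr_split` and `rr_full` estimate the same quantity.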
We present a new method to estimate redshift distributions and galaxy-dark matter bias parameters using correlation functions in a fully data-driven and self-consistent manner. Unlike other machine learning, template, or correlation redshift methods, this approach does not require a reference sample with known redshifts. By measuring the projected cross- and auto-correlations of different galaxy sub-samples, e.g., as chosen by simple cells in color-magnitude space, we are able to estimate the galaxy-dark matter bias model parameters, and the shape of the redshift distributions of each sub-sample. This method fully marginalises over a flexible parameterisation of the redshift distribution and galaxy-dark matter bias parameters of sub-samples of galaxies, and thus provides a general Bayesian framework to incorporate redshift uncertainty into the cosmological analysis in a data-driven, consistent, and reproducible manner. This result is improved by an order of magnitude by including cross-correlations with the CMB and with galaxy-galaxy lensing. We showcase how this method could be applied to real galaxies. By using idealised data vectors, in which all galaxy-dark matter model parameters and redshift distributions are known, this method is demonstrated to recover unbiased estimates of important quantities, such as the offset $\Delta_z$ between the mean of the true and estimated redshift distribution and the 68%, 95%, and 99.5% widths of the redshift distribution, to an accuracy required by current and future surveys.
We present correction terms that allow delete-one Jackknife and Bootstrap methods to be used to recover unbiased estimates of the data covariance matrix of the two-point correlation function $\xi\left(\mathbf{r}\right)$. We demonstrate the accuracy and precision of this new method using a large set of 1000 QUIJOTE simulations that each cover a comoving volume of $1\,\left[h^{-1}\,\mathrm{Gpc}\right]^3$. The corrected resampling techniques accurately recover the correct amplitude and structure of the data covariance matrix as represented by its principal components. Our corrections for the internal resampling methods are shown to be robust against the intrinsic clustering of the cosmological tracers both in real and redshift space using two snapshots at $z=0$ and $z=1$ that mimic two samples with significantly different clustering. We also analyse two different slicings of the simulation volume into $n_{\rm sv}=64$ or $125$ sub-samples and show that the main impact of different $n_{\rm sv}$ is on the structure of the covariance matrix due to the limited number of independent internal realisations that can be made given a fixed $n_{\rm sv}$.
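As a baseline for the resampling methods discussed above, the textbook delete-one jackknife covariance is $C_{ij} = \frac{n-1}{n}\sum_{k=1}^{n}(x_i^{(k)}-\bar{x}_i)(x_j^{(k)}-\bar{x}_j)$, where $x^{(k)}$ is the statistic measured with sub-volume $k$ removed. The sketch below implements only this standard, uncorrected form. The correction terms proposed in the abstract are not specified here and are not included. The function name `jackknife_covariance` and the toy input are illustrative assumptions.

```python
def jackknife_covariance(samples):
    """Standard (uncorrected) delete-one jackknife covariance.

    samples[k] is the statistic vector (e.g. binned xi) measured
    with sub-volume k removed from the full sample.
    """
    n = len(samples)
    dim = len(samples[0])
    # Mean of the delete-one measurements in each bin.
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j])
    factor = (n - 1) / n  # jackknife prefactor for delete-one resampling
    return [[factor * c for c in row] for row in cov]


# Toy usage (assumed numbers): three delete-one measurements in two bins.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = jackknife_covariance(toy)
```

In a survey analysis, `samples` would hold one measurement per removed sub-volume, so its length equals $n_{\rm sv}$.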
We perform theoretical and numerical studies of the full relativistic two-point galaxy correlation function, considering the linear-order scalar and tensor perturbation contributions and the wide-angle effects. Using the gauge-invariant relativistic description of galaxy clustering and accounting for the contributions at the observer position, we demonstrate that the complete theoretical expression is devoid of any long-mode contributions from scalar or tensor perturbations and that it lacks infrared divergences, in agreement with the equivalence principle. By showing that the gravitational potential contribution to the correlation function converges in the infrared, our study justifies an IR cut-off $(k_{\text{IR}} \leq H_0)$ in computing the gravitational potential contribution. Using the full gauge-invariant expression, we numerically compute the galaxy two-point correlation function and study the individual contributions in the conformal Newtonian gauge. We find that the terms at the observer position such as the coordinate lapses and the observer velocity (missing in the standard formalism) dominate over the other relativistic contributions in the conformal Newtonian gauge such as the source velocity, the gravitational potential, the integrated Sachs-Wolfe effect, the Shapiro time-delay and the lensing convergence. Compared to the standard Newtonian theoretical predictions that consider only the density fluctuation and redshift-space distortions, the relativistic effects in galaxy clustering result in a few percent-level systematic errors beyond the scale of the baryonic acoustic oscillation. Our theoretical and numerical study provides a comprehensive understanding of the relativistic effects in the galaxy two-point correlation function, as it proves the validity of the theoretical prediction and accounts for effects that are often neglected in its numerical evaluation.
We present an $8.1\sigma$ detection of the non-Gaussian 4-Point Correlation Function (4PCF) using a sample of $N_{\rm g} \approx 8\times 10^5$ galaxies from the BOSS CMASS dataset. Our measurement uses the $\mathcal{O}(N_{\rm g}^2)$ NPCF estimator of Philcox et al. (2021), including a new modification to subtract the disconnected 4PCF contribution (arising from the product of two 2PCFs) at the estimator level. This approach is unlike previous work and ensures that our signal is a robust detection of gravitationally-induced non-Gaussianity. The estimator is validated with a suite of lognormal simulations, and the analytic form of the disconnected contribution is discussed. Due to the high dimensionality of the 4PCF, data compression is required; we use a signal-to-noise-based scheme calibrated from theoretical covariance matrices to restrict to $\sim 100$ basis vectors. The compression has minimal impact on the detection significance and facilitates traditional $\chi^2$-like analyses using a suite of mock catalogs. The significance is stable with respect to different treatments of noise in the sample covariance (arising from the limited number of mocks), but decreases to $4.7\sigma$ when a minimum galaxy separation of $14\,h^{-1}\mathrm{Mpc}$ is enforced on the 4PCF tetrahedra (such that the statistic can be modelled more easily). The detectability of the 4PCF in the quasi-linear regime implies that it will become a useful tool in constraining cosmological and galaxy formation parameters from upcoming spectroscopic surveys.