The covariance matrix $\boldsymbol{\Sigma}$ of non-linear clustering statistics that are measured in current and upcoming surveys is of fundamental interest for comparing cosmological theory and data and a crucial ingredient for the likelihood approximations underlying widely used parameter inference and forecasting methods. The extreme number of simulations needed to estimate $\boldsymbol{\Sigma}$ to sufficient accuracy poses a severe challenge. Approximating $\boldsymbol{\Sigma}$ using inexpensive but biased surrogates introduces model error with respect to full simulations, especially in the non-linear regime of structure growth. To address this problem we develop a matrix generalization of Convergence Acceleration by Regression and Pooling (CARPool) to combine a small number of simulations with fast surrogates and obtain low-noise estimates of $\boldsymbol{\Sigma}$ that are unbiased by construction. Our numerical examples use CARPool to combine GADGET-III $N$-body simulations with fast surrogates computed using COmoving Lagrangian Acceleration (COLA). Even at the challenging redshift $z=0.5$, we find variance reductions of at least $\mathcal{O}(10^1)$ and up to $\mathcal{O}(10^4)$ for the elements of the matter power spectrum covariance matrix on scales $8.9\times 10^{-3}<k_\mathrm{max} <1.0$ $h {\rm Mpc^{-1}}$. We demonstrate comparable performance for the covariance of the matter bispectrum, the matter correlation function and probability density function of the matter density field. We compare eigenvalues, likelihoods, and Fisher matrices computed using the CARPool covariance estimate with the standard sample covariance estimators and generally find considerable improvement except in cases where $\boldsymbol{\Sigma}$ is severely ill-conditioned.
We present a numerically cheap approximation to super-sample covariance (SSC) of large-scale structure cosmological probes, first in the case of angular power spectra. It requires no new elements besides those used for the prediction of the considered probes, thus relieving analysis pipelines from having to develop a full SSC modeling, and reducing the computational load. The approximation is asymptotically exact for fine redshift bins $\Delta z \rightarrow 0$. We furthermore show how it can be implemented at the level of a Gaussian likelihood or a Fisher matrix forecast, as a fast correction to the Gaussian case without needing to build large covariance matrices. Numerical application to a Euclid-like survey shows that, compared to a full SSC computation, the approximation closely recovers the signal-to-noise ratio as well as Fisher forecasts on cosmological parameters of the $w$CDM cosmological model. Moreover, it allows for a fast prediction of which parameters are going to be the most affected by SSC and at which level. In the case of photometric galaxy clustering with Euclid-like specifications, we find that $\sigma_8$, $n_s$ and the dark energy equation of state $w$ are particularly heavily affected. We finally show how to generalize the approximation for probes other than angular spectra (correlation functions, number counts and bispectra), and at the likelihood level, allowing for the latter to be non-Gaussian if need be. We publicly release a Python module that implements the SSC approximation, as well as a notebook reproducing the plots of the article, at https://github.com/fabienlacasa/PySSC
To exploit the power of next-generation large-scale structure surveys, ensembles of numerical simulations are necessary to give accurate theoretical predictions of the statistics of observables. High-fidelity simulations come at a towering computational cost. Therefore, approximate but fast simulations, surrogates, are widely used to gain speed at the price of introducing model error. We propose a general method that exploits the correlation between simulations and surrogates to compute fast, reduced-variance statistics of large-scale structure observables without model error at the cost of only a few simulations. We call this approach Convergence Acceleration by Regression and Pooling (CARPool). In numerical experiments with intentionally minimal tuning, we apply CARPool to a handful of GADGET-III $N$-body simulations paired with surrogates computed using COmoving Lagrangian Acceleration (COLA). We find $\sim 100$-fold variance reduction even in the non-linear regime, up to $k_\mathrm{max} \approx 1.2$ $h {\rm Mpc^{-1}}$ for the matter power spectrum. CARPool realises similar improvements for the matter bispectrum. In the nearly linear regime CARPool attains far larger sample variance reductions. By comparing to the 15,000 simulations from the Quijote suite, we verify that the CARPool estimates are unbiased, as guaranteed by construction, even though the surrogate misses the simulation truth by up to $60\%$ at high $k$. Furthermore, even with a fully configuration-space statistic like the non-linear matter density probability density function, CARPool achieves unbiased variance reduction factors of up to $\sim 10$, without any further tuning. Conversely, CARPool can be used to remove model error from ensembles of fast surrogates by combining them with a few high-accuracy simulations.
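The idea of exploiting the simulation-surrogate correlation can be illustrated with a textbook control-variate estimator. The sketch below is not the authors' code: the function names `control_variate_mean` and `demo`, the toy Gaussian data, and the choice to estimate the regression coefficient per component from the same paired samples are all assumptions made for illustration.

```python
import numpy as np

def control_variate_mean(y, c, mu_c):
    """Per-component control-variate estimate of E[y] (illustrative sketch).

    y    : (n, p) statistics from a few expensive simulations
    c    : (n, p) surrogate statistics run with the same random seeds
    mu_c : (p,)   surrogate mean, estimated from many extra cheap surrogates
    """
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    dy = y - y.mean(axis=0)
    dc = c - c.mean(axis=0)
    # Regression coefficient beta = cov(y, c) / var(c), one per component.
    # Assumption: beta is estimated from the same pairs, a simplification.
    beta = (dy * dc).sum(axis=0) / (dc * dc).sum(axis=0)
    # Correct the simulation mean by the surrogate's sampling fluctuation.
    return y.mean(axis=0) - beta * (c.mean(axis=0) - mu_c)

def demo(n_trials=200, n_pairs=10, n_surr=2000, seed=0):
    """Toy comparison of the plain mean and the control-variate mean."""
    rng = np.random.default_rng(seed)
    plain, cv = [], []
    for _ in range(n_trials):
        z = rng.normal(size=(n_pairs, 1))                      # shared seed noise
        y = 1.0 + z + 0.1 * rng.normal(size=(n_pairs, 1))      # "simulation", true mean 1.0
        c = 0.6 + 0.8 * z                                      # biased surrogate, mean 0.6
        mu_c = (0.6 + 0.8 * rng.normal(size=(n_surr, 1))).mean(axis=0)
        plain.append(y.mean())
        cv.append(control_variate_mean(y, c, mu_c)[0])
    return np.var(plain), np.var(cv), np.mean(cv)

if __name__ == "__main__":
    v_plain, v_cv, m_cv = demo()
    print("variance of plain mean:", v_plain)
    print("variance of control-variate mean:", v_cv)
    print("average control-variate estimate:", m_cv)
```

In this toy setup the surrogate is deliberately biased (mean 0.6 instead of 1.0), yet the corrected estimate still targets the simulation mean, because only the surrogate's fluctuation about its own mean enters the correction.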
We give an analytical interpretation of how subsample-based internal covariance estimators lead to biased estimates of the covariance, due to underestimating the super-sample covariance (SSC). This includes the jackknife and bootstrap methods as estimators for the full survey area, and subsampling as an estimator of the covariance of subsamples. The limitations of the jackknife covariance have been previously presented in the literature because it is effectively a rescaling of the covariance of the subsample area. However, we point out that subsampling is also biased, but for a different reason: the subsamples are not independent, and the corresponding lack of power results in SSC underprediction. We develop the formalism in the case of cluster counts that allows the bias of each covariance estimator to be exactly predicted. We find significant effects for a small-scale area or when a low number of subsamples is used, with auto-redshift biases ranging from 0.4% to 15% for subsampling and from 5% to 75% for jackknife covariance estimates. The cross-redshift covariance is even more affected; biases range from 8% to 25% for subsampling and from 50% to 90% for jackknife. Owing to the redshift evolution of the probe, the covariances cannot be debiased by a simple rescaling factor, and an exact debiasing has the same requirements as the full SSC prediction. These results thus disfavour the use of internal covariance estimators on the data itself or on a single simulation, leaving analytical prediction and simulation suites as possible SSC predictors.
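The statement that the jackknife covariance is effectively a rescaling of the subsample covariance can be checked numerically in the simplest case. The sketch below uses the standard delete-one jackknife and assumes the survey statistic is a plain average over equal-area patches; the function name `jackknife_covariance` and the random toy data are hypothetical, and the code does not model SSC itself.

```python
import numpy as np

def jackknife_covariance(sub):
    """Delete-one jackknife covariance of the mean over patches (sketch).

    sub : (n_sub, p) statistic measured in each of n_sub equal-area patches.
    Assumes the full-survey statistic is the plain average of the patch values.
    """
    sub = np.asarray(sub, dtype=float)
    n = sub.shape[0]
    # Leave-one-out estimates: average over all patches except patch k.
    loo = (sub.sum(axis=0) - sub) / (n - 1)
    d = loo - loo.mean(axis=0)
    # Standard jackknife prefactor (n - 1) / n.
    return (n - 1) / n * d.T @ d

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patches = rng.normal(size=(20, 3))           # toy data: 20 patches, 3 bins
    c_jk = jackknife_covariance(patches)
    c_sub = np.cov(patches, rowvar=False)        # covariance between patches
    # For a mean statistic, the jackknife is the patch covariance divided by n.
    print(np.allclose(c_jk, c_sub / patches.shape[0]))
```

Because the jackknife estimate here is exactly the patch-to-patch covariance divided by the number of patches, any variance from modes larger than a patch that the patch covariance misses is also absent from the jackknife estimate.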
Upcoming weak lensing surveys will probe large fractions of the sky with unprecedented accuracy. To infer cosmological constraints, a large ensemble of survey simulations is required to accurately model cosmological observables and their covariances. We develop a parallelized multi-lens-plane pipeline called UFalcon, designed to generate full-sky weak lensing maps from lightcones within a minimal runtime. It makes use of L-PICOLA, an approximate numerical code, which provides a fast and accurate alternative to cosmological $N$-Body simulations. The UFalcon maps are constructed by nesting two simulations covering a redshift range from $z=0.1$ to $1.5$ without replicating the simulation volume. We compute the convergence and projected overdensity maps for L-PICOLA in the lightcone or snapshot mode. The generation of such a map, including the L-PICOLA simulation, takes about 3 hours walltime on 220 cores. We use the maps to calculate the spherical harmonic power spectra, which we compare to theoretical predictions and to UFalcon results generated using the full $N$-Body code GADGET-2. We then compute the covariance matrix of the full-sky spherical harmonic power spectra using 150 UFalcon maps based on L-PICOLA in lightcone mode. We consider the PDF, the higher-order moments and the variance of the smoothed field variance to quantify the accuracy of the covariance matrix, which we find to be a few percent for scales $\ell \sim 10^2$ to $10^3$. We test the impact of this level of accuracy on cosmological constraints using an optimistic survey configuration, and find that the final results are robust to this level of uncertainty. The speed and accuracy of our pipeline provide a basis for including further important features such as masking and varying noise, and will allow us to compute covariance matrices for models beyond $\Lambda$CDM. [abridged]
During the last few decades, online controlled experiments (also known as A/B tests) have been adopted as the gold standard for measuring business improvements in industry. In our company, more than a billion users participate in thousands of experiments simultaneously. With statistical inference and estimation routinely conducted on thousands of online metrics in those experiments, computational cost becomes a major concern. In this paper we propose a novel algorithm for estimating the covariance of online metrics, which introduces more flexibility into the trade-off between computational cost and precision in covariance estimation. This covariance estimation method reduces the computational cost of metric calculation in large-scale settings, which facilitates further applications in both online controlled experiments and adaptive experiment scenarios such as variance reduction, continuous monitoring, and Bayesian optimization, and it can be easily implemented in engineering practice.