Lossless, Scalable Implicit Likelihood Inference for Cosmological Fields


Added by T. Lucas Makinen

Publication date 2021

fields Physics

Language English

Authors T. Lucas Makinen - Tom Charnock - Justin Alsing

Cosmology and Nongalactic Astrophysics

Abstract

We present a comparison of simulation-based inference to full, field-based analytical inference in cosmological data analysis. To do so, we explore parameter inference for two cases where the information content is calculable analytically: Gaussian random fields whose covariance depends on parameters through the power spectrum; and correlated lognormal fields with cosmological power spectra. We compare two inference techniques: i) explicit field-level inference using the known likelihood and ii) implicit likelihood inference with maximally informative summary statistics compressed via Information Maximising Neural Networks (IMNNs). We find that a) summaries obtained from convolutional neural network compression do not lose information and therefore saturate the known field information content, both for the Gaussian covariance and the lognormal cases, b) simulation-based inference using these maximally informative nonlinear summaries recovers nearly losslessly the exact posteriors of field-level inference, bypassing the need to evaluate expensive likelihoods or invert covariance matrices, and c) even for this simple example, implicit, simulation-based likelihood inference incurs a much smaller computational cost than inference with an explicit likelihood. This work uses a new IMNN implementation in $\texttt{Jax}$ that can take advantage of a fully differentiable simulation and inference pipeline. We also demonstrate that a single retraining of the IMNN summaries effectively achieves the theoretically maximal information, enhancing the robustness to the choice of fiducial model where the IMNN is trained.
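As a rough illustration of what "calculable analytically" can mean for the Gaussian case, the sketch below evaluates a Fisher information matrix in closed form. It rests on assumptions that the abstract does not state: a power-law spectrum $P(k) = A k^{-B}$, a hypothetical set of independent, zero-mean, real Fourier modes with variance $P(k)$, and the standard result for that setting, $F_{ab} = \frac{1}{2}\sum_k \partial_a \ln P(k)\, \partial_b \ln P(k)$. All function names and numerical values are illustrative and are not taken from the paper's code.

```python
import math

def log_power_gradient(k, A, B):
    """Gradient of ln P(k) with respect to (A, B) for the assumed P(k) = A * k**(-B)."""
    return (1.0 / A, -math.log(k))

def gaussian_field_fisher(ks, A, B):
    """Fisher matrix for independent zero-mean Gaussian modes with variance P(k).

    Uses F_ab = 1/2 * sum_k (d ln P / d theta_a) * (d ln P / d theta_b).
    """
    fisher = [[0.0, 0.0], [0.0, 0.0]]
    for k in ks:
        grad = log_power_gradient(k, A, B)
        for a in range(2):
            for b in range(2):
                fisher[a][b] += 0.5 * grad[a] * grad[b]
    return fisher

def log_det_2x2(m):
    """Log-determinant of a 2x2 matrix, a common scalar measure of information."""
    return math.log(m[0][0] * m[1][1] - m[0][1] * m[1][0])

# Hypothetical mode set and fiducial parameters, chosen only for illustration.
ks = [0.5 + 0.1 * i for i in range(64)]
fisher = gaussian_field_fisher(ks, A=1.0, B=0.5)
print("Fisher matrix:", fisher)
print("ln det F:", log_det_2x2(fisher))
```

A compressed summary that "saturates" the field information would, in this toy setting, have a Fisher matrix whose determinant matches the one printed here.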


Bayesian cosmological inference through implicit cross-correlation statistics

Guilhem Lavaux, Jens Jasche (2021)

Analyses of next-generation galaxy data require accurate treatment of systematic effects such as the bias between observed galaxies and the underlying matter density field. However, proposed models of the phenomenon are either numerically expensive or too inaccurate to achieve unbiased inferences of cosmological parameters even at mildly nonlinear scales of the data. As an alternative to constructing accurate galaxy bias models, which requires an understanding of galaxy formation, we propose to construct likelihood distributions for Bayesian forward modeling approaches that are insensitive to linear, scale-dependent bias and provide robustness against model misspecification. We use maximum entropy arguments to construct likelihood distributions designed to account only for correlations between data and inferred quantities. By design these correlations are insensitive to linear galaxy biasing relations, providing the desired robustness. The method is implemented and tested within a Markov Chain Monte Carlo approach. The method is assessed using a halo mock catalog based on standard full, cosmological, N-body simulations. We obtain unbiased and tight constraints on cosmological parameters exploiting only linear cross-correlation rates for $k \le 0.10$ Mpc/h. Tests for halos of masses ~10$^{12}$ M$_\odot$ to ~10$^{13}$ M$_\odot$ indicate that it is possible to ignore all details of the linear, scale-dependent bias function while obtaining robust constraints on cosmology. Our results provide a promising path forward to analyses of galaxy surveys without the requirement of having to accurately model the details of galaxy biasing but by designing robust likelihoods for the inference.

Cosmology and Nongalactic Astrophysics Instrumentation and Methods for Astrophysics

Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

Justin Alsing, Benjamin Wandelt, Stephen Feeney (2018)

Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data-space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper we use massive asymptotically-optimal data compression to reduce the dimensionality of the data-space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parameterized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate Density Estimation Likelihood-Free Inference with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as $\sim 10^4$ simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological datasets.
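The idea of compressing to "one number per parameter" can be sketched with score compression, one standard route to such summaries. The example below is an assumption-laden toy rather than the paper's pipeline: it takes a hypothetical linear model $\mu_i = \theta_0 + \theta_1 x_i$ with independent Gaussian noise of known width $\sigma$, and forms $t_a = \sum_i \partial_a \mu_i\, (d_i - \mu_i)/\sigma^2$ at a fiducial point. All names and values are illustrative.

```python
def linear_mean(theta, xs):
    """Hypothetical model: mu_i = theta[0] + theta[1] * x_i."""
    return [theta[0] + theta[1] * x for x in xs]

def mean_gradients(xs):
    """d mu_i / d theta_a for the linear model (independent of theta)."""
    return [[1.0 for _ in xs], [x for x in xs]]

def score_compress(data, theta_fid, xs, sigma):
    """Compress N data points to one summary per parameter (the score at theta_fid)."""
    mu = linear_mean(theta_fid, xs)
    return [
        sum(g * (d - m) for g, d, m in zip(grad, data, mu)) / sigma**2
        for grad in mean_gradients(xs)
    ]

def fisher_matrix(xs, sigma):
    """F_ab = sum_i (d mu_i / d theta_a) * (d mu_i / d theta_b) / sigma**2."""
    grads = mean_gradients(xs)
    return [
        [sum(ga * gb for ga, gb in zip(grads[a], grads[b])) / sigma**2 for b in range(2)]
        for a in range(2)
    ]

def quasi_mle(t, fisher, theta_fid):
    """Rescaled summary theta_fid + F^{-1} t, using an explicit 2x2 inverse."""
    (f00, f01), (f10, f11) = fisher
    det = f00 * f11 - f01 * f10
    step0 = (f11 * t[0] - f01 * t[1]) / det
    step1 = (-f10 * t[0] + f00 * t[1]) / det
    return [theta_fid[0] + step0, theta_fid[1] + step1]

# Illustrative values only; the mock data are noise-free for clarity.
xs = [0.1 * i for i in range(50)]
sigma = 0.3
theta_fid = [1.0, 2.0]
theta_true = [1.2, 1.7]
data = linear_mean(theta_true, xs)

t = score_compress(data, theta_fid, xs, sigma)
estimate = quasi_mle(t, fisher_matrix(xs, sigma), theta_fid)
print(len(data), "data points ->", len(t), "summaries")
print("estimate:", estimate)
```

For this linear toy model the rescaled summaries recover the true parameters from noise-free data; for nonlinear models the compression is only optimal asymptotically and near the fiducial point.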

Cosmology and Nongalactic Astrophysics

Sequential Likelihood-Free Inference with Implicit Surrogate Proposal

Dongjun Kim, Kyungwoo Song, YoonYeong Kim (2020)

Bayesian inference without access to the likelihood, or likelihood-free inference, has been a key research topic in simulation, where it yields more realistic generated results. Recent likelihood-free inference methods update an approximate posterior sequentially, using the cumulative dataset of simulation input-output pairs gathered over inference rounds. The dataset is therefore gathered through iterative simulations with inputs sampled from a proposal distribution by MCMC, which becomes the key to inference quality in this sequential framework. This paper introduces a new proposal model, named Implicit Surrogate Proposal (ISP), to generate the cumulative dataset with greater sample efficiency. ISP constructs the cumulative dataset in the most diverse way by drawing i.i.d. samples in a feed-forward fashion, so the posterior inference does not suffer from the disadvantages of MCMC caused by its non-i.i.d. nature, such as autocorrelation and slow mixing. We analyze the convergence property of ISP in both theoretical and empirical aspects to guarantee that ISP provides an asymptotically exact sampler. We demonstrate that ISP outperforms the baseline inference algorithms on simulations with multi-modal posteriors.

Methodology Artificial Intelligence Machine Learning

Nuisance hardened data compression for fast likelihood-free inference

Justin Alsing, Benjamin Wandelt (2019)

In this paper we show how nuisance parameter marginalized posteriors can be inferred directly from simulations in a likelihood-free setting, without having to jointly infer the higher-dimensional interesting and nuisance parameter posterior first and marginalize a posteriori. The result is that for an inference task with a given number of interesting parameters, the number of simulations required to perform likelihood-free inference can be kept (roughly) the same irrespective of the number of additional nuisances to be marginalized over. To achieve this we introduce two extensions to the standard likelihood-free inference set-up. Firstly we show how nuisance parameters can be re-cast as latent variables and hence automatically marginalized over in the likelihood-free framework. Secondly, we derive an asymptotically optimal compression from $N$ data down to $n$ summaries -- one per interesting parameter -- such that the Fisher information is (asymptotically) preserved, but the summaries are insensitive (to leading order) to the nuisance parameters. This means that the nuisance marginalized inference task involves learning $n$ interesting parameters from $n$ nuisance hardened data summaries, regardless of the presence or number of additional nuisance parameters to be marginalized over. We validate our approach on two examples from cosmology: supernovae and weak lensing data analyses with nuisance parameterized systematics. For the supernova problem, high-fidelity posterior inference of $\Omega_m$ and $w_0$ (marginalized over systematics) can be obtained from just a few hundred data simulations. For the weak lensing problem, six cosmological parameters can be inferred from $\mathcal{O}(10^3)$ simulations, irrespective of whether ten additional nuisance parameters are included in the problem or not.
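A summary that is insensitive to nuisance parameters at leading order can be sketched by projecting the nuisance score out of the interesting-parameter score, $\bar{t}_\theta = t_\theta - F_{\theta\eta} F_{\eta\eta}^{-1} t_\eta$. The toy below assumes one interesting parameter $\theta$ (a slope), one nuisance parameter $\eta$ (an offset), a linear mean, and independent Gaussian noise of known width; these choices, and every name in the code, are hypothetical and serve only to show the projection at work.

```python
def weighted_dot(u, v, sigma):
    """u^T C^{-1} v for a diagonal covariance C = sigma**2 * I."""
    return sum(a * b for a, b in zip(u, v)) / sigma**2

def hardened_summary(data, mu_fid, grad_theta, grad_eta, sigma):
    """Score for theta with the nuisance direction projected out.

    t_bar = t_theta - (F_theta_eta / F_eta_eta) * t_eta
    """
    resid = [d - m for d, m in zip(data, mu_fid)]
    t_theta = weighted_dot(grad_theta, resid, sigma)
    t_eta = weighted_dot(grad_eta, resid, sigma)
    f_theta_eta = weighted_dot(grad_theta, grad_eta, sigma)
    f_eta_eta = weighted_dot(grad_eta, grad_eta, sigma)
    return t_theta - f_theta_eta / f_eta_eta * t_eta

# Illustrative setup: mu_i = eta + theta * x_i at fiducial eta = 0.5, theta = 2.0.
xs = [0.1 * i for i in range(40)]
grad_theta = [x for x in xs]    # d mu / d theta
grad_eta = [1.0 for _ in xs]    # d mu / d eta
mu_fid = [0.5 + 2.0 * x for x in xs]
sigma = 0.2

# Noise-free mock data shifted along the nuisance and the interesting direction.
data_eta_shift = [m + 0.3 * g for m, g in zip(mu_fid, grad_eta)]
data_theta_shift = [m + 0.3 * g for m, g in zip(mu_fid, grad_theta)]

t_nuisance = hardened_summary(data_eta_shift, mu_fid, grad_theta, grad_eta, sigma)
t_interesting = hardened_summary(data_theta_shift, mu_fid, grad_theta, grad_eta, sigma)
print("response to nuisance shift:", t_nuisance)
print("response to theta shift:", t_interesting)
```

In this linear toy the hardened summary does not respond to a pure offset shift (up to floating-point rounding) while still responding to a change in slope; for nonlinear models the insensitivity holds only to leading order.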

Cosmology and Nongalactic Astrophysics

A Composite Likelihood Approach for Inference under Photometric Redshift Uncertainty

M. M. Rau, C. B. Morrison, S. J. Schmidt (2021)

Obtaining accurately calibrated redshift distributions of photometric samples is one of the great challenges in photometric surveys like LSST, Euclid, HSC, KiDS, and DES. We combine the redshift information from the galaxy photometry with constraints from two-point functions, utilizing cross-correlations with spatially overlapping spectroscopic samples. Our likelihood framework is designed to integrate directly into a typical large-scale structure and weak lensing analysis based on two-point functions. We discuss efficient and accurate inference techniques that allow us to scale the method to the large samples of galaxies to be expected in LSST. We consider statistical challenges like the parametrization of redshift systematics, discuss and evaluate techniques to regularize the sample redshift distributions, and investigate techniques that can help to detect and calibrate sources of systematic error using posterior predictive checks. We evaluate and forecast photometric redshift performance using data from the CosmoDC2 simulations, within which we mimic a DESI-like spectroscopic calibration sample for cross-correlations. Using a combination of spatial cross-correlations and photometry, we show that we can provide calibration of the mean of the sample redshift distribution to an accuracy of at least $0.002(1+z)$, consistent with the LSST-Y1 science requirements for weak lensing and large-scale structure probes.

Cosmology and Nongalactic Astrophysics Astrophysics of Galaxies Instrumentation and Methods for Astrophysics