
Quantifying Suspiciousness Within Correlated Data Sets

Posted by: Pablo Lemos
Publication date: 2019
Research field: Physics
Paper language: English





We propose a principled Bayesian method for quantifying tension between correlated datasets with wide uninformative parameter priors. This is achieved by extending the Suspiciousness statistic, which is insensitive to priors. Our method uses global summary statistics, and as such it can be used as a diagnostic for internal consistency. We show how our approach can be combined with methods that use parameter space and data space to identify the existing internal discrepancies. As an example, we use it to test the internal consistency of the KiDS-450 data in 4 photometric redshift bins, and to recover controlled internal discrepancies in simulated KiDS data. We propose this as a diagnostic of internal consistency for present and future cosmological surveys, and as a tension metric for data sets that have non-negligible correlation, such as LSST and Euclid.
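
As a point of reference, the sketch below shows how the Suspiciousness is typically assembled from quantities that a nested-sampling analysis provides: the log-evidences of the separate and joint fits, the Kullback-Leibler divergences between each posterior and the prior, and the Bayesian model dimensionalities. This is an illustrative outline under a Gaussian-posterior assumption, not the authors' released code, and the function and argument names are mine.

```python
# Illustrative sketch only: Suspiciousness and tension probability from
# nested-sampling summary quantities (assumed to be estimated elsewhere).
from scipy.stats import chi2

def log_suspiciousness(logZ_A, logZ_B, logZ_AB, D_A, D_B, D_AB):
    """log S = log R - log I.

    log R = logZ_AB - logZ_A - logZ_B is the Bayes (evidence) ratio;
    log I = D_A + D_B - D_AB removes the prior-volume dependence using
    the Kullback-Leibler divergences D between posterior and prior.
    """
    log_R = logZ_AB - logZ_A - logZ_B
    log_I = D_A + D_B - D_AB
    return log_R - log_I

def tension_probability(log_S, d_A, d_B, d_AB):
    """Assuming near-Gaussian posteriors, d - 2*log S is chi^2-distributed
    with d = d_A + d_B - d_AB degrees of freedom (the Bayesian model
    dimensionality shared by the two datasets); the survival function
    converts this into the probability of such a tension arising by chance."""
    d = d_A + d_B - d_AB
    return chi2.sf(d - 2.0 * log_S, df=d)
```

In practice the log-evidences, divergences and dimensionalities can be estimated from nested-sampling chains (for instance with a post-processing package such as anesthetic), but any consistent estimator of these summary quantities will do.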




Read also

Stephen Skory (2010)
Modern N-body cosmological simulations contain billions ($10^9$) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory, and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly-employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes MPI and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger datasets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for Adaptive Mesh Refinement (AMR) data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and datasets in excess of $2000^3$ particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.
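
As a rough illustration of the domain-decomposition strategy described above (not the actual Parallel HOP implementation shipped with yt), each MPI rank can take one slab of the periodic box plus a padding layer wide enough to contain any halo straddling a slab edge, run a serial finder on its padded subvolume, and keep only the halos whose centres lie in the region it owns. The serial finder and the halo-centre test below are hypothetical placeholders.

```python
# Hypothetical sketch of slab decomposition with padded (ghost) regions.
import numpy as np
from mpi4py import MPI

def slab_with_padding(positions, box_size, rank, n_ranks, pad):
    """Select the particles owned by this rank (an equal slab along x)
    plus a padded layer on each side, with periodic wrap-around.
    Returns the selected positions and a mask marking owned particles."""
    width = box_size / n_ranks
    lo, hi = rank * width, (rank + 1) * width
    x = positions[:, 0] % box_size
    owned = (x >= lo) & (x < hi)
    below = (lo - x) % box_size < pad   # just outside the lower slab edge
    above = (x - hi) % box_size < pad   # just outside the upper slab edge
    keep = owned | below | above
    return positions[keep], owned[keep]

comm = MPI.COMM_WORLD
# positions: (N, 3) array of particle coordinates available to this rank
# local, owned = slab_with_padding(positions, 100.0, comm.rank, comm.size, pad=2.0)
# halos = find_halos_serial(local)                      # placeholder finder
# halos = [h for h in halos if centre_in_owned_slab(h)]  # drop ghost-region halos
# all_halos = comm.gather(halos, root=0)                 # collect on rank 0
```

The padding width limits the largest halo that can be identified without cross-rank stitching, which is one reason a production implementation is considerably more involved than this sketch.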
We present an investigation of the horizon and its effect on global 21-cm observations and analysis. We find that the horizon cannot be ignored when modeling low frequency observations. Even if the sky and antenna beam are known exactly, forward models cannot fully describe the beam-weighted foreground component without accurate knowledge of the horizon. When fitting data to extract the 21-cm signal, a single time-averaged spectrum or independent multi-spectrum fits may be able to compensate for the bias imposed by the horizon. However, these types of fits lack constraining power on the 21-cm signal, leading to large uncertainties on the signal extraction, in some cases larger in magnitude than the 21-cm signal itself. A significant decrease in signal uncertainty can be achieved by performing multi-spectrum fits in which the spectra are modeled simultaneously with common parameters. The cost of this greatly increased constraining power, however, is that the time dependence of the horizon's effect, which is more complex than its spectral dependence, must be precisely modeled to achieve a good fit. To aid in modeling the horizon, we present an algorithm and Python package for calculating the horizon profile from a given observation site using elevation data. We also address several practical concerns such as pixelization error, uncertainty in the horizon profile, and foreground obstructions such as surrounding buildings and vegetation. We demonstrate that our training set-based analysis pipeline can account for all of these factors to model the horizon well enough to precisely extract the 21-cm signal from simulated observations.
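
The horizon-profile calculation mentioned above can be sketched as follows (a hypothetical interface, ignoring Earth curvature and atmospheric refraction, and not the authors' package): for each azimuth, walk outward from the site and record the largest elevation angle subtended by the terrain. The helper elevation_at, returning terrain height at a given east/north offset from the observer, is assumed.

```python
# Hypothetical sketch: horizon elevation angle versus azimuth from a
# digital elevation model, neglecting Earth curvature and refraction.
import numpy as np

def horizon_profile(elevation_at, observer_height_m,
                    max_range_m=50_000.0, step_m=100.0, n_azimuths=360):
    """Return (azimuth_deg, horizon_elevation_deg) sampled around the site."""
    azimuths = np.linspace(0.0, 360.0, n_azimuths, endpoint=False)
    ranges = np.arange(step_m, max_range_m + step_m, step_m)
    profile = np.empty_like(azimuths)
    for i, az in enumerate(np.radians(azimuths)):
        east, north = ranges * np.sin(az), ranges * np.cos(az)
        heights = np.array([elevation_at(e, n) for e, n in zip(east, north)])
        # elevation angle of each terrain sample as seen by the observer
        angles = np.degrees(np.arctan2(heights - observer_height_m, ranges))
        profile[i] = angles.max()
    return azimuths, profile
```

The pixelization error discussed in the abstract enters through the finite step size and the resolution of the elevation data; both would need to be chosen, and their uncertainty propagated, with the accuracy required by the 21-cm analysis in mind.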
We demonstrate a measure for the effective number of parameters constrained by a posterior distribution in the context of cosmology. In the same way that the mean of the Shannon information (i.e. the Kullback-Leibler divergence) provides a measure of the strength of constraint between prior and posterior, we show that the variance of the Shannon information gives a measure of dimensionality of constraint. We examine this quantity in a cosmological context, applying it to likelihoods derived from Cosmic Microwave Background, large scale structure and supernovae data. We show that this measure of Bayesian model dimensionality compares favourably, both analytically and numerically, with the existing measure of model complexity used in the literature.
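
In the notation of that approach (posterior $\mathcal{P}$, prior $\pi$), the mean of the Shannon information is the Kullback-Leibler divergence $\mathcal{D}$ and its variance defines the model dimensionality $\tilde{d}$; schematically,

$$ \mathcal{D} = \int \mathcal{P}(\theta)\,\ln\frac{\mathcal{P}(\theta)}{\pi(\theta)}\,\mathrm{d}\theta, \qquad \frac{\tilde{d}}{2} = \int \mathcal{P}(\theta)\left[\ln\frac{\mathcal{P}(\theta)}{\pi(\theta)}\right]^{2}\mathrm{d}\theta \;-\; \mathcal{D}^{2} \;=\; \mathrm{Var}_{\mathcal{P}}\!\left[\ln\frac{\mathcal{P}}{\pi}\right], $$

so that both quantities are global summary statistics of the same Shannon information and can be estimated from posterior samples.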
We present results from a data challenge posed to the radial velocity (RV) community: namely, to quantify the Bayesian evidence for n={0,1,2,3} planets in a set of synthetically generated RV datasets containing a range of planet signals. Participating teams were provided the same likelihood function and set of priors to use in their analysis. They applied a variety of methods to estimate Z, the marginal likelihood for each n-planet model, including cross-validation, the Laplace approximation, importance sampling, and nested sampling. We found the dispersion in Z across different methods grew with increasing n-planet models: ~3 for 0-planets, ~10 for 1-planet, ~100-1000 for 2-planets, and >10,000 for 3-planets. Most internal estimates of uncertainty in Z for individual methods significantly underestimated the observed dispersion across all methods. Methods that adopted a Monte Carlo approach by comparing estimates from multiple runs yielded plausible uncertainties. Finally, two classes of numerical algorithms (those based on importance and nested samplers) arrived at similar conclusions regarding the ratio of Zs for n and (n+1)-planet models. One analytic method (the Laplace approximation) demonstrated comparable performance. We express both optimism and caution: we demonstrate that it is practical to perform rigorous Bayesian model comparison for <=3-planet models, yet robust planet discoveries require researchers to better understand the uncertainty in Z and its connections to model selection.
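
Of the estimators listed above, the Laplace approximation is compact enough to sketch (illustrative only; it assumes the maximum a posteriori point and the Hessian of the log-posterior there have already been found, and the variable names are mine): it treats the posterior as Gaussian around its peak.

```python
# Hypothetical sketch of the Laplace approximation to the marginal
# likelihood Z, given the log-posterior value and its negative Hessian
# at the maximum a posteriori (MAP) point.
import numpy as np

def log_evidence_laplace(log_post_at_map, neg_hessian):
    """log Z ~= log[likelihood * prior] at the MAP
              + (d/2) log(2*pi) - (1/2) log det(neg_hessian),
    where neg_hessian is minus the Hessian of the log-posterior at the MAP."""
    d = neg_hessian.shape[0]
    sign, logdet = np.linalg.slogdet(neg_hessian)  # numerically stable log-determinant
    return log_post_at_map + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
```

Its accuracy degrades for multi-modal or strongly non-Gaussian posteriors, which is consistent with the growing dispersion in Z reported above as the number of planets, and hence the posterior complexity, increases.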
We provide a new interpretation for the Bayes factor combination used in the Dark Energy Survey (DES) first year analysis to quantify the tension between the DES and Planck datasets. The ratio quantifies a Bayesian confidence in our ability to combine the datasets. This interpretation is prior-dependent, with wider prior widths boosting the confidence. We therefore propose that if there are any reasonable priors which reduce the confidence to below unity, then we cannot assert that the datasets are compatible. Computing the evidence ratios for the DES first year analysis and Planck, given that narrower priors drop the confidence to below unity, we conclude that DES and Planck are, in a Bayesian sense, incompatible under LCDM. Additionally we compute ratios which confirm the consensus that measurements of the acoustic scale by the Baryon Oscillation Spectroscopic Survey (SDSS) are compatible with Planck, whilst direct measurements of the acceleration rate of the Universe by the SHOES collaboration are not. We propose a modification to the Bayes ratio which removes the prior dependency using Kullback-Leibler divergences, and using this statistical test find Planck in strong tension with SHOES, in moderate tension with DES, and in no tension with SDSS. We propose this statistic as the optimal way to compare datasets, ahead of the next DES data releases, as well as future surveys. Finally, as an element of these calculations, we introduce in a cosmological setting the Bayesian model dimensionality, which is a parameterisation-independent measure of the number of parameters that a given dataset constrains.
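
For reference, the quantities involved can be written schematically (notation mine, following the Suspiciousness construction used in the main abstract above): the Bayes ratio compares the joint evidence with the product of the separate evidences, and the proposed Kullback-Leibler modification subtracts the prior-dependent information ratio,

$$ \ln R = \ln Z_{AB} - \ln Z_{A} - \ln Z_{B}, \qquad \ln I = \mathcal{D}_{A} + \mathcal{D}_{B} - \mathcal{D}_{AB}, \qquad \ln S = \ln R - \ln I, $$

so that widening the priors inflates $\ln R$ and $\ln I$ together while leaving the prior-independent part $\ln S$ unchanged.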