
Nearest Neighbor distributions: new statistical measures for cosmological clustering

Posted by: Arka Banerjee
Publication date: 2020
Research field: Physics
Paper language: English





The use of summary statistics beyond the two-point correlation function to analyze the non-Gaussian clustering on small scales is an active field of research in cosmology. In this paper, we explore a set of new summary statistics -- the $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$). This is the empirical cumulative distribution function of distances from a set of volume-filling, Poisson-distributed random points to the $k$-nearest data points, and is sensitive to all connected $N$-point correlations in the data. The $k{\rm NN}$-${\rm CDF}$ can be used to measure counts-in-cells, void probability distributions, and higher $N$-point correlation functions, all within the same formalism, exploiting fast searches with spatial tree data structures. We demonstrate how it can be computed efficiently from various data sets -- both discrete points and the generalization for continuous fields. We use data from a large suite of $N$-body simulations to explore the sensitivity of this new statistic to various cosmological parameters, compared to the two-point correlation function, over the same range of scales. We demonstrate that the use of the $k{\rm NN}$-${\rm CDF}$ improves the constraints on the cosmological parameters by more than a factor of $2$ when applied to the clustering of dark matter on scales between $10h^{-1}{\rm Mpc}$ and $40h^{-1}{\rm Mpc}$. We also show that the relative improvement is even greater when applied on the same scales to the clustering of halos in the simulations at a fixed number density, both in real space and in redshift space. Since the $k{\rm NN}$-${\rm CDF}$s are sensitive to all higher-order connected correlation functions in the data, the gains over traditional two-point analyses are expected to grow as progressively smaller scales are included in the analysis of cosmological data.
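To make the construction above concrete, here is a minimal sketch of the $k{\rm NN}$-${\rm CDF}$ measurement in Python, assuming a periodic cubic box and a catalog `data` of 3D positions in $[0, L)$; the function name and query count are illustrative, not from a released code:

import numpy as np
from scipy.spatial import cKDTree

def knn_cdf(data, boxsize, k=1, n_query=100_000, rng=None):
    """Empirical CDF of distances from Poisson-random query points
    to their k-th nearest data point (the kNN-CDF described above)."""
    rng = np.random.default_rng(rng)
    # Volume-filling, Poisson-distributed random query points.
    queries = rng.uniform(0.0, boxsize, size=(n_query, 3))
    # Fast k-nearest-neighbor searches via a periodic spatial tree.
    tree = cKDTree(data, boxsize=boxsize)
    d, _ = tree.query(queries, k=k)
    d_k = d if k == 1 else d[:, -1]  # keep only the k-th neighbor distance
    # Empirical CDF: sorted distances against cumulative fractions.
    r = np.sort(d_k)
    cdf = np.arange(1, n_query + 1) / n_query
    return r, cdf

# Usage with a random catalog standing in for simulation data:
# pts = np.random.default_rng(0).uniform(0, 1000.0, size=(10_000, 3))
# r, cdf = knn_cdf(pts, boxsize=1000.0, k=2)

Because the kth-neighbor distance being below $r$ is equivalent to finding at least $k$ data points within $r$, this one measurement also encodes the counts-in-cells and (for $k=1$) void probability information mentioned above.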




Read also

Arka Banerjee, Tom Abel (2021)
Cross-correlations between datasets are used in many different contexts in cosmological analyses. Recently, $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$) were shown to be sensitive probes of cosmological (auto) clustering. In this paper, we extend the framework of nearest neighbor measurements to describe joint distributions of, and correlations between, two datasets. We describe the measurement of joint $k{\rm NN}$-${\rm CDF}$s, and show that these measurements are sensitive to all possible connected $N$-point functions that can be defined in terms of the two datasets. We describe how the cross-correlations can be isolated by combining measurements of the joint $k{\rm NN}$-${\rm CDF}$s with those measured from the individual datasets. We demonstrate the application of these measurements in the context of Gaussian density fields, as well as for fully nonlinear cosmological datasets. Using a Fisher analysis, we show that halo-matter cross-correlations, as measured through nearest neighbor statistics, are more sensitive to the underlying cosmological parameters than traditional two-point cross-correlation measurements over the same range of scales. Finally, we demonstrate how the nearest neighbor cross-correlations can robustly detect cross-correlations between sparse samples -- the same regime in which two-point cross-correlation measurements are dominated by noise.
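A hedged sketch of one natural joint measurement: the fraction of Poisson-random query points whose $k_1$-th neighbor in dataset A and $k_2$-th neighbor in dataset B both lie within $r$, which is the empirical CDF of the larger of the two nearest-neighbor distances. The function name and defaults are illustrative, not taken from a published code:

import numpy as np
from scipy.spatial import cKDTree

def joint_knn_cdf(data_a, data_b, boxsize, k1=1, k2=1,
                  n_query=100_000, rng=None):
    """Joint kNN-CDF sketch: P(k1-th NN in A < r and k2-th NN in B < r)."""
    rng = np.random.default_rng(rng)
    queries = rng.uniform(0.0, boxsize, size=(n_query, 3))
    da, _ = cKDTree(data_a, boxsize=boxsize).query(queries, k=k1)
    db, _ = cKDTree(data_b, boxsize=boxsize).query(queries, k=k2)
    da = da if k1 == 1 else da[:, -1]
    db = db if k2 == 1 else db[:, -1]
    # Both neighbor conditions hold once r exceeds the larger distance.
    r = np.sort(np.maximum(da, db))
    cdf = np.arange(1, n_query + 1) / n_query
    return r, cdf

For independent datasets a joint CDF of this kind factorizes into the product of the individual $k{\rm NN}$-${\rm CDF}$s, so comparing the measured joint CDF against that product is one way to isolate the cross-correlation signal, in the spirit of the combination described above.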
We use the $k$-nearest neighbor probability distribution function ($k$NN-PDF, Banerjee & Abel 2021) to assess convergence in a scale-free $N$-body simulation. Compared to our previous two-point analysis, the $k$NN-PDF allows us to quantify our results in the language of halos and numbers of particles, while also incorporating non-Gaussian information. We find good convergence for 32 particles and greater at densities typical of halos, while 16 particles and fewer appear unconverged. Halving the softening length extends convergence to higher densities, but not to fewer particles. Our analysis is less sensitive to voids, but we analyze a limited range of underdensities and find evidence for convergence at 16 particles and greater even in sparse voids.
We investigate the application of Hybrid Effective Field Theory (HEFT) -- which combines a Lagrangian bias expansion with subsequent particle dynamics from $N$-body simulations -- to the modeling of $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$s) of biased tracers of the cosmological matter field. The $k{\rm NN}$-${\rm CDF}$s are sensitive to all higher order connected $N$-point functions in the data, but are computationally cheap to compute. We develop the formalism to predict the $k{\rm NN}$-${\rm CDF}$s of discrete tracers of a continuous field from the statistics of the continuous field itself. Using this formalism, we demonstrate how $k{\rm NN}$-${\rm CDF}$ statistics of a set of biased tracers, such as halos or galaxies, of the cosmological matter field can be modeled given a set of low-redshift HEFT component fields and bias parameter values. These are the same ingredients needed to predict the two-point clustering. For a specific sample of halos, we show that both the two-point clustering \textit{and} the $k{\rm NN}$-${\rm CDF}$s can be well-fit on quasi-linear scales ($\gtrsim 20 h^{-1}{\rm Mpc}$) by the second-order HEFT formalism with the \textit{same values} of the bias parameters, implying that joint modeling of the two is possible. Finally, using a Fisher matrix analysis, we show that including $k{\rm NN}$-${\rm CDF}$ measurements over the range of allowed scales in the HEFT framework can improve the constraints on $\sigma_8$ by roughly a factor of $3$, compared to the case where only two-point measurements are considered. Combining the statistical power of $k{\rm NN}$ measurements with the modeling power of HEFT, therefore, represents an exciting prospect for extracting greater information from small-scale cosmological clustering.
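As an illustration of predicting tracer statistics from a continuous field, here is a minimal sketch assuming local Poisson sampling: the probability of finding at least $k$ tracers within radius $r$ is the Poisson tail probability at mean $\bar{n} V_r (1 + \delta_R)$, averaged over samples of the field smoothed on scale $r$. The inputs `delta_r` (samples of the smoothed overdensity) and `nbar` are hypothetical, and this sketches the general idea rather than the paper's full HEFT pipeline:

import numpy as np
from scipy.stats import poisson

def predicted_knn_cdf(delta_r, nbar, r, k=1):
    """P(>= k tracers within radius r), i.e. the kNN-CDF at r, from
    samples of the continuous overdensity field smoothed on scale r."""
    volume = 4.0 * np.pi * r**3 / 3.0
    # Local Poisson mean; clip so underdense cells stay non-negative.
    lam = nbar * volume * np.clip(1.0 + delta_r, 0.0, None)
    # Poisson tail P(N >= k) = 1 - CDF(k - 1), averaged over the field PDF.
    return np.mean(1.0 - poisson.cdf(k - 1, lam))

Averaging over the one-point PDF of the smoothed field is what lets the same HEFT component fields that predict two-point clustering also feed a $k{\rm NN}$-${\rm CDF}$ prediction.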
Teng Qiu, Yongjie Li (2015)
In 2014 we proposed the Nearest Descent (ND) method, which generates an efficient graph called the in-tree (IT). Owing to several effective structural features, the IT proves well suited to data clustering. Although the IT contains some redundant edges, these usually have salient features and are therefore not hard to remove. Subsequently, to prevent these seemingly redundant edges from occurring in the first place, we proposed Nearest Neighbor Descent (NND), which adds a neighborhood constraint to ND. As a result, clusters emerge automatically, without the additional step of removing redundant edges. However, NND is still not perfect, since it introduces a new and worse problem: over-partitioning. In this paper, we propose the Hierarchical Nearest Neighbor Descent (H-NND) method, which overcomes the over-partitioning problem of NND via a hierarchical strategy. Specifically, H-NND uses ND to merge the over-segmented sub-graphs, or clusters, that NND produces. Like ND, H-NND also generates the IT structure, in which redundant edges once again appear; this may seem to return us to the situation ND faces. However, compared with ND, the redundant edges in the IT structure generated by H-NND are generally more salient, and thus much easier and more reliable to identify, even by the simplest edge-removal method, which uses edge length as the only measure. In other words, the IT structure constructed by H-NND is better suited to data clustering. We demonstrate this on several clustering datasets of varying shapes, dimensions, and attributes. Moreover, compared with ND, H-NND generally takes less computation time to construct the IT structure for the input data.
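The following is not the authors' ND/H-NND code, but a minimal sketch of the in-tree idea it builds on: link each point to a nearby point of higher estimated density, forming an in-tree, then cut the longest (most salient) edges to obtain clusters. The $k$-NN density estimate and the nearest-higher-density descent rule are simplifying assumptions made for this sketch:

import numpy as np
from scipy.spatial import cKDTree

def in_tree_clusters(points, k_density=10, n_clusters=3):
    """Build an in-tree by nearest-higher-density links, then cut the
    longest edges (edge length as the only measure) to get clusters."""
    n = len(points)
    d, _ = cKDTree(points).query(points, k=k_density + 1)  # column 0 is self
    density = 1.0 / (d[:, -1] + 1e-12)  # crude k-NN density estimate
    parent = np.arange(n)
    edge_len = np.zeros(n)
    for i in range(n):
        higher = np.flatnonzero(density > density[i])
        if higher.size:  # the densest point keeps parent[i] == i (root)
            dists = np.linalg.norm(points[higher] - points[i], axis=1)
            j = np.argmin(dists)
            parent[i], edge_len[i] = higher[j], dists[j]
    # Remove the n_clusters - 1 longest edges, creating new roots.
    if n_clusters > 1:
        for i in np.argsort(edge_len)[-(n_clusters - 1):]:
            parent[i] = i
    # Label each point by the root of its tree.
    labels = np.empty(n, dtype=int)
    for i in range(n):
        r = i
        while parent[r] != r:
            r = parent[r]
        labels[i] = r
    return labels

This descent rule resembles density-peaks-style constructions; the actual ND/NND/H-NND methods differ in how the descent direction and neighborhood constraint are defined, so treat this only as an illustration of the in-tree-plus-edge-removal pattern.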
We study two fundamental problems dealing with curves in the plane, namely, the nearest-neighbor problem and the center problem. Let $\mathcal{C}$ be a set of $n$ polygonal curves, each of size $m$. In the nearest-neighbor problem, the goal is to construct a compact data structure over $\mathcal{C}$, such that, given a query curve $Q$, one can efficiently find the curve in $\mathcal{C}$ closest to $Q$. In the center problem, the goal is to find a curve $Q$, such that the maximum distance between $Q$ and the curves in $\mathcal{C}$ is minimized. We use the well-known discrete Frechet distance function, both under $L_\infty$ and under $L_2$, to measure the distance between two curves. For the nearest-neighbor problem, despite discouraging previous results, we identify two important cases for which it is possible to obtain practical bounds, even when $m$ and $n$ are large. In these cases, either $Q$ is a line segment or $\mathcal{C}$ consists of line segments, and the bounds on the size of the data structure and query time are nearly linear in the size of the input and query curve, respectively. The returned answer is either exact under $L_\infty$, or approximated to within a factor of $1+\varepsilon$ under $L_2$. We also consider the variants in which the location of the input curves is only fixed up to translation, and obtain similar bounds under $L_\infty$. As for the center problem, we study the case where the center is a line segment, i.e., we seek the line segment that represents the given set as well as possible. We present near-linear time exact algorithms under $L_\infty$, even when the location of the input curves is only fixed up to translation. Under $L_2$, we present a roughly $O(n^2m^3)$-time exact algorithm.
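For reference, the discrete Frechet distance used above can be computed with a standard dynamic program over the two curves; this sketch uses the $L_2$ point metric (one of the two choices discussed) and illustrative variable names:

import numpy as np

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polygonal curves P (n x d)
    and Q (m x d), via the standard O(nm) dynamic program."""
    n, m = len(P), len(Q)
    # Pairwise point distances under L2.
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    c = np.full((n, m), np.inf)
    c[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Best coupling reaching (i, j) from the three predecessors.
            best = min(
                c[i - 1, j] if i > 0 else np.inf,
                c[i, j - 1] if j > 0 else np.inf,
                c[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            c[i, j] = max(best, d[i, j])
    return c[n - 1, m - 1]

# Example: two parallel three-vertex curves at unit separation.
# P = np.array([[0, 0], [1, 0], [2, 0]], float)
# Q = np.array([[0, 1], [1, 1], [2, 1]], float)
# discrete_frechet(P, Q)  # -> 1.0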