
On Estimating Non-uniform Density Distributions using N Nearest Neighbors

Posted by: Przemyslaw Wozniak
Publication date: 2013
Research field: Physics
Paper language: English
Author: Przemek Wozniak





We consider density estimators based on the nearest neighbors method applied to discrete point distributions in spaces of arbitrary dimensionality. If the density is constant, the volume of a hypersphere centered at a random location is proportional to the expected number of points falling within the hypersphere radius. The distance to the $N$-th nearest neighbor alone is then a sufficient statistic for the density. In the non-uniform case this proportionality is distorted. We model the distortion by normalizing hypersphere volumes to the largest one and expressing the resulting distribution in terms of Legendre polynomials. Using Monte Carlo simulations we show that this approach can effectively address the tradeoff between smoothing bias and estimator variance for sparsely sampled distributions.
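
As a concrete reference point, here is a minimal sketch of the uniform-density baseline the abstract starts from: estimate the density at a query location from the distance to its $N$-th nearest neighbor and the corresponding hypersphere volume. Python with NumPy/SciPy is an assumption of convenience, and the Legendre-polynomial correction for non-uniform densities described above is not implemented here.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def knn_density(points, queries, N):
    # Classical estimate: density ~ N / (n * volume of the hypersphere
    # that reaches the N-th nearest neighbor of each query location).
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    tree = cKDTree(points)
    r_N = tree.query(np.atleast_2d(queries), k=N)[0][:, -1]  # N-th NN distance
    unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)          # unit d-ball volume
    return N / (n * unit_ball * r_N ** d)

# 1000 uniform points in the unit square: estimates should hover near 1000.
rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
print(knn_density(pts, rng.random((5, 2)), N=10))
```

For sparse samples, small $N$ gives high variance and large $N$ smooths over real structure; the paper's contribution is a principled way to navigate exactly that tradeoff.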


Read also

We present a new regular-grid search algorithm for quick fixed-radius nearest-neighbor lookup, developed in Python. This module indexes a set of k-dimensional points on a regular grid, with optional periodic boundary conditions, providing a fast approach to nearest-neighbor queries. In this first installment we provide three types of queries: bubble, shell, and nth-nearest; three metrics of interest in astronomy: the Euclidean metric and two distance functions in spherical coordinates of varying precision, haversine and Vincenty; and the possibility of supplying a custom distance function. This package is particularly useful for large datasets, where a brute-force search becomes impractical.
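
For intuition, the following is a hedged sketch of the core idea: a regular grid index answering a fixed-radius ("bubble") query with the Euclidean metric by visiting only the cells the query ball overlaps. Class and method names are illustrative, not the package's API; the real module also provides shell and nth-nearest queries, periodic boundaries, and the spherical metrics mentioned above.

```python
import numpy as np
from collections import defaultdict

class GridIndex:
    """Toy regular-grid index for fixed-radius neighbor queries."""

    def __init__(self, points, cell_size):
        self.points = np.asarray(points, dtype=float)
        self.cell = float(cell_size)
        self.buckets = defaultdict(list)
        # Hash each point into the integer grid cell that contains it.
        for i, p in enumerate(self.points):
            self.buckets[tuple((p // self.cell).astype(int))].append(i)

    def bubble(self, center, radius):
        """Indices of all points within `radius` of `center` (Euclidean)."""
        center = np.asarray(center, dtype=float)
        lo = ((center - radius) // self.cell).astype(int)
        hi = ((center + radius) // self.cell).astype(int)
        hits = []
        # Scan only the block of cells overlapped by the query ball.
        for offset in np.ndindex(*(hi - lo + 1)):
            for i in self.buckets.get(tuple(lo + np.array(offset)), []):
                if np.linalg.norm(self.points[i] - center) <= radius:
                    hits.append(i)
        return hits

# Example: count neighbors within 0.1 of the center of a random unit cube.
rng = np.random.default_rng(1)
pts = rng.random((10_000, 3))
idx = GridIndex(pts, cell_size=0.1)
print(len(idx.bubble([0.5, 0.5, 0.5], 0.1)))
```

The design point is that, for a fixed radius, only a constant-size block of cells needs inspecting per query, which is what makes the approach scale to large datasets where brute force fails.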
The observed velocities of the gas in barred galaxies are a combination of the azimuthally averaged circular velocity and non-circular motions, primarily caused by gas streaming along the bar. These non-circular flows must be accounted for before the observed velocities can be used in mass modeling. In this work, we examine the performance of the tilted-ring method and the DiskFit algorithm for transforming velocity maps of barred spiral galaxies into rotation curves (RCs) using simulated data. We find that the tilted-ring method, which does not account for streaming motions, underestimates the circular velocity when the bar is parallel to the projected major axis and overestimates it when the bar is perpendicular. DiskFit, which does include streaming motions, is limited to orientations where the bar is aligned with neither the major nor the minor axis of the image. Therefore, we propose a method of correcting RCs based on numerical simulations of galaxies: we correct the RC derived from the tilted-ring method using a numerical simulation of a galaxy with properties and projection similar to those of the observed galaxy. Using observations of NGC 3319, which has a bar aligned with the major axis, as a test case, we show that the mass models inferred from the uncorrected and corrected RCs are significantly different. These results show the importance of correcting for non-circular motions and demonstrate that new methods of accounting for them are necessary, as current methods fail for specific bar alignments.
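
Once a matched simulation is in hand, the proposed correction reduces to simple arithmetic. The sketch below assumes a multiplicative bias (one plausible reading, not necessarily the paper's exact prescription) and hypothetical input arrays defined on a common radial grid.

```python
import numpy as np

def correct_rotation_curve(v_obs, v_sim_true, v_sim_fitted):
    # v_obs:        tilted-ring RC of the observed galaxy
    # v_sim_true:   known circular velocity of the matched simulated galaxy
    # v_sim_fitted: tilted-ring RC recovered from that same simulation
    # All three are hypothetical arrays sampled at the same radii.
    bias = v_sim_fitted / v_sim_true      # how the method distorts the sim
    return np.asarray(v_obs) / bias       # undo the same distortion in the data
```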
In this paper, we propose an ensemble learning algorithm called under-bagging $k$-nearest neighbors (under-bagging $k$-NN) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected sub-sample size $s$, and the number of bagging rounds $B$, optimal convergence rates for under-bagging $k$-NN can be achieved under mild assumptions with respect to the arithmetic mean (AM) of recalls. Moreover, we show that with a relatively small $B$, the expected sub-sample size $s$ can be much smaller than the number of training points $n$ at each bagging round, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced, which leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments that verify the theoretical benefits of the under-bagging technique through the promising AM performance and efficiency of our proposed algorithm.
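
A minimal sketch of the procedure, assuming scikit-learn's $k$-NN classifier and an equal-draws-per-class balancing scheme (a simplification of the expected sub-sample size $s$ in the abstract; parameter names follow the abstract):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn(X, y, X_test, k=5, s=200, B=25, seed=0):
    """Each round draws a class-balanced subsample of size ~s, fits a
    k-NN classifier, and the B rounds vote on the final label."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    votes = np.zeros((len(X_test), len(classes)))
    for _ in range(B):
        # Under-sample: the same number of points from every class, so the
        # majority class is cut down to the minority's share of s.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       size=s // len(classes), replace=True)
            for c in classes
        ])
        pred = KNeighborsClassifier(n_neighbors=k).fit(X[idx], y[idx]).predict(X_test)
        for j, c in enumerate(classes):
            votes[:, j] += (pred == c)
    return classes[np.argmax(votes, axis=1)]
```

Because each round touches only $s \ll n$ points, both neighbor search and training are cheap, which is the source of the time-complexity savings claimed above.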
In this paper, we report progress on the open problem posed by Pagh [14], who considered nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest neighbors problem without false negatives in high-dimensional Euclidean space $\mathbb{R}^d$. These data structures work for any $c = \omega(\sqrt{\log\log n})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $\Omega(\sqrt{d})$. The improvement is obtained by applying a sequence of reductions that are interesting in their own right. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in the space $\big(\mathbb{R}^k\big)^L$ equipped with the metric $m(x,y) = \max_{1 \le i \le L} \lVert x_i - y_i \rVert_2$.
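
For concreteness, the target metric of the final reduction, $m(x,y) = \max_{1 \le i \le L} \lVert x_i - y_i \rVert_2$, is a direct computation once a vector in $\big(\mathbb{R}^k\big)^L$ is viewed as $L$ blocks of length $k$ (NumPy here is an assumption of convenience):

```python
import numpy as np

def max_block_l2(x, y, k):
    # Split the difference vector into L blocks of length k and return
    # the largest per-block Euclidean distance.
    diffs = (np.asarray(x, float) - np.asarray(y, float)).reshape(-1, k)
    return np.linalg.norm(diffs, axis=1).max()
```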
The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem, which is then solved via multi-armed bandits. We apply this technique to the problem of high-dimensional $k$-nearest neighbor search, developing an algorithm which we prove identifies exact nearest neighbors with high probability. We show that under regularity assumptions on a dataset of $n$ points in $d$-dimensional space, the complexity of our algorithm scales logarithmically with the dimension of the data, as $O\big((n+d)\log^2\big(\frac{nd}{\delta}\big)\big)$ for error probability $\delta$, rather than linearly as in exact computation, which requires $O(nd)$. We corroborate our theoretical results with numerical simulations, showing that our algorithm outperforms both exact computation and state-of-the-art algorithms such as kGraph, NGT, and LSH on real datasets.
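
The bandit view treats each point's squared distance to the query as an expectation over coordinates, estimated by sampling a few coordinates at a time. The sketch below uses a generic Hoeffding-style confidence width and successive elimination; it illustrates the idea rather than the paper's exact algorithm or bounds, and it assumes coordinate gaps bounded in [0, 1].

```python
import numpy as np

def bandit_nn(X, q, batch=64, delta=0.05, rng=None):
    """Adaptively estimate squared distances from q to each row of X by
    sampling coordinates; eliminate points whose confidence lower bound
    exceeds the best candidate's upper bound."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    alive = np.arange(n)        # candidate indices still in contention
    sums = np.zeros(n)          # running sums of sampled squared gaps
    m = 0                       # coordinates sampled so far per point
    while len(alive) > 1 and m < d:
        coords = rng.choice(d, size=batch)           # sample with replacement
        sums[alive] += ((X[np.ix_(alive, coords)] - q[coords]) ** 2).sum(axis=1)
        m += batch
        means = sums[alive] / m
        width = np.sqrt(np.log(2 * n * m / delta) / (2 * m))
        # Keep only points whose interval overlaps the current best's.
        alive = alive[means - width <= (means + width).min()]
    return alive[np.argmin(sums[alive])]
```

Most far-away points are eliminated after a handful of coordinate samples, so the work concentrates on the few near-ties, which is how the $O(nd)$ cost of exact computation is avoided.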