Finding the Mode of a Kernel Density Estimate


Published by: Wai Ming Tai

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Jasper C.H. Lee - Jerry Li - Christopher Musco

Topics: Data Structures and Algorithms, Computational Geometry


Given points $p_1, \dots, p_n$ in $\mathbb{R}^d$, how do we find a point $x$ which maximizes $\frac{1}{n} \sum_{i=1}^n e^{-\|p_i - x\|^2}$? In other words, how do we find the maximizing point, or mode, of a Gaussian kernel density estimate (KDE) centered at $p_1, \dots, p_n$? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding problem is widely applicable. However, it is poorly understood algorithmically. Few provable algorithms are known, so practitioners rely on heuristics like the mean-shift algorithm, which are not guaranteed to find a global optimum. We address this challenge by providing fast and provably accurate approximation algorithms for mode finding in both the low and high dimensional settings. For low dimension $d$, our main contribution is to reduce the mode finding problem to solving a small number of systems of polynomial inequalities. For high dimension $d$, we prove the first dimensionality reduction result for KDE mode finding, which allows for reduction to the low dimensional case. Our result leverages Johnson-Lindenstrauss random projection, Kirszbraun's classic extension theorem, and, perhaps surprisingly, the mean-shift heuristic for mode finding.
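As an illustration of the mean-shift heuristic mentioned in the abstract, here is a minimal sketch in plain Python. It is not the paper's algorithm: it only applies the standard mean-shift fixed-point update for the kernel $e^{-\|p_i - x\|^2}$, restarted from every data point, and it carries no global-optimality guarantee. The function names (`kde`, `mean_shift`, `approximate_mode`) and the parameters `iters` and `tol` are assumptions made for this example.

```python
import math


def kde(points, x):
    """Gaussian KDE value (1/n) * sum_i exp(-||p_i - x||^2) at the point x."""
    total = 0.0
    for p in points:
        total += math.exp(-sum((pi - xi) ** 2 for pi, xi in zip(p, x)))
    return total / len(points)


def mean_shift(points, start, iters=200, tol=1e-12):
    """Iterate x <- sum_i w_i p_i / sum_i w_i with w_i = exp(-||p_i - x||^2).

    This converges to a stationary point of the KDE, which may be only a
    local maximum.
    """
    x = list(start)
    for _ in range(iters):
        weights = [
            math.exp(-sum((pi - xi) ** 2 for pi, xi in zip(p, x)))
            for p in points
        ]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; nothing to move toward
            break
        new_x = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(len(x))
        ]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


def approximate_mode(points, **kwargs):
    """Heuristic: run mean-shift from every data point, keep the best result."""
    candidates = [mean_shift(points, p, **kwargs) for p in points]
    return max(candidates, key=lambda c: kde(points, c))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0)]
    mode = approximate_mode(pts)
    print(mode, kde(pts, mode))
```

Restarting from every data point costs a factor of $n$ more than a single run and still only explores the local maxima reachable from the data, which is one way to see why provable guarantees need a different approach.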


168 - Wai Ming Tai 2020

Given a point set $P\subset \mathbb{R}^d$, a kernel density estimate for the Gaussian kernel is defined as $\overline{\mathcal{G}}_P(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2}$ for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ can be approximated by the kernel density estimate of $Q$. This subset $Q$ is called a coreset. The primary technique in this work is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory and to apply this coloring algorithm recursively. Our result leverages Banaszczyk's Theorem. When $d>1$ is constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$, as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first to break through the barrier of the $\sqrt{\log}$ factor, even when $d=2$.
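The recursive structure described above (color the points $\pm 1$, keep one color class, repeat) can be sketched as follows. This is only a stand-in: it colors randomly paired points at random instead of using the discrepancy-based coloring from the abstract, so it does not achieve the stated $O(1/\varepsilon)$ bound. The names `halve`, `recursive_halving`, and `max_error`, and the `seed` parameter, are hypothetical.

```python
import math
import random


def kde(points, x):
    """Gaussian KDE (1/|P|) * sum_{p in P} exp(-||x - p||^2)."""
    total = 0.0
    for p in points:
        total += math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)))
    return total / len(points)


def halve(points, rng):
    """One round: pair the points, color each pair (+1, -1), keep the +1 point.

    The coloring here is uniformly random, NOT a low-discrepancy coloring.
    """
    shuffled = list(points)
    rng.shuffle(shuffled)
    kept = []
    for i in range(0, len(shuffled) - 1, 2):
        plus_first = rng.random() < 0.5
        kept.append(shuffled[i] if plus_first else shuffled[i + 1])
    if len(shuffled) % 2 == 1:  # an unpaired leftover point is kept as-is
        kept.append(shuffled[-1])
    return kept


def recursive_halving(points, target_size, seed=0):
    """Halve repeatedly until at most target_size points remain."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    rng = random.Random(seed)
    q = list(points)
    while len(q) > target_size:
        q = halve(q, rng)
    return q


def max_error(points, subset, probes):
    """Largest KDE difference between points and subset over the probe points."""
    return max(abs(kde(points, x) - kde(subset, x)) for x in probes)


if __name__ == "__main__":
    pts = [(float(i) / 8.0, 0.0) for i in range(64)]
    q = recursive_halving(pts, 8)
    print(len(q), max_error(pts, q, pts))
```

The error is measured only at the probe points supplied by the caller, so it is a lower bound on the true $L_\infty$ error over all of $\mathbb{R}^d$.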

Topics: Data Structures and Algorithms, Computational Geometry, Machine Learning

Finding Efficient Region in The Plane with Line segments

169 - Jack Wang 2012

Let $\mathscr{O}$ be a set of $n$ disjoint obstacles in $\mathbb{R}^2$, and let $\mathscr{M}$ be a moving object. Let $s$ and $l$ denote the starting point and maximum path length of the moving object $\mathscr{M}$, respectively. Given a point $p$ in $\mathbb{R}^2$, we say that $p$ is achievable for $\mathscr{M}$ if $\pi(s,p)\leq l$, where $\pi(\cdot)$ denotes the shortest path length in the presence of obstacles. The goal is to find a region $\mathscr{R}$ such that, for any point $p\in \mathbb{R}^2$, if it is achievable for $\mathscr{M}$, then $p\in \mathscr{R}$; otherwise, $p\notin \mathscr{R}$. In this paper, we restrict our attention to the case of line-segment obstacles. To tackle this problem, we develop three algorithms. We first present a simpler algorithm for the sake of intuition. Its basic idea is to reduce our problem to computing the union of a set of circular visibility regions (CVRs). This algorithm takes $O(n^3)$ time. By analysing its dominant steps, we break through its bottleneck by using the shortest path map (SPM) technique to obtain those circles (unavailable beforehand), yielding an $O(n^2\log n)$ algorithm. Building on this finding, the third algorithm also uses the SPM technique. It does not, however, construct the CVRs. Instead, it directly traverses each region of the SPM to trace the boundaries, and this final algorithm achieves $O(n\log n)$ complexity.

Topics: Data Structures and Algorithms, Computational Geometry

Near-Optimal Coresets of Kernel Density Estimates

238 - Jeff M. Phillips, Wai Ming Tai 2018

We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically, we show a polynomial time construction for a coreset of size $O(\sqrt{d}/\varepsilon\cdot \sqrt{\log 1/\varepsilon})$, and we show a near-matching lower bound of size $\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$. When $d\geq 1/\varepsilon^2$, it is known that the size of the coreset can be $O(1/\varepsilon^2)$. The upper bound is a polynomial-in-$(1/\varepsilon)$ improvement when $d \in [3,1/\varepsilon^2)$, and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.

Topics: Machine Learning, Computational Geometry

Improved Coresets for Kernel Density Estimates

115 - Jeff M. Phillips, Wai Ming Tai 2017

We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the $L_\infty$ error between kernel density estimates within error $\epsilon$, with a coreset size of $2/\epsilon^2$ that depends on no other aspects of the data, including the dimension, the diameter of the point set, or the bandwidth of the kernel, all of which are common dependencies in other approximations. When the dimension is unrestricted, we show this bound is tight for these kernels as well as a much broader set. This work provides a careful analysis of the iterative Frank-Wolfe algorithm adapted to this context, an algorithm called kernel herding. This analysis unites a broad line of work that spans statistics, machine learning, and geometry. When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles. Currently the best known constructive bound is $O(\frac{1}{\epsilon} \log^d \frac{1}{\epsilon})$, and non-constructively, this can be improved by $\sqrt{\log \frac{1}{\epsilon}}$. This improves the best constant dimension bounds polynomially for $d \geq 3$.
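To make the kernel herding idea concrete, the following is a simplified sketch, assuming a Gaussian kernel $e^{-\|x-p\|^2}$ and restricting the greedy search to the input points themselves. At each step it picks the candidate $x$ maximizing $\mu(x) - \frac{1}{t+1}\sum_{j \le t} k(x, q_j)$, where $\mu$ is the KDE of the full point set and $q_1, \dots, q_t$ are the points chosen so far. The function names are hypothetical, and the sketch is not claimed to reproduce the exact variant analysed in the paper.

```python
import math


def gauss(p, q):
    """Gaussian kernel value exp(-||p - q||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)))


def kernel_herding(points, m):
    """Greedily choose m points (repeats allowed) from points.

    Step t picks the candidate x that maximizes
        mu(x) - (1 / (t + 1)) * sum_j k(x, q_j),
    where mu is the KDE of the full set and q_j are the earlier picks.
    """
    n = len(points)
    # mu[i] is the full-data KDE evaluated at candidate i.
    mu = [sum(gauss(x, p) for p in points) / n for x in points]
    # picked_sum[i] accumulates k(points[i], q_j) over the picks so far.
    picked_sum = [0.0] * n
    chosen = []
    for t in range(m):
        best = max(range(n), key=lambda i: mu[i] - picked_sum[i] / (t + 1))
        chosen.append(points[best])
        for i in range(n):
            picked_sum[i] += gauss(points[i], points[best])
    return chosen


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (10.0,), (10.1,)]
    print(kernel_herding(pts, 2))
```

The subtracted term penalizes candidates close to points already chosen, which is why the picks spread across the data instead of clustering at the densest spot.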

Topics: Machine Learning, Computational Geometry

Finding, Hitting and Packing Cycles in Subexponential Time on Unit Disk Graphs

159 - Fedor V. Fomin, Daniel Lokshtanov, Fahad Panolan 2017

We give algorithms with running time $2^{O(\sqrt{k}\log{k})} \cdot n^{O(1)}$ for the following problems. Given an $n$-vertex unit disk graph $G$ and an integer $k$, decide whether $G$ contains (1) a path on exactly/at least $k$ vertices, (2) a cycle on exactly $k$ vertices, (3) a cycle on at least $k$ vertices, (4) a feedback vertex set of size at most $k$, and (5) a set of $k$ pairwise vertex-disjoint cycles. For the first three problems, no subexponential time parameterized algorithms were previously known. For the remaining two problems, our algorithms significantly outperform the previously best known parameterized algorithms that run in time $2^{O(k^{0.75}\log{k})} \cdot n^{O(1)}$. Our algorithms are based on a new kind of tree decompositions of unit disk graphs where the separators can have size up to $k^{O(1)}$ and there exists a solution that crosses every separator at most $O(\sqrt{k})$ times. The running times of our algorithms are optimal up to the $\log{k}$ factor in the exponent, assuming the Exponential Time Hypothesis.

Topics: Data Structures and Algorithms, Computational Geometry
