Finding the Mode of a Kernel Density Estimate


الملخص بالإنكليزية

Given points $p_1, dots, p_n$ in $mathbb{R}^d$, how do we find a point $x$ which maximizes $frac{1}{n} sum_{i=1}^n e^{-|p_i - x|^2}$? In other words, how do we find the maximizing point, or mode of a Gaussian kernel density estimation (KDE) centered at $p_1, dots, p_n$? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding problem is widely applicable. However, it is poorly understood algorithmically. Few provable algorithms are known, so practitioners rely on heuristics like the mean-shift algorithm, which are not guaranteed to find a global optimum. We address this challenge by providing fast and provably accurate approximation algorithms for mode finding in both the low and high dimensional settings. For low dimension $d$, our main contribution is to reduce the mode finding problem to a solving a small number of systems of polynomial inequalities. For high dimension $d$, we prove the first dimensionality reduction result for KDE mode finding, which allows for reduction to the low dimensional case. Our result leverages Johnson-Lindenstrauss random projection, Kirszbrauns classic extension theorem, and perhaps surprisingly, the mean-shift heuristic for mode finding.

تحميل البحث