ترغب بنشر مسار تعليمي؟ اضغط هنا

Tight Bounds for Approximate Near Neighbor Searching for Time Series under the Frechet Distance

394   0   0.0 ( 0 )
 نشر من قبل Andr\\'e Nusser
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study the $c$-approximate near neighbor problem under the continuous Frechet distance: Given a set of $n$ polygonal curves with $m$ vertices, a radius $delta > 0$, and a parameter $k leq m$, we want to preprocess the curves into a data structure that, given a query curve $q$ with $k$ vertices, either returns an input curve with Frechet distance at most $ccdot delta$ to $q$, or returns that there exists no input curve with Frechet distance at most $delta$ to $q$. We focus on the case where the input and the queries are one-dimensional polygonal curves -- also called time series -- and we give a comprehensive analysis for this case. We obtain new upper bounds that provide different tradeoffs between approximation factor, preprocessing time, and query time. Our data structures improve upon the state of the art in several ways. We show that for any $0 < varepsilon leq 1$ an approximation factor of $(1+varepsilon)$ can be achieved within the same asymptotic time bounds as the previously best result for $(2+varepsilon)$. Moreover, we show that an approximation factor of $(2+varepsilon)$ can be obtained by using preprocessing time and space $O(nm)$, which is linear in the input size, and query time in $O(frac{1}{varepsilon})^{k+2}$, where the previously best result used preprocessing time in $n cdot O(frac{m}{varepsilon k})^k$ and query time in $O(1)^k$. We complement our upper bounds with matching conditional lower bounds based on the Orthogonal Vectors Hypothesis. Interestingly, some of our lower bounds already hold for any super-constant value of $k$. This is achieved by proving hardness of a one-sided sparse version of the Orthogonal Vectors problem as an intermediate problem, which we believe to be of independent interest.

قيم البحث

اقرأ أيضاً

We study approximate-near-neighbor data structures for time series under the continuous Frechet distance. For an attainable approximation factor $c>1$ and a query radius $r$, an approximate-near-neighbor data structure can be used to preprocess $n$ c urves in $mathbb{R}$ (aka time series), each of complexity $m$, to answer queries with a curve of complexity $k$ by either returning a curve that lies within Frechet distance $cr$, or answering that there exists no curve in the input within distance $r$. In both cases, the answer is correct. Our first data structure achieves a $(5+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left({epsilon^{-1}}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kright)$. Our second data structure achieves a $(2+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left(frac{m}{kepsilon}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kcdot 2^kright)$. Our third positive result is a probabilistic data structure based on locality-sensitive hashing, which achieves space in $mathcal{O}(nlog n+nm)$ and query time in $mathcal{O}(klog n)$, and which answers queries with an approximation factor in $mathcal{O}(k)$. All of our data structures make use of the concept of signatures, which were originally introduced for the problem of clustering time series under the Frechet distance. In addition, we show lower bounds for this problem. Consider any data structure which achieves an approximation factor less than $2$ and which supports curves of arclength up to $L$ and answers the query using only a constant number of probes. We show that under reasonable assumptions on the word size any such data structure needs space in $L^{Omega(k)}$.
Computing the similarity of two point sets is a ubiquitous task in medical imaging, geometric shape comparison, trajectory analysis, and many more settings. Arguably the most basic distance measure for this task is the Hausdorff distance, which assig ns to each point from one set the closest point in the other set and then evaluates the maximum distance of any assigned pair. A drawback is that this distance measure is not translational invariant, that is, comparing two objects just according to their shape while disregarding their position in space is impossible. Fortunately, there is a canonical translational invariant version, the Hausdorff distance under translation, which minimizes the Hausdorff distance over all translations of one of the point sets. For point sets of size $n$ and $m$, the Hausdorff distance under translation can be computed in time $tilde O(nm)$ for the $L_1$ and $L_infty$ norm [Chew, Kedem SWAT92] and $tilde O(nm (n+m))$ for the $L_2$ norm [Huttenlocher, Kedem, Sharir DCG93]. As these bounds have not been improved for over 25 years, in this paper we approach the Hausdorff distance under translation from the perspective of fine-grained complexity theory. We show (i) a matching lower bound of $(nm)^{1-o(1)}$ for $L_1$ and $L_infty$ (and all other $L_p$ norms) assuming the Orthogonal Vectors Hypothesis and (ii) a matching lower bound of $n^{2-o(1)}$ for $L_2$ in the imbalanced case of $m = O(1)$ assuming the 3SUM Hypothesis.
In this paper we study a wide range of variants for computing the (discrete and continuous) Frechet distance between uncertain curves. We define an uncertain curve as a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower and upper bound Frechet distance, which are the minimum and maximum Frechet distance for any realisations of the curves. We prove that both the upper and lower bound problems are NP-hard for the continuous Frechet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Frechet distance. In contrast, the lower bound (discrete and continuous) Frechet distance can be computed in polynomial time. Furthermore, we show that computing the expected discrete Frechet distance is #P-hard when the uncertainty regions are modelled as point sets or line segments. The construction also extends to show #P-hardness for computing the continuous Frechet distance when regions are modelled as point sets. On the positive side, we argue that in any constant dimension there is a FPTAS for the lower bound problem when $Delta / delta$ is polynomially bounded, where $delta$ is the Frechet distance and $Delta$ bounds the diameter of the regions. We then argue there is a near-linear-time 3-approximation for the decision problem when the regions are convex and roughly $delta$-separated. Finally, we also study the setting with Sakoe--Chiba time bands, where we restrict the alignment between the two curves, and give polynomial-time algorithms for upper bound and expected discrete and continuous Frechet distance for uncertainty regions modelled as point sets.
We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of Omega((d /w)*log n) time on average per output, where d is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.
The Frechet distance is a popular distance measure for curves which naturally lends itself to fundamental computational tasks, such as clustering, nearest-neighbor searching, and spherical range searching in the corresponding metric space. However, i ts inherent complexity poses considerable computational challenges in practice. To address this problem we study distortion of the probabilistic embedding that results from projecting the curves to a randomly chosen line. Such an embedding could be used in combination with, e.g. locality-sensitive hashing. We show that in the worst case and under reasonable assumptions, the discrete Frechet distance between two polygonal curves of complexity $t$ in $mathbb{R}^d$, where $dinlbrace 2,3,4,5rbrace$, degrades by a factor linear in $t$ with constant probability. We show upper and lower bounds on the distortion. We also evaluate our findings empirically on a benchmark data set. The preliminary experimental results stand in stark contrast with our lower bounds. They indicate that highly distorted projections happen very rarely in practice, and only for strongly conditioned input curves. Keywords: Frechet distance, metric embeddings, random projections
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا