Do you want to publish a course? Click here

$(2+epsilon)$-ANN for time series under the Frechet distance

83   0   0.0 ( 0 )
 Added by Ioannis Psarros
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

We study approximate-near-neighbor data structures for time series under the continuous Frechet distance. For an attainable approximation factor $c>1$ and a query radius $r$, an approximate-near-neighbor data structure can be used to preprocess $n$ curves in $mathbb{R}$ (aka time series), each of complexity $m$, to answer queries with a curve of complexity $k$ by either returning a curve that lies within Frechet distance $cr$, or answering that there exists no curve in the input within distance $r$. In both cases, the answer is correct. Our first data structure achieves a $(5+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left({epsilon^{-1}}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kright)$. Our second data structure achieves a $(2+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left(frac{m}{kepsilon}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kcdot 2^kright)$. Our third positive result is a probabilistic data structure based on locality-sensitive hashing, which achieves space in $mathcal{O}(nlog n+nm)$ and query time in $mathcal{O}(klog n)$, and which answers queries with an approximation factor in $mathcal{O}(k)$. All of our data structures make use of the concept of signatures, which were originally introduced for the problem of clustering time series under the Frechet distance. In addition, we show lower bounds for this problem. Consider any data structure which achieves an approximation factor less than $2$ and which supports curves of arclength up to $L$ and answers the query using only a constant number of probes. We show that under reasonable assumptions on the word size any such data structure needs space in $L^{Omega(k)}$.



rate research

Read More

We study the $c$-approximate near neighbor problem under the continuous Frechet distance: Given a set of $n$ polygonal curves with $m$ vertices, a radius $delta > 0$, and a parameter $k leq m$, we want to preprocess the curves into a data structure that, given a query curve $q$ with $k$ vertices, either returns an input curve with Frechet distance at most $ccdot delta$ to $q$, or returns that there exists no input curve with Frechet distance at most $delta$ to $q$. We focus on the case where the input and the queries are one-dimensional polygonal curves -- also called time series -- and we give a comprehensive analysis for this case. We obtain new upper bounds that provide different tradeoffs between approximation factor, preprocessing time, and query time. Our data structures improve upon the state of the art in several ways. We show that for any $0 < varepsilon leq 1$ an approximation factor of $(1+varepsilon)$ can be achieved within the same asymptotic time bounds as the previously best result for $(2+varepsilon)$. Moreover, we show that an approximation factor of $(2+varepsilon)$ can be obtained by using preprocessing time and space $O(nm)$, which is linear in the input size, and query time in $O(frac{1}{varepsilon})^{k+2}$, where the previously best result used preprocessing time in $n cdot O(frac{m}{varepsilon k})^k$ and query time in $O(1)^k$. We complement our upper bounds with matching conditional lower bounds based on the Orthogonal Vectors Hypothesis. Interestingly, some of our lower bounds already hold for any super-constant value of $k$. This is achieved by proving hardness of a one-sided sparse version of the Orthogonal Vectors problem as an intermediate problem, which we believe to be of independent interest.
In this paper we study a wide range of variants for computing the (discrete and continuous) Frechet distance between uncertain curves. We define an uncertain curve as a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower and upper bound Frechet distance, which are the minimum and maximum Frechet distance for any realisations of the curves. We prove that both the upper and lower bound problems are NP-hard for the continuous Frechet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Frechet distance. In contrast, the lower bound (discrete and continuous) Frechet distance can be computed in polynomial time. Furthermore, we show that computing the expected discrete Frechet distance is #P-hard when the uncertainty regions are modelled as point sets or line segments. The construction also extends to show #P-hardness for computing the continuous Frechet distance when regions are modelled as point sets. On the positive side, we argue that in any constant dimension there is a FPTAS for the lower bound problem when $Delta / delta$ is polynomially bounded, where $delta$ is the Frechet distance and $Delta$ bounds the diameter of the regions. We then argue there is a near-linear-time 3-approximation for the decision problem when the regions are convex and roughly $delta$-separated. Finally, we also study the setting with Sakoe--Chiba time bands, where we restrict the alignment between the two curves, and give polynomial-time algorithms for upper bound and expected discrete and continuous Frechet distance for uncertainty regions modelled as point sets.
The Frechet distance is a popular distance measure for curves which naturally lends itself to fundamental computational tasks, such as clustering, nearest-neighbor searching, and spherical range searching in the corresponding metric space. However, its inherent complexity poses considerable computational challenges in practice. To address this problem we study distortion of the probabilistic embedding that results from projecting the curves to a randomly chosen line. Such an embedding could be used in combination with, e.g. locality-sensitive hashing. We show that in the worst case and under reasonable assumptions, the discrete Frechet distance between two polygonal curves of complexity $t$ in $mathbb{R}^d$, where $dinlbrace 2,3,4,5rbrace$, degrades by a factor linear in $t$ with constant probability. We show upper and lower bounds on the distortion. We also evaluate our findings empirically on a benchmark data set. The preliminary experimental results stand in stark contrast with our lower bounds. They indicate that highly distorted projections happen very rarely in practice, and only for strongly conditioned input curves. Keywords: Frechet distance, metric embeddings, random projections
The Frechet distance is a popular similarity measure between curves. For some applications, it is desirable to match the curves under translation before computing the Frechet distance between them. This variant is called the Translation Invariant Frechet distance, and algorithms to compute it are well studied. The query version, finding an optimal placement in the plane for a query segment where the Frechet distance becomes minimized, is much less well understood. We study Translation Invariant Frechet distance queries in a restricted setting of horizontal query segments. More specifically, we preprocess a trajectory in $mathcal O(n^2 log^2 n) $ time and $mathcal O(n^{3/2})$ space, such that for any subtrajectory and any horizontal query segment we can compute their Translation Invariant Frechet distance in $mathcal O(text{polylog } n)$ time. We hope this will be a step towards answering Translation Invariant Frechet queries between arbitrary trajectories.
The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is small, then knowing this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set $X$ is a set of polygonal curves in $mathbb{R}^d$ and the sets $mathcal{R}$ are metric balls defined by curve similarity metrics, such as the Frechet distance and the Hausdorff distance, as well as their discrete counterparts. We derive upper and lower bounds on the VC dimension that imply useful sampling bounds in the setting that the number of curves is large, but the complexity of the individual curves is small. Our upper bounds are either near-quadratic or near-linear in the complexity of the curves that define the ranges and they are logarithmic in the complexity of the curves that define the ground set.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا