No Arabic abstract
In this paper we study a wide range of variants for computing the (discrete and continuous) Frechet distance between uncertain curves. We define an uncertain curve as a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower and upper bound Frechet distance, which are the minimum and maximum Frechet distance for any realisations of the curves. We prove that both the upper and lower bound problems are NP-hard for the continuous Frechet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Frechet distance. In contrast, the lower bound (discrete and continuous) Frechet distance can be computed in polynomial time. Furthermore, we show that computing the expected discrete Frechet distance is #P-hard when the uncertainty regions are modelled as point sets or line segments. The construction also extends to show #P-hardness for computing the continuous Frechet distance when regions are modelled as point sets. On the positive side, we argue that in any constant dimension there is a FPTAS for the lower bound problem when $Delta / delta$ is polynomially bounded, where $delta$ is the Frechet distance and $Delta$ bounds the diameter of the regions. We then argue there is a near-linear-time 3-approximation for the decision problem when the regions are convex and roughly $delta$-separated. Finally, we also study the setting with Sakoe--Chiba time bands, where we restrict the alignment between the two curves, and give polynomial-time algorithms for upper bound and expected discrete and continuous Frechet distance for uncertainty regions modelled as point sets.
The Frechet distance is a popular similarity measure between curves. For some applications, it is desirable to match the curves under translation before computing the Frechet distance between them. This variant is called the Translation Invariant Frechet distance, and algorithms to compute it are well studied. The query version, finding an optimal placement in the plane for a query segment where the Frechet distance becomes minimized, is much less well understood. We study Translation Invariant Frechet distance queries in a restricted setting of horizontal query segments. More specifically, we preprocess a trajectory in $mathcal O(n^2 log^2 n) $ time and $mathcal O(n^{3/2})$ space, such that for any subtrajectory and any horizontal query segment we can compute their Translation Invariant Frechet distance in $mathcal O(text{polylog } n)$ time. We hope this will be a step towards answering Translation Invariant Frechet queries between arbitrary trajectories.
The Frechet distance is a popular distance measure for curves which naturally lends itself to fundamental computational tasks, such as clustering, nearest-neighbor searching, and spherical range searching in the corresponding metric space. However, its inherent complexity poses considerable computational challenges in practice. To address this problem we study distortion of the probabilistic embedding that results from projecting the curves to a randomly chosen line. Such an embedding could be used in combination with, e.g. locality-sensitive hashing. We show that in the worst case and under reasonable assumptions, the discrete Frechet distance between two polygonal curves of complexity $t$ in $mathbb{R}^d$, where $dinlbrace 2,3,4,5rbrace$, degrades by a factor linear in $t$ with constant probability. We show upper and lower bounds on the distortion. We also evaluate our findings empirically on a benchmark data set. The preliminary experimental results stand in stark contrast with our lower bounds. They indicate that highly distorted projections happen very rarely in practice, and only for strongly conditioned input curves. Keywords: Frechet distance, metric embeddings, random projections
We study approximate-near-neighbor data structures for time series under the continuous Frechet distance. For an attainable approximation factor $c>1$ and a query radius $r$, an approximate-near-neighbor data structure can be used to preprocess $n$ curves in $mathbb{R}$ (aka time series), each of complexity $m$, to answer queries with a curve of complexity $k$ by either returning a curve that lies within Frechet distance $cr$, or answering that there exists no curve in the input within distance $r$. In both cases, the answer is correct. Our first data structure achieves a $(5+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left({epsilon^{-1}}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kright)$. Our second data structure achieves a $(2+epsilon)$ approximation factor, uses space in $ncdot mathcal{O}left(frac{m}{kepsilon}right)^{k} + mathcal{O}(nm)$ and has query time in $mathcal{O}left(kcdot 2^kright)$. Our third positive result is a probabilistic data structure based on locality-sensitive hashing, which achieves space in $mathcal{O}(nlog n+nm)$ and query time in $mathcal{O}(klog n)$, and which answers queries with an approximation factor in $mathcal{O}(k)$. All of our data structures make use of the concept of signatures, which were originally introduced for the problem of clustering time series under the Frechet distance. In addition, we show lower bounds for this problem. Consider any data structure which achieves an approximation factor less than $2$ and which supports curves of arclength up to $L$ and answers the query using only a constant number of probes. We show that under reasonable assumptions on the word size any such data structure needs space in $L^{Omega(k)}$.
We study the $c$-approximate near neighbor problem under the continuous Frechet distance: Given a set of $n$ polygonal curves with $m$ vertices, a radius $delta > 0$, and a parameter $k leq m$, we want to preprocess the curves into a data structure that, given a query curve $q$ with $k$ vertices, either returns an input curve with Frechet distance at most $ccdot delta$ to $q$, or returns that there exists no input curve with Frechet distance at most $delta$ to $q$. We focus on the case where the input and the queries are one-dimensional polygonal curves -- also called time series -- and we give a comprehensive analysis for this case. We obtain new upper bounds that provide different tradeoffs between approximation factor, preprocessing time, and query time. Our data structures improve upon the state of the art in several ways. We show that for any $0 < varepsilon leq 1$ an approximation factor of $(1+varepsilon)$ can be achieved within the same asymptotic time bounds as the previously best result for $(2+varepsilon)$. Moreover, we show that an approximation factor of $(2+varepsilon)$ can be obtained by using preprocessing time and space $O(nm)$, which is linear in the input size, and query time in $O(frac{1}{varepsilon})^{k+2}$, where the previously best result used preprocessing time in $n cdot O(frac{m}{varepsilon k})^k$ and query time in $O(1)^k$. We complement our upper bounds with matching conditional lower bounds based on the Orthogonal Vectors Hypothesis. Interestingly, some of our lower bounds already hold for any super-constant value of $k$. This is achieved by proving hardness of a one-sided sparse version of the Orthogonal Vectors problem as an intermediate problem, which we believe to be of independent interest.
We study metric data structures for curves in doubling spaces, such as trajectories of moving objects in Euclidean $mathbb{R}^d$, where the distance between two curves is measured using the discrete Frechet distance. We design data structures in an emph{asymmetric} setting where the input is a curve (or a set of $n$ curves) each of complexity $m$ and the queries are with curves of complexity $kll m$. We show that there exist approximate data structures that are independent of the input size $N = d cdot n cdot m$ and we study how to maintain them dynamically if the input is given in the stream. Concretely, we study two types of data structures: (i) distance oracles, where the task is to store a compressed version of the input curve, which can be used to answer queries for the distance of a query curve to the input curve, and (ii) nearest-neighbor data structures, where the task is to preprocess a set of input curves to answer queries for the input curve closest to the query curve. In both cases we are interested in approximation. For curves embedded in Euclidean $mathbb{R}^d$ with constant $d$, our distance oracle uses space in $mathcal{O}((k log(epsilon^{-1}) epsilon^{-d})^k)$ ($epsilon$ is the precision parameter). The oracle performs $(1+epsilon)$-approximate queries in time in $mathcal{O}(k^2)$ and is deterministic. We show how to maintain this distance oracle in the stream using polylogarithmic additional memory. In the stream, we can dynamically answer distance queries to the portion of the stream seen so far in $mathcal{O}(k^4 log^2 m)$ time. We apply our techniques to the second problem, approximate near neighbor (ANN) data structures, and achieve an exponential improvement in the dependency on the complexity of the input curves compared to the state of the art.