ﻻ يوجد ملخص باللغة العربية
This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of $n$ points $A,B$ in $mathbb{R}^d$, a metric $phi(cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) in A times B$ with $phi(a,b) le r$. Our goal is to store $A,B$ into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for $ell_1, ell_infty$ metrics with $log^{O(1)} n$ update time and delay. We show that such a data structure is not feasible for the $ell_2$ metric for $d ge 4$. For approximate enumeration of similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for $ell_p$ metric, with $log^{O(1)} n$ delay and update time. In high dimensions, we present an efficient data structure with worst-case delay-guarantee using locality sensitive hashing (LSH).
The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster approximate m
Encryption provides a method to protect data outsourced to a DBMS provider, e.g., in the cloud. However, performing database operations over encrypted data requires specialized encryption schemes that carefully balance security and performance. In th
An enumeration kernel as defined by Creignou et al. [Theory Comput. Syst. 2017] for a parameterized enumeration problem consists of an algorithm that transforms each instance into one whose size is bounded by the parameter plus a solution-lifting alg
A tree decomposition of a graph facilitates computations by grouping vertices into bags that are interconnected in an acyclic structure, hence their importance in a plethora of problems such as query evaluation over databases and inference over proba
We study the problem of computing similarity joins under edit distance on a set of strings. Edit similarity joins is a fundamental problem in databases, data mining and bioinformatics. It finds important applications in data cleaning and integration,