Succinct Filters for Sets of Unknown Sizes


Abstract in English

The membership problem asks to maintain a set $Ssubseteq[u]$, supporting insertions and membership queries, i.e., testing if a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate $epsilon$ is allowed, the data structure is called a filter. The space usages of the standard dictionaries or filters usually depend on the upper bound on the size of $S$, while the actual set can be much smaller. Pagh, Segev and Wieder (FOCS13) were the first to study filters with varying space usage based on the current $|S|$. They showed in order to match the space with the current set size $n=|S|$, any filter data structure must use $(1-o(1))n(log(1/epsilon)+(1-O(epsilon))loglog n)$ bits, in contrast to the well-known lower bound of $Nlog(1/epsilon)$ bits, where $N$ is an upper bound on $|S|$. They also presented a data structure with almost optimal space of $(1+o(1))n(log(1/epsilon)+O(loglog n))$ bits provided that $n>u^{0.001}$, with expected amortized constant insertion time and worst-case constant lookup time. In this work, we present a filter data structure with improvements in two aspects: - it has constant worst-case time for all insertions and lookups with high probability; - it uses space $(1+o(1))n(log (1/epsilon)+loglog n)$ bits when $n>u^{0.001}$, achieving optimal leading constant for all $epsilon=o(1)$. We also present a dictionary that uses $(1+o(1))nlog(u/n)$ bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.

Download