Dynamic Partition Bloom Filters: A Bounded False Positive Solution For Dynamic Set Membership (Extended Abstract)


Added by Amitabha Bagchi

Publication date: 2019

Field: Informatics Engineering

Language: English

Authors: Sidharth Negi, Ameya Dubey, Amitabha Bagchi

Data Structures and Algorithms


Dynamic Bloom filters (DBF) were proposed by Guo et al. in 2010 to handle situations where the size of the set to be stored compactly is not known in advance or can change during the course of the application. We propose a novel competitor to DBF with an important property that DBF cannot achieve: our structure maintains a bound on the false positive rate of the set membership query across all possible sizes of the sets stored in it. The new dynamic data structure, which we call the Dynamic Partition Bloom filter (DPBF), is based on our novel concept of a Bloom partition tree, a tree structure with standard Bloom filters at the leaves. DPBF is superior to standard Bloom filters because it can efficiently handle a large number of unions and intersections of sets of different sizes while controlling the false positive rate. To the best of our knowledge, DPBF is the first structure to do so. We provide theoretical bounds comparing the false positive probability of DPBF to that of DBF.
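To make the motivation concrete, the following sketch shows why a chain of fixed-size Bloom filters loses control of the false positive rate as the set grows. It is not the paper's analysis: it uses the textbook approximation for a standard Bloom filter and assumes a simplified model in which a new filter is opened whenever the active one reaches its capacity and sub-filters err independently. The function names and the parameters `m`, `k`, and `capacity` are illustrative choices.

```python
import math


def bloom_fp_rate(m: int, k: int, n: int) -> float:
    """Textbook approximation of the false positive rate of a standard
    Bloom filter with m bits, k hash functions, and n inserted elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


def chained_fp_rate(m: int, k: int, capacity: int, n: int) -> float:
    """False positive rate of a chain of m-bit filters (simplified model).

    Assumption: each filter is filled to `capacity` elements before a new
    one is opened, and a query is a false positive if ANY sub-filter
    reports one (sub-filters treated as independent).
    """
    full, rest = divmod(n, capacity)
    # Probability that no full sub-filter reports a false positive.
    p_no_fp = (1.0 - bloom_fp_rate(m, k, capacity)) ** full
    if rest:
        # The last, partially filled sub-filter.
        p_no_fp *= 1.0 - bloom_fp_rate(m, k, rest)
    return 1.0 - p_no_fp


if __name__ == "__main__":
    m, k, capacity = 9586, 7, 1000  # roughly 1% per full sub-filter
    for n in (1000, 5000, 10000, 50000):
        print(n, round(chained_fp_rate(m, k, capacity, n), 4))
```

Under these assumptions the combined rate grows toward 1 as more sub-filters are chained, which is the behaviour a structure with a bounded false positive rate is meant to avoid.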


Age-Partitioned Bloom Filters

Ariel Shtul, Carlos Baquero, Paulo Sergio Almeida, 2020

Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size, or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive with dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters (APBF), a BF-based approach for duplicate detection in sliding windows that is not only competitive in time complexity, but also has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. Unlike dictionary-based approaches, APBFs retain the simplicity of BFs, which is important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high accuracy filters.
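As an illustration of the double hashing improvement mentioned above, here is a minimal sketch of a plain Bloom filter whose k bit positions are derived from two base hashes as h1 + i*h2 mod m. It is not the APBF structure itself; the class name, the use of SHA-256, and the byte-per-bit storage are assumptions made to keep the example short.

```python
import hashlib


def double_hash_indexes(item: bytes, k: int, m: int) -> list[int]:
    """Derive k positions in [0, m) from two base hashes: (h1 + i*h2) mod m."""
    digest = hashlib.sha256(item).digest()
    h1 = int.from_bytes(digest[:8], "big")
    # Force h2 odd so the stride is never zero.
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    """Plain Bloom filter using double hashing (one byte per bit for clarity)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)

    def add(self, item: bytes) -> None:
        for idx in double_hash_indexes(item, self.k, self.m):
            self.bits[idx] = 1

    def __contains__(self, item: bytes) -> bool:
        # True for every inserted item; may also be True for others
        # (a false positive), never False for an inserted item.
        return all(self.bits[idx] for idx in double_hash_indexes(item, self.k, self.m))


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=5)
    bf.add(b"alice")
    print(b"alice" in bf)
```

Only one digest is computed per item regardless of k, which is the point of the technique: the cost of hashing stays constant as the number of probed positions grows.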

Data Structures and Algorithms; Databases; Distributed, Parallel, and Cluster Computing

Sampling and Reconstruction Using Bloom Filters

Neha Sengupta, Amitabha Bagchi, Srikanta Bedathur, 2017

In this paper, we address the problem of sampling from a set and reconstructing a set stored as a Bloom filter. To the best of our knowledge, our work is the first to address this question. We introduce a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently. In the case where the hash functions used in the Bloom filter implementation are partially invertible, in the sense that it is easy to calculate the set of elements that map to a particular hash value, we propose a second, more space-efficient method called HashInvert for the reconstruction. We study the properties of these two methods both analytically and experimentally. We provide bounds on run times for both methods and on sample quality for the BloomSampleTree-based algorithm, and show through an extensive experimental evaluation that our methods are efficient and effective.

Data Structures and Algorithms

A New Deterministic Algorithm for Dynamic Set Cover

Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, 2019

We present a deterministic dynamic algorithm for maintaining a $(1+\epsilon)f$-approximate minimum cost set cover with $O(f\log(Cn)/\epsilon^2)$ amortized update time, when the input set system is undergoing element insertions and deletions. Here, $n$ denotes the number of elements, each element appears in at most $f$ sets, and the cost of each set lies in the range $[1/C, 1]$. Our result, together with that of Gupta et al. [STOC'17], implies that there is a deterministic algorithm for this problem with $O(f\log(Cn))$ amortized update time and $O(\min(\log n, f))$-approximation ratio, which nearly matches the polynomial-time hardness of approximation for minimum set cover in the static setting. Our update time is only $O(\log(Cn))$ away from a trivial lower bound. Prior to our work, the previous best approximation ratio guaranteed by deterministic algorithms was $O(f^2)$, which was due to Bhattacharya et al. [ICALP'15]. In contrast, the only result that guaranteed $O(f)$-approximation was obtained very recently by Abboud et al. [STOC'19], who designed a dynamic algorithm with $(1+\epsilon)f$-approximation ratio and $O(f^2 \log n/\epsilon)$ amortized update time. Besides the extra $O(f)$ factor in the update time compared to our and Gupta et al.'s results, the Abboud et al. algorithm is randomized, and works only when the adversary is oblivious and the sets are unweighted (each set has the same cost). We achieve our result via the primal-dual approach, by maintaining a fractional packing solution as a dual certificate. Unlike previous primal-dual algorithms that try to satisfy some local constraints for individual sets at all times, our algorithm basically waits until the dual solution changes significantly globally, and fixes the solution only where the fix is needed.

Data Structures and Algorithms

Dynamic Geometric Independent Set

Sujoy Bhore, Jean Cardinal, John Iacono, 2020

We present fully dynamic approximation algorithms for the Maximum Independent Set problem on several types of geometric objects: intervals on the real line, arbitrary axis-aligned squares in the plane and axis-aligned $d$-dimensional hypercubes. It is known that a maximum independent set of a collection of $n$ intervals can be found in $O(n\log n)$ time, while it is already $\textsf{NP}$-hard for a set of unit squares. Moreover, the problem is inapproximable on many important graph families, but admits a $\textsf{PTAS}$ for a set of arbitrary pseudo-disks. Therefore, a fundamental question in computational geometry is whether it is possible to maintain an approximate maximum independent set in a set of dynamic geometric objects, in truly sublinear time per insertion or deletion. In this work, we answer this question in the affirmative for intervals, squares and hypercubes. First, we show that for intervals a $(1+\varepsilon)$-approximate maximum independent set can be maintained with logarithmic worst-case update time. This is achieved by maintaining a locally optimal solution using a constant number of constant-size exchanges per update. We then show how our interval structure can be used to design a data structure for maintaining an expected constant factor approximate maximum independent set of axis-aligned squares in the plane, with polylogarithmic amortized update time. Our approach generalizes to $d$-dimensional hypercubes, providing a $O(4^d)$-approximation with polylogarithmic update time. Those are the first approximation algorithms for any set of dynamic arbitrary size geometric objects; previous results required bounded size ratios to obtain polylogarithmic update time. Furthermore, it is known that our results for squares (and hypercubes) cannot be improved to a $(1+\varepsilon)$-approximation with the same update time.
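For reference, the static $O(n\log n)$ bound for intervals cited above is achieved by the classical greedy algorithm that sorts by right endpoint. The sketch below shows that static baseline only, not the dynamic algorithm of the paper; the function name and the convention that closed intervals sharing an endpoint count as intersecting are assumptions made for the example.

```python
def max_independent_intervals(
    intervals: list[tuple[float, float]],
) -> list[tuple[float, float]]:
    """Classical greedy: scan intervals by increasing right endpoint and keep
    each one that starts strictly after the last kept interval ends.

    Sorting dominates the cost, giving O(n log n) time overall.
    """
    chosen: list[tuple[float, float]] = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Closed intervals: touching at an endpoint counts as intersecting.
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


if __name__ == "__main__":
    print(max_independent_intervals([(1, 3), (2, 5), (4, 7), (6, 8)]))
    # prints [(1, 3), (4, 7)]
```

Recomputing this from scratch after every insertion or deletion costs $O(n\log n)$ per update, which is the baseline that sublinear dynamic algorithms aim to beat.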

Data Structures and Algorithms; Computational Geometry

Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

Thomas Mueller Graf, Daniel Lemire, 2019

The Bloom filter provides fast approximate set membership while using little memory. Engineers often use these filters to avoid slow operations such as disk or network accesses. As an alternative, a cuckoo filter may need less space than a Bloom filter and it is faster. Chazelle et al. proposed a generalization of the Bloom filter called the Bloomier filter. Dietzfelbinger and Pagh described a variation on the Bloomier filter that can be used effectively for approximate membership queries. It has never been tested empirically, to our knowledge. We review an efficient implementation of their approach, which we call the xor filter. We find that xor filters can be faster than Bloom and cuckoo filters while using less memory. We further show that a more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.

Data Structures and Algorithms