Dynamic Partition Bloom Filters: A Bounded False Positive Solution For Dynamic Set Membership (Extended Abstract)

Published by Amitabha Bagchi
Publication date: 2019
Research field: Informatics Engineering
Language: English





Dynamic Bloom filters (DBF) were proposed by Guo et al. in 2010 to handle situations where the size of the set to be stored compactly is not known in advance or can change during the course of the application. We propose a novel competitor to DBF with an important property that DBF cannot achieve: our structure maintains a bound on the false positive rate of the set membership query across all possible sizes of the sets stored in it. We call this new dynamic data structure the Dynamic Partition Bloom filter (DPBF). DPBF is based on our novel concept of a Bloom partition tree, a tree structure with standard Bloom filters at the leaves. DPBF is superior to standard Bloom filters because it can efficiently handle a large number of unions and intersections of sets of different sizes while controlling the false positive rate. To the best of our knowledge, DPBF is the first structure to do so. We provide theoretical bounds comparing the false positive probability of DPBF to that of DBF.
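
The problem the abstract starts from, that a filter sized in advance loses control of its false positive rate once the set outgrows it, can be illustrated with a minimal sketch of a standard Bloom filter. This is not the DPBF or the Bloom partition tree, whose construction the abstract does not spell out. The class name `BloomFilter`, the SHA-256-based double hashing, and the helper `expected_fp_rate` are assumptions made purely for illustration. The helper uses the common approximation $(1 - e^{-kn/m})^k$ for a filter with $m$ bits, $k$ hash functions, and $n$ inserted elements.

```python
import hashlib
import math


class BloomFilter:
    """A standard fixed-size Bloom filter with m bits and k hash functions.

    Illustrative only: this is the kind of filter that would sit at a leaf
    of a tree of filters, not the DPBF structure itself.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m                           # number of bits
        self.k = k                           # number of hash functions
        self.bits = bytearray((m + 7) // 8)  # bit array, packed 8 bits per byte
        self.count = 0                       # number of add() calls so far

    def _positions(self, item: str) -> list[int]:
        # Assumption: derive k positions from one SHA-256 digest by double
        # hashing, g_i(x) = h1(x) + i * h2(x) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive probability after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=7)
    for word in ["alpha", "beta", "gamma"]:
        bf.add(word)
    print("alpha" in bf)  # inserted items are always reported as present
    # With m and k fixed, the estimate grows as more elements are inserted.
    for n in (50, 100, 200, 400):
        print(n, expected_fp_rate(1024, 7, n))
```

With `m` and `k` held fixed, the estimate rises steadily with `n`, which is why a structure intended for sets of unknown or changing size needs some mechanism beyond a single fixed-size filter to keep the false positive rate bounded.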




Read also

Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size, or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary-based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive with dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters (APBF), a BF-based approach for duplicate detection in sliding windows that is not only competitive in time complexity but also has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. Unlike dictionary-based approaches, APBFs retain the simplicity of BFs, which is important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high-accuracy filters.
In this paper, we address the problem of sampling from a set and reconstructing a set stored as a Bloom filter. To the best of our knowledge, our work is the first to address this question. We introduce a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently. In the case where the hash functions used in the Bloom filter implementation are partially invertible, in the sense that it is easy to calculate the set of elements that map to a particular hash value, we propose a second, more space-efficient method called HashInvert for the reconstruction. We study the properties of these two methods both analytically and experimentally. We provide bounds on run times for both methods and on sample quality for the BloomSampleTree-based algorithm, and show through an extensive experimental evaluation that our methods are efficient and effective.
We present a deterministic dynamic algorithm for maintaining a $(1+\epsilon)f$-approximate minimum cost set cover with $O(f\log(Cn)/\epsilon^2)$ amortized update time, when the input set system is undergoing element insertions and deletions. Here, $n$ denotes the number of elements, each element appears in at most $f$ sets, and the cost of each set lies in the range $[1/C, 1]$. Our result, together with that of Gupta et al. [STOC'17], implies that there is a deterministic algorithm for this problem with $O(f\log(Cn))$ amortized update time and $O(\min(\log n, f))$-approximation ratio, which nearly matches the polynomial-time hardness of approximation for minimum set cover in the static setting. Our update time is only $O(\log(Cn))$ away from a trivial lower bound. Prior to our work, the best approximation ratio guaranteed by deterministic algorithms was $O(f^2)$, due to Bhattacharya et al. [ICALP'15]. In contrast, the only result that guaranteed $O(f)$-approximation was obtained very recently by Abboud et al. [STOC'19], who designed a dynamic algorithm with $(1+\epsilon)f$-approximation ratio and $O(f^2 \log n/\epsilon)$ amortized update time. Besides the extra $O(f)$ factor in the update time compared to our and Gupta et al.'s results, the Abboud et al. algorithm is randomized, and works only when the adversary is oblivious and the sets are unweighted (each set has the same cost). We achieve our result via the primal-dual approach, by maintaining a fractional packing solution as a dual certificate. Unlike previous primal-dual algorithms that try to satisfy some local constraints for individual sets at all times, our algorithm essentially waits until the dual solution changes significantly globally, and fixes the solution only where the fix is needed.
We present fully dynamic approximation algorithms for the Maximum Independent Set problem on several types of geometric objects: intervals on the real line, arbitrary axis-aligned squares in the plane, and axis-aligned $d$-dimensional hypercubes. It is known that a maximum independent set of a collection of $n$ intervals can be found in $O(n\log n)$ time, while the problem is already $\textsf{NP}$-hard for a set of unit squares. Moreover, the problem is inapproximable on many important graph families, but admits a $\textsf{PTAS}$ for a set of arbitrary pseudo-disks. Therefore, a fundamental question in computational geometry is whether it is possible to maintain an approximate maximum independent set in a set of dynamic geometric objects, in truly sublinear time per insertion or deletion. In this work, we answer this question in the affirmative for intervals, squares, and hypercubes. First, we show that for intervals a $(1+\varepsilon)$-approximate maximum independent set can be maintained with logarithmic worst-case update time. This is achieved by maintaining a locally optimal solution using a constant number of constant-size exchanges per update. We then show how our interval structure can be used to design a data structure for maintaining an expected constant-factor approximate maximum independent set of axis-aligned squares in the plane, with polylogarithmic amortized update time. Our approach generalizes to $d$-dimensional hypercubes, providing an $O(4^d)$-approximation with polylogarithmic update time. These are the first approximation algorithms for any set of dynamic arbitrary-size geometric objects; previous results required bounded size ratios to obtain polylogarithmic update time. Furthermore, it is known that our results for squares (and hypercubes) cannot be improved to a $(1+\varepsilon)$-approximation with the same update time.
The Bloom filter provides fast approximate set membership while using little memory. Engineers often use these filters to avoid slow operations such as disk or network accesses. As an alternative, a cuckoo filter may need less space than a Bloom filter and it is faster. Chazelle et al. proposed a generalization of the Bloom filter called the Bloomier filter. Dietzfelbinger and Pagh described a variation on the Bloomier filter that can be used effectively for approximate membership queries. It has never been tested empirically, to our knowledge. We review an efficient implementation of their approach, which we call the xor filter. We find that xor filters can be faster than Bloom and cuckoo filters while using less memory. We further show that a more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.