ترغب بنشر مسار تعليمي؟ اضغط هنا

Certifying Certainty and Uncertainty in Approximate Membership Query Structures -- Extended Version

77   0   0.0 ( 0 )
 نشر من قبل Ilya Sergey
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Approximate Membership Query structures (AMQs) rely on randomisation for time- and space-efficiency, while introducing a possibility of false positive and false negative answers. Correctness proofs of such structures involve subtle reasoning about bounds on probabilities of getting certain outcomes. Because of these subtleties, a number of unsound arguments in such proofs have been made over the years. In this work, we address the challenge of building rigorous and reusable computer-assisted proofs about probabilistic specifications of AMQs. We describe the framework for systematic decomposition of AMQs and their properties into a series of interfaces and reusable components. We implement our framework as a library in the Coq proof assistant and showcase it by encoding in it a number of non-trivial AMQs, such as Bloom filters, counting filters, quotient filters and blocked constructions, and mechanising the proofs of their probabilistic specifications. We demonstrate how AMQs encoded in our framework guarantee the absence of false negatives by construction. We also show how the proofs about probabilities of false positives for complex AMQs can be obtained by means of verified reduction to the implementations of their simpler counterparts. Finally, we provide a library of domain-specific theorems and tactics that allow a high degree of automation in probabilistic proofs.



قيم البحث

اقرأ أيضاً

A retrieval data structure for a static function $f:Srightarrow {0,1}^r$ supports queries that return $f(x)$ for any $x in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ) (i.e., a Bloo m filter alternative) with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space-time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below $1,%$ while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $geq 44,%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.
GraphQL is a query language for APIs and a runtime for executing those queries, fetching the requested data from existing microservices, REST APIs, databases, or other sources. Its expressiveness and its flexibility have made it an attractive candida te for API providers in many industries, especially through the web. A major drawback to blindly servicing a clients query in GraphQL is that the cost of a query can be unexpectedly large, creating computation and resource overload for the provider, and API rate-limit overages and infrastructure overload for the client. To mitigate these drawbacks, it is necessary to efficiently estimate the cost of a query before executing it. Estimating query cost is challenging, because GraphQL queries have a nested structure, GraphQL APIs follow different design conventions, and the underlying data sources are hidden. Estimates based on worst-case static query analysis have had limited success because they tend to grossly overestimate cost. We propose a machine-learning approach to efficiently and accurately estimate the query cost. We also demonstrate the power of this approach by testing it on query-response data from publicly available commercial APIs. Our framework is efficient and predicts query costs with high accuracy, consistently outperforming the static analysis by a large margin.
We study the problem of sorting under incomplete information, when queries are used to resolve uncertainties. Each of $n$ data items has an unknown value, which is known to lie in a given interval. We can pay a query cost to learn the actual value, a nd we may allow an error threshold in the sorting. The goal is to find a nearly-sorted permutation by performing a minimum-cost set of queries. We show that an offline optimum query set can be found in poly time, and that both oblivious and adaptive problems have simple competitive algorithms. The competitive ratio for the oblivious problem is $n$ for uniform query costs, and unbounded for arbitrary costs; for the adaptive problem, the ratio is 2. We present a unified adaptive strategy for uniform costs that yields the following improved results: (1) a 3/2-competitive randomized algorithm; (2) a 5/3-competitive deterministic algorithm if the dependency graph has no 2-components after some preprocessing, which has competitive ratio $3/2+mathrm{O}(1/k)$ if the components obtained have size at least $k$; and (3) an exact algorithm for laminar families of intervals. The first two results have matching lower bounds, and we have a lower bound of 7/5 for large components. We also give a randomized adaptive algorithm with competitive ratio $1+frac{4}{3sqrt{3}}approx 1.7698$ for arbitrary query costs, and we show that the 2-competitive deterministic adaptive algorithm can be generalized for queries returning intervals and for a more general vertex cover problem, by using the local ratio technique. Moreover, we prove that the advice complexity of the adaptive problem is $lfloor n/2rfloor$ if no error threshold is allowed, and $lceil n/3cdotlg 3rceil$ for the general case. Finally, we present some graph-theoretical results on co-threshold tolerance graphs, and we discuss uncertainty variants of some classical interval problems.
81 - Steven Chaplick 2020
We study problems with stochastic uncertainty information on intervals for which the precise value can be queried by paying a cost. The goal is to devise an adaptive decision tree to find a correct solution to the problem in consideration while minim izing the expected total query cost. We show that, for the sorting problem, such a decision tree can be found in polynomial time. For the problem of finding the data item with minimum value, we have some evidence for hardness. This contradicts intuition, since the minimum problem is easier both in the online setting with adversarial inputs and in the offline verification setting. However, the stochastic assumption can be leveraged to beat both deterministic and randomized approximation lower bounds for the online setting.
Why and why-not provenance have been studied extensively in recent years. However, why-not provenance, and to a lesser degree why provenance, can be very large resulting in severe scalability and usability challenges. In this paper, we introduce a no vel approximate summarization technique for provenance which overcomes these challenges. Our approach uses patterns to encode (why-not) provenance concisely. We develop techniques for efficiently computing provenance summaries balancing informativeness, conciseness, and completeness. To achieve scalability, we integrate sampling techniques into provenance capture and summarization. Our approach is the first to scale to large datasets and to generate comprehensive and meaningful summaries.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا