Threshold Queries in Theory and in the Wild

103 0 0.0 ( 0 )

Download Cite

Added by Matthias Hofer

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Angela Bonifati - Stefania Dumbrava - George Fletcher

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. In this paper, we present a deep theoretical analysis of threshold query evaluation and show that thresholds can be used to significantly improve the asymptotic bounds of state-of-the-art query evaluation algorithms. We also empirically show that threshold queries are significant in practice. In surprising contrast to conventional wisdom, we found important scenarios in real-world data sets in which users are interested in computing the results of queries up to a certain threshold, independent of a ranking function that orders the query results by importance.

rate research

Preference Elicitation in Prioritized Skyline Queries

693 - Denis Mindolin , Jan Chomicki 2010

Preference queries incorporate the notion of binary preference relation into relational database querying. Instead of returning all the answers, such queries return only the best answers, according to a given preference relation. Preference queries are a fast growing area of database research. Skyline queries constitute one of the most thoroughly studied classes of preference queries. A well known limitation of skyline queries is that skyline preference relations assign the same importance to all attributes. In this work, we study p-skyline queries that generalize skyline queries by allowing varying attribute importance in preference relations. We perform an in-depth study of the properties of p-skyline preference relations. In particular,we study the problems of containment and minimal extension. We apply the obtained results to the central problem of the paper: eliciting relative importance of attributes. Relative importance is implicit in the constructed p-skyline preference relation. The elicitation is based on user-selected sets of superior (positive) and inferior (negative) examples. We show that the computational complexity of elicitation depends on whether inferior examples are involved. If they are not, elicitation can be achieved in polynomial time. Otherwise, it is NP-complete. Our experiments show that the proposed elicitation algorithm has high accuracy and good scalability

Databases

Semantics and Evaluation of Top-k Queries in Probabilistic Databases

486 - Xi Zhang , Jan Chomicki 2009

We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates that the semantics of top-k queries in probabilistic databases should satisfy, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms, and for general databases we show polynomial-time reductions to the simple cases. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective.

Databases

Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries

162 - Ahmet Kara , Milos Nikolic , Dan Olteanu 2019

We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result in the presence of single-tuple inserts and deletes to the input database. Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree and low-degree values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration. The main result of this work defines the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold and of the factorization width of the hierarchical query. By conveniently choosing this threshold, our approach can recover a number of prior results when restricted to hierarchical queries. For a restricted class of hierarchical queries, our approach can achieve worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.

Databases

Monotonic Properties of Completed Aggregates in Recursive Queries

215 - Carlo Zaniolo , Ariyam Das , Jiaqi Gu 2019

The use of aggregates in recursion enables efficient and scalable support for a wide range of BigData algorithms, including those used in graph applications, KDD applications, and ML applications, which have proven difficult to be expressed and supported efficiently in BigData systems supporting Datalog or SQL. The problem with these languages and systems is that, to avoid the semantic and computational issues created by non-monotonic constructs in recursion, they only allow programs that are stratified with respect to negation and aggregates. Now, while this crippling restriction is well-justified for negation, it is frequently unjustified for aggregates, since (i) aggregates are often monotonic in the standard lattice of set-containment, (ii) the PreM property guarantees that programs with extrema in recursion are equivalent to stratified programs where extrema are used as post-constraints, and (iii) any program computing any aggregates on sets of facts of predictable cardinality tantamounts to stratified programs where the precomputation of the cardinality of the set is followed by a stratum where recursive rules only use monotonic constructs. With (i) and (ii) covered in previous papers, this paper focuses on (iii) using examples of great practical interest. For such examples, we provide a formal semantics that is conducive to efficient and scalable implementations via well-known techniques such as semi-naive fixpoint currently supported by most Datalog and SQL3 systems.

Databases

Assessing Achievability of Queries and Constraints

204 - Rada Chirkova , Jon Doyle , Juan L. Reutter 2017

Assessing and improving the quality of data in data-intensive systems are fundamental challenges that have given rise to numerous applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between the tools that address issues in these areas. Our focus is on the problem of determining whether there exist sequences of data-transforming procedures that, when applied to the (untransformed) input data, would yield data satisfying the conditions required for performing the task in question. Our goal is to develop a framework that would address this problem, starting with the relational setting. In this paper we abstract data-processing tools as black-box procedures. This abstraction describes procedures by a specification of which parts of the database might be modified by the procedure, as well as by the constraints that specify the required states of the database before and after applying the procedure. We then proceed to study fundamental algorithmic questions arising in this context, such as understanding when one can guarantee that sequences of procedures apply to original or transformed data, when they succeed at improving the data, and when knowledge bases can represent the outcomes of procedures. Finally, we turn to the problem of determining whether the application of a sequence of procedures to a database results in the satisfaction of properties specified by either queries or constraints. We show that this problem is decidable for some broad and realistic classes of procedures and properties, even when procedures are allowed to alter the schema of instances.

Databases