Equivalence of SQL Queries in Presence of Embedded Dependencies

Added by Rada Chirkova

Publication date: 2009

Field: Informatics Engineering

Language: English

Authors: Rada Chirkova, Michael Genesereth

Databases

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [9] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.
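
The three semantics distinguished in the abstract can be illustrated with a small sketch. It is not taken from the paper: the query `Q(x) :- R(x, y), S(y)`, the sample relations `R` and `S`, and the function names are all assumptions chosen for illustration. Stored relations are modeled as `Counter` objects that map a tuple to its multiplicity.

```python
from collections import Counter

# Hypothetical stored relations, as bags: tuple -> multiplicity.
R = Counter({("a", 1): 2, ("a", 2): 1, ("b", 1): 1})
S = Counter({(1,): 3, (2,): 1})


def eval_bag(r, s):
    """Bag semantics for Q(x) :- R(x, y), S(y).

    Stored relations are bags, so the multiplicity of an answer is the
    sum, over all matching pairs of tuples, of the product of their
    multiplicities.
    """
    out = Counter()
    for (x, y), m_r in r.items():
        for (y2,), m_s in s.items():
            if y == y2:
                out[(x,)] += m_r * m_s
    return out


def eval_bag_set(r, s):
    """Bag-set semantics: stored relations are sets, the answer is a bag.

    Duplicates in the stored relations are dropped before evaluation, but
    duplicates produced by the projection onto x are kept.
    """
    return eval_bag(Counter(set(r)), Counter(set(s)))


def eval_set(r, s):
    """Set semantics: both stored relations and the answer are sets."""
    return set(eval_bag(r, s))


print(eval_bag(R, S))      # multiplicities: ("a",) -> 7, ("b",) -> 3
print(eval_bag_set(R, S))  # multiplicities: ("a",) -> 2, ("b",) -> 1
print(eval_set(R, S))      # the two distinct answers ("a",) and ("b",)
```

The same query text yields three different answers, which is why two queries that are equivalent under set semantics need not be equivalent under bag or bag-set semantics.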

Schemaless Queries over Document Tables with Dependencies

Mustafa Canim, Cristina Cornelio, Arun Iyengar, 2019

Unstructured enterprise data such as reports, manuals and guidelines often contain tables. The traditional way of integrating data from these tables is through a two-step process of table detection/extraction and mapping the table layouts to an appropriate schema. This can be an expensive process. In this paper we show that by using semantic technologies (RDF/SPARQL and database dependencies) paired with a simple but powerful way to transform tables with non-relational layouts, it is possible to offer query answering services over these tables with minimal manual work or domain-specific mappings. Our method enables users to exploit data in tables embedded in documents with little effort, not only for simple retrieval queries, but also for structured queries that require joining multiple interrelated tables.

Databases Artificial Intelligence Information Retrieval

Obtaining Information about Queries behind Views and Dependencies

Rada Chirkova, Ting Yu, 2014

We consider the problems of finding and determining certain query answers and of determining containment between queries; each problem is formulated in presence of materialized views and dependencies under the closed-world assumption. We show a tight relationship between the problems in this setting. Further, we introduce algorithms for solving each problem for those inputs where all the queries and views are conjunctive, and the dependencies are embedded weakly acyclic. We also determine the complexity of each problem under the security-relevant complexity measure introduced by Zhang and Mendelzon in 2005. The problems studied in this paper are fundamental in ensuring correct specification of database access-control policies, in particular in case of fine-grained access control. Our approaches can also be applied in the areas of inference control, secure data publishing, and database auditing.

Databases Logic in Computer Science

QueryVis: Logic-based diagrams help users understand complicated SQL queries faster

Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, 2020

Understanding the meaning of existing SQL queries is critical for code maintenance and reuse. Yet SQL can be hard to read, even for expert users or the original creator of a query. We conjecture that it is possible to capture the logical intent of queries in automatically-generated visual diagrams that can help users understand the meaning of queries faster and more accurately than SQL text alone. We present initial steps in that direction with visual diagrams that are based on the first-order logic foundation of SQL and can capture the meaning of deeply nested queries. Our diagrams build upon a rich history of diagrammatic reasoning systems in logic and were designed using a large body of human-computer interaction best practices: they are minimal in that no visual element is superfluous; they are unambiguous in that no two queries with different semantics map to the same visualization; and they extend previously existing visual representations of relational schemata and conjunctive queries in a natural way. An experimental evaluation involving 42 users on Amazon Mechanical Turk shows that with only a 2-3 minute static tutorial, participants could interpret queries meaningfully faster with our diagrams than when reading SQL alone. Moreover, we have evidence that our visual diagrams result in participants making fewer errors than with SQL. We believe that more regular exposure to diagrammatic representations of SQL can give rise to a pattern-based and thus more intuitive use and re-use of SQL. All details on the experimental study, the evaluation stimuli, raw data, and analyses, and source code are available at https://osf.io/mycr2

Databases Human-Computer Interaction Logic in Computer Science

Combined-Semantics Equivalence Is Decidable for a Practical Class of Conjunctive Queries

Rada Chirkova, 2013

In this paper, we focus on the problem of determining whether two conjunctive (CQ) queries posed on relational data are combined-semantics equivalent [9]. We continue the tradition of [2,5,9] of studying this problem using the tool of containment between queries. We introduce a syntactic necessary and sufficient condition for equivalence of queries belonging to a large natural language of explicit-wave combined-semantics CQ queries; this language encompasses (but is not limited to) all set, bag, and bag-set queries, and appears to cover all combined-semantics CQ queries that are expressible in SQL. Our result solves in the positive the decidability problem of determining combined-semantics equivalence for pairs of explicit-wave CQ queries. That is, for an arbitrary pair of combined-semantics CQ queries, it is decidable (i) to determine whether each of the queries is explicit wave, and (ii) to determine, in case both queries are explicit wave, whether or not they are combined-semantics equivalent, by using our syntactic criterion. (The problem of determining equivalence for general combined-semantics CQ queries remains open. Even so, our syntactic sufficient containment condition could still be used to determine that two general CQ queries are combined-semantics equivalent.) Our equivalence test, as well as our general sufficient condition for containment of combined-semantics CQ queries, reduce correctly to the special cases reported in [2,5] for set, bag, and bag-set semantics. Our containment and equivalence conditions also properly generalize the results of [9], provided that the latter are restricted to the language of (combined-semantics) CQ queries.

Databases

Monotonic Properties of Completed Aggregates in Recursive Queries

Carlo Zaniolo, Ariyam Das, Jiaqi Gu, 2019

The use of aggregates in recursion enables efficient and scalable support for a wide range of BigData algorithms, including those used in graph applications, KDD applications, and ML applications, which have proven difficult to express and support efficiently in BigData systems supporting Datalog or SQL. The problem with these languages and systems is that, to avoid the semantic and computational issues created by non-monotonic constructs in recursion, they only allow programs that are stratified with respect to negation and aggregates. Now, while this crippling restriction is well-justified for negation, it is frequently unjustified for aggregates, since (i) aggregates are often monotonic in the standard lattice of set-containment, (ii) the PreM property guarantees that programs with extrema in recursion are equivalent to stratified programs where extrema are used as post-constraints, and (iii) any program computing any aggregates on sets of facts of predictable cardinality is tantamount to a stratified program in which the precomputation of the cardinality of the set is followed by a stratum where recursive rules only use monotonic constructs. With (i) and (ii) covered in previous papers, this paper focuses on (iii) using examples of great practical interest. For such examples, we provide a formal semantics that is conducive to efficient and scalable implementations via well-known techniques such as semi-naive fixpoint currently supported by most Datalog and SQL3 systems.
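
Point (ii) of the abstract can be illustrated with a minimal sketch, which is not the authors' formalization. It compares a stratified shortest-path computation (enumerate every path cost, then apply `min` as a post-constraint) with a version that applies `min` inside a semi-naive style loop. The edge list, the function names, and the restriction to an acyclic graph (so that the stratified version terminates) are all assumptions made for illustration.

```python
INF = float("inf")

# Hypothetical weighted DAG: (source, target, weight).
EDGES = [("a", "b", 1), ("a", "c", 4), ("b", "c", 1), ("c", "d", 2)]


def shortest_stratified(edges, src):
    """Stratified form: first derive all path costs, then take the min.

    The recursion uses no aggregate; min is applied only afterwards.
    Terminates because the graph is assumed to be acyclic.
    """
    costs = {}  # node -> set of costs of all paths from src
    frontier = [(src, 0)]
    while frontier:
        node, cost = frontier.pop()
        for u, v, w in edges:
            if u == node:
                costs.setdefault(v, set()).add(cost + w)
                frontier.append((v, cost + w))
    return {v: min(cs) for v, cs in costs.items()}


def shortest_min_in_recursion(edges, src):
    """Min applied inside the recursion, in a semi-naive style loop.

    Each round extends only the facts improved in the previous round
    (delta) and keeps a new cost only if it beats the best known one.
    """
    best = {}
    delta = {src: 0}
    while delta:
        new_delta = {}
        for u, v, w in edges:
            if u in delta:
                cost = delta[u] + w
                if cost < best.get(v, INF) and cost < new_delta.get(v, INF):
                    new_delta[v] = cost
        best.update(new_delta)
        delta = new_delta
    return best


print(shortest_stratified(EDGES, "a"))
print(shortest_min_in_recursion(EDGES, "a"))
```

On this input both functions return the same shortest distances, while the second never materializes the full set of path costs.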

Databases