Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration

110 0 0.0 ( 0 )

Download Cite

Added by Nofar Carmeli

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Nofar Carmeli - Shai Zeevi - Christoph Berkholz

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this paper, we seek a structure that supports the more demanding task of a random permutation: polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically valuable manner. An even more demanding task is that of a random access: polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. For such UCQs we devise a random-order enumeration whose delay is logarithmic in expectation. We also identify a subclass of UCQs for which we can provide random access with polylogarithmic access time. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.

rate research

Answering Conjunctive Queries under Updates

77 - Christoph Berkholz , Jens Keppeler , Nicole Schweikardt 2017

We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear$^ast$ delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the querys homomorphic core is q-hierarchical, then size of the the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. $^ast)$ By sublinear we mean $O(n^{1-varepsilon})$ for some $varepsilon>0$, where $n$ is the size of the active domain of the current database.

Databases Data Structures and Algorithms Logic in Computer Science

A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries

154 - Batya Kenig , Dan Suciu 2020

We study the $generalized~model~counting~problem$, defined as follows: given a database, and a set of deterministic tuples, count the number of subsets of the database that include all deterministic tuples and satisfy the query. This problem is computationally equivalent to the evaluation of the query over a tuple-independent probabilistic database where all tuples have probabilities in ${0,frac{1}{2},1}$. Previous work has established a dichotomy for Unions of Conjunctive Queries (UCQ) when the probabilities are arbitrary rational numbers, showing that, for each query, its complexity is either in polynomial time or #P-hard. The query is called $safe$ in the first case, and $unsafe$ in the second case. Here, we strengthen the hardness proof, by proving that an unsafe UCQ query remains #P-hard even if the probabilities are restricted to ${0,frac{1}{2},1}$. This requires a complete redesign of the hardness proof, using new techniques. A related problem is the $model~counting~problem$, which asks for the probability of the query when the input probabilities are restricted to ${0,frac{1}{2}}$. While our result does not extend to model counting for all unsafe UCQs, we prove that model counting is #P-hard for a class of unsafe queries called Type-I forbidden queries.

Databases Computational Complexity

Constant delay enumeration with FPT-preprocessing for conjunctive queries of bounded submodular width

66 - Christoph Berkholz , Nicole Schweikardt 2020

Marx (STOC~2010, J.~ACM 2013) introduced the notion of submodular width of a conjunctive query (CQ) and showed that for any class $Phi$ of Boolean CQs of bounded submodular width, the model-checking problem for $Phi$ on the class of all finite structures is fixed-parameter tractable (FPT). Note that for non-Boolean queries, the size of the query result may be far too large to be computed entirely within FPT time. We investigate the free-connex variant of submodular width and generalise Marxs result to non-Boolean queries as follows: For every class $Phi$ of CQs of bounded free-connex submodular width, within FPT-preprocessing time we can build a data structure that allows to enumerate, without repetition and with constant delay, all tuples of the query result. Our proof builds upon Marxs splitting routine to decompose the query result into a union of results; but we have to tackle the additional technical difficulty to ensure that these can be enumerated efficiently.

Databases Logic in Computer Science

Parallel-Correctness and Transferability for Conjunctive Queries

468 - Tom J. Ameloot , Gaetano Geck , Bas Ketsman 2014

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We study the complexity of parallel-correctness for conjunctive queries as well as transferability of parallel-correctness between queries. We also investigate the complexity of transferability for certain families of distribution policies, including, for instance, the Hypercube distribution.

Databases

Structural Indexing for Conjunctive Path Queries

335 - Yuya Sasaki , George Fletcher , Makoto Onizuka 2020

Structural indexing is an approach to accelerating query evaluation, whereby data objects are partitioned and indexed reflecting the precise expressive power of a given query language. Each partition block of the index holds exactly those objects that are indistinguishable with respect to queries expressible in the language. Structural indexes have proven successful for XML, RDF, and relational data management. In this paper we study structural indexing for conjunctive path queries (CPQ). CPQ forms the core of contemporary graph query languages such as SPARQL, Cypher, PGQL, and G-CORE. CPQ plays the same fundamental role with respect to contemporary graph query languages as the classic conjunctive queries play for SQL. We develop the first practical structural indexes for this important query language. In particular, we propose a structural index based on k-path-bisimulation, tightly coupled to the expressive power of CPQ, and develop algorithms for efficient query processing with our index. Furthermore, we study workload-aware structural indexes to reduce both the construction and space costs according to a given workload. We demonstrate through extensive experiments using real and synthetic graphs that our methods accelerate query processing by up to multiple orders of magnitude over the state-of-the-art methods, without increasing index size.

Databases