
Sample Debiasing in the Themis Open World Database System (Extended Version)

Added by: Laurel Orr
Publication date: 2020
Language: English





Open world database management systems assume that tuples not in the database still exist and are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that it achieves higher query accuracy than the default approximate query processing (AQP) approach, an alternative sample reweighting technique, and a variety of Bayesian network models, while maintaining interactive query response times. We also show that Themis is robust to differences in support between the sample and the population, a key use case when using social media samples.
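To make the reweighting idea concrete, the following is a minimal, hypothetical sketch (not code from the paper) of post-stratification-style reweighting: each sample tuple receives a weight so that the weighted sample matches a known population aggregate, and queries are then answered as weighted aggregates over the sample. The attribute names and numbers are invented for illustration.

```python
# Minimal sketch (not Themis's implementation): reweight a biased sample so
# that its per-stratum weighted counts match known population aggregates,
# then answer an aggregate query over the weighted sample.
from collections import Counter

def poststratify_weights(sample, population_counts, key):
    """Weight each row by (population count of its stratum) / (sample count of its stratum)."""
    sample_counts = Counter(row[key] for row in sample)
    return [population_counts[row[key]] / sample_counts[row[key]] for row in sample]

# Hypothetical biased sample and a priori population aggregates.
sample = [
    {"state": "WA", "income": 70},
    {"state": "WA", "income": 90},
    {"state": "OR", "income": 50},
]
population_counts = {"WA": 100, "OR": 200}

weights = poststratify_weights(sample, population_counts, "state")

# Approximate SUM(income) over the population from the reweighted sample.
print(sum(w * row["income"] for w, row in zip(weights, sample)))  # 18000.0
```

The abstract describes combining this kind of reweighting with Bayesian network probabilistic modeling; the sketch shows only the simplest reweighting case over a single stratifying attribute.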



Related Research

This paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend the functionality of the LDL++ language, while preserving its model-theoretic and fixpoint semantics. Then, we describe the execution model and the open architecture designed to support these new constructs and to facilitate integration with existing DBMSs and applications. Finally, we describe the lessons learned from using LDL++ on various tested applications, such as middleware and data mining.
In this paper we study the problem of reducing the evaluation costs of queries on finite databases in the presence of integrity constraints, by designing and materializing views. Given a database schema, a set of queries defined on the schema, a set of integrity constraints, and a storage limit, to find a solution to this problem means to find a set of views that satisfies the storage limit, provides equivalent rewritings of the queries under the constraints (this requirement is weaker than equivalence in the absence of constraints), and reduces the total costs of evaluating the queries. This problem, database reformulation, is important for many applications, including data warehousing and query optimization. We give complexity results and algorithms for database reformulation in the presence of constraints, for conjunctive queries, views, and rewritings, and for several types of constraints, including functional and inclusion dependencies. To obtain better complexity results, we introduce an unchase technique, which reduces the problem of query equivalence under constraints to equivalence in the absence of constraints without increasing query size.
Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.
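
The following toy sketch is not ProGS syntax or semantics; it only illustrates, with invented names and data, what validating a property graph against shape-like constraints can look like when both nodes and edges carry key-value annotations.

```python
# Generic illustration (not ProGS): check shape-like constraints over a toy
# property graph whose nodes and edges both carry key-value annotations.

nodes = {
    "n1": {"label": "Person", "props": {"name": "Ada", "born": 1815}},
    "n2": {"label": "City",   "props": {"name": "London"}},
}
edges = {
    "e1": {"src": "n1", "dst": "n2", "label": "LIVES_IN", "props": {"since": 1830}},
}

# A "shape" here is simply: required property keys and their expected types.
node_shapes = {"Person": {"name": str, "born": int}, "City": {"name": str}}
edge_shapes = {"LIVES_IN": {"since": int}}

def violations():
    out = []
    for nid, n in nodes.items():
        for key, typ in node_shapes.get(n["label"], {}).items():
            if not isinstance(n["props"].get(key), typ):
                out.append(f"node {nid} violates {n['label']}.{key}")
    for eid, e in edges.items():
        for key, typ in edge_shapes.get(e["label"], {}).items():
            if not isinstance(e["props"].get(key), typ):
                out.append(f"edge {eid} violates {e['label']}.{key}")
    return out

print(violations())  # an empty list means the toy graph conforms to these shapes
```
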
Databases in the past have helped businesses maintain and extract insights from their data. Today, it is common for a business to involve multiple independent, distrustful parties. This trend towards decentralization introduces a new and important requirement to databases: the integrity of the data, the history, and the execution must be protected. In other words, there is a need for a new class of database systems whose integrity can be verified (or verifiable databases). In this paper, we identify the requirements and the design challenges of verifiable databases. We observe that the main challenges come from the need to balance data immutability, tamper evidence, and performance. We first consider approaches that extend existing OLTP and OLAP systems with support for verification. We next examine a clean-slate approach, by describing a new system, Spitz, specifically designed for efficiently supporting immutable and tamper-evident transaction management. We conduct a preliminary performance study of both approaches against a baseline system, and provide insights on their performance.
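
As a toy illustration of the tamper-evidence ingredient mentioned above (and not of Spitz's actual design), the sketch below hash-chains a transaction log so that any rewrite of past entries changes all later digests; the operations and field names are hypothetical.

```python
# Toy tamper-evident log: each digest commits to the transaction and to the
# previous digest, so silently editing history breaks the chain.
import hashlib, json

def chain(transactions):
    prev, out = "0" * 64, []
    for tx in transactions:
        digest = hashlib.sha256((prev + json.dumps(tx, sort_keys=True)).encode()).hexdigest()
        out.append((tx, digest))
        prev = digest
    return out

log = chain([{"op": "insert", "key": "a", "val": 1},
             {"op": "update", "key": "a", "val": 2}])

# Verification recomputes the chain; a tampered entry would yield different digests.
assert log == chain([tx for tx, _ in log])
```
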
Snapshot semantics is widely used for evaluating queries over temporal data: temporal relations are seen as sequences of snapshot relations, and queries are evaluated at each snapshot. In this work, we demonstrate that current approaches for snapshot semantics over interval-timestamped multiset relations are subject to two bugs regarding snapshot aggregation and bag difference. We introduce a novel temporal data model based on K-relations that overcomes these bugs and prove that it correctly encodes snapshot semantics. Furthermore, we present an efficient implementation of our model as a database middleware and demonstrate experimentally that our approach is competitive with native implementations and significantly outperforms such implementations on queries that involve aggregation.
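
To spell out the snapshot view of interval-timestamped data, here is a small, hypothetical sketch (not the paper's K-relation encoding): the relation is expanded into one snapshot per time point and an aggregate is evaluated at each snapshot independently.

```python
# Snapshot semantics illustration: an interval-timestamped multiset relation
# is viewed as one snapshot per time point, and SUM is evaluated per snapshot.
# Each row: (value, valid_from, valid_to), interval inclusive-exclusive.
rows = [(10, 1, 4), (20, 2, 3), (10, 2, 5)]

def snapshot(t):
    """Multiset of values valid at time t."""
    return [v for v, lo, hi in rows if lo <= t < hi]

for t in range(1, 5):
    print(t, sum(snapshot(t)))
# prints: 1 10 / 2 40 / 3 20 / 4 10
```
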