Approximate Summaries for Why and Why-not Provenance (Extended Version)

Added by Seokki Lee

Publication date 2020

fields Informatics Engineering

Language English

Authors Seokki Lee - Bertram Ludaescher - Boris Glavic

Databases

Why and why-not provenance have been studied extensively in recent years. However, why-not provenance, and to a lesser degree why provenance, can be very large, resulting in severe scalability and usability challenges. In this paper, we introduce a novel approximate summarization technique for provenance which overcomes these challenges. Our approach uses patterns to encode (why-not) provenance concisely. We develop techniques for efficiently computing provenance summaries that balance informativeness, conciseness, and completeness. To achieve scalability, we integrate sampling techniques into provenance capture and summarization. Our approach is the first to scale to large datasets and to generate comprehensive and meaningful summaries.
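To make the idea of pattern-based summaries concrete, the following is a minimal sketch and not the paper's algorithm. It assumes, purely for illustration, that provenance is a list of variable bindings (tuples), that a pattern is a tuple whose positions are either a constant or a wildcard, and that the quality of a pattern is estimated on a random sample rather than on the full provenance. The names `matches`, `coverage`, and `estimate_coverage` and the example data are hypothetical.

```python
import random

WILDCARD = "*"  # hypothetical placeholder meaning "any value"

def matches(pattern, binding):
    """A pattern matches a binding if every non-wildcard position is equal."""
    return all(p == WILDCARD or p == b for p, b in zip(pattern, binding))

def coverage(pattern, bindings):
    """Fraction of the given bindings that the pattern matches."""
    if not bindings:
        return 0.0
    return sum(matches(pattern, b) for b in bindings) / len(bindings)

def estimate_coverage(pattern, bindings, sample_size, seed=0):
    """Estimate coverage from a uniform random sample of the bindings."""
    rng = random.Random(seed)
    sample = rng.sample(bindings, min(sample_size, len(bindings)))
    return coverage(pattern, sample)

# Made-up bindings (origin, destination, mode) for illustration only.
bindings = [
    ("Seattle", "NYC", "train"),
    ("Seattle", "Chicago", "train"),
    ("Seattle", "NYC", "bus"),
    ("Boston", "NYC", "train"),
]
pattern = ("Seattle", WILDCARD, "train")
exact = coverage(pattern, bindings)
```

A single pattern such as `("Seattle", "*", "train")` stands in for every binding it matches, which is what keeps a summary concise; sampling trades exactness of the coverage estimate for speed.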

PUG: A Framework and Practical Implementation for Why & Why-Not Provenance (extended version)

Seokki Lee, Bertram Ludaescher, Boris Glavic 2018

Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and a Datalog query as input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answer the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. We experimentally evaluate our approach, demonstrating its efficiency.

Databases

Enriching Ontology-based Data Access with Provenance (Extended Version)

Diego Calvanese, Davide Lanti, Ana Ozaki 2019

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance-annotated OBDA instance entails a provenance-annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance-annotated OBDA instance. Unlike in pure databases, in our case these polynomials may be infinite. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-Lite ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.
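As a rough illustration of provenance polynomials, the sketch below represents a polynomial with natural-number coefficients as a `Counter` mapping monomials (sorted tuples of variable names) to coefficients. Alternative derivations are combined with addition and joint use of facts with multiplication. The helper `drop_coefficients` shows one way addition can be made idempotent (x + x = x); which notion of idempotence the paper actually uses is not stated in the abstract, so this is an assumption, as are all function names.

```python
from collections import Counter

def var(name):
    """Polynomial consisting of a single variable (one annotated fact)."""
    return Counter({(name,): 1})

def poly_add(p, q):
    """Addition: alternative derivations of the same answer."""
    result = Counter(p)
    result.update(q)  # Counter.update adds coefficients
    return result

def poly_mul(p, q):
    """Multiplication: facts used together in one derivation."""
    result = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            result[tuple(sorted(m1 + m2))] += c1 * c2
    return result

def drop_coefficients(p):
    """Assumed idempotent-addition view: keep only which monomials occur."""
    return {m for m, c in p.items() if c > 0}

# (x + y) * x expands to x*x + x*y
p = poly_mul(poly_add(var("x"), var("y")), var("x"))
# x + x has coefficient 2, but collapses to a single monomial
q = drop_coefficients(poly_add(var("x"), var("x")))
```

With idempotent addition, repeating the same derivation no longer grows the polynomial, which gives an intuition for why idempotence helps regain finiteness.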

Databases Artificial Intelligence

How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs

Garima Gaur, Arnab Bhattacharya, Srikanta Bedathur 2020

Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are automatically constructed based on an ensemble of extraction techniques applied over diverse data sources. Therefore, it is important to establish the provenance of results for a query to determine how these were computed. Provenance is shown to be useful for assigning confidence scores to the results, for debugging the KG generation itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries since their answers are needed often. However, KGs change continuously for reasons such as changes in the source data, improvements to the extraction techniques, refinement/enrichment of information, and so on. This brings us to the issue of efficiently maintaining the provenance polynomials of complex graph pattern queries for dynamic and large KGs instead of having to recompute them from scratch each time the KG is updated. Addressing these issues, we present HUKA, which uses provenance polynomials for tracking the derivation of query results over knowledge graphs by encoding the edges involved in generating the answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates (insertions as well as deletions of facts) to the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveal that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.
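The benefit of maintaining provenance instead of recomputing it can be illustrated with a small sketch. This is not HUKA's algorithm; it assumes, for illustration only, that the provenance of an answer is stored as a set of derivations, each derivation being the set of KG edges it uses. The edge identifiers and function names are hypothetical.

```python
def delete_edge(provenance, edge):
    """Drop every derivation that relied on the deleted edge."""
    return {d for d in provenance if edge not in d}

def insert_derivation(provenance, edges):
    """Record a new derivation made possible by inserted edges."""
    return provenance | {frozenset(edges)}

def still_holds(provenance):
    """The answer remains valid while at least one derivation survives."""
    return len(provenance) > 0

# The answer is derivable either via edges e1 and e2 together, or via e3 alone.
provenance = {frozenset({"e1", "e2"}), frozenset({"e3"})}

after_e3 = delete_edge(provenance, "e3")   # one derivation remains
after_e1 = delete_edge(after_e3, "e1")     # no derivation remains
restored = insert_derivation(after_e1, {"e4"})
```

Because each update only touches the derivations that mention the affected edge, the query itself does not have to be re-evaluated to decide whether the answer is still correct.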

Databases

Why Not Categorical Equivalence?

James Owen Weatherall 2018

In recent years, philosophers of science have explored categorical equivalence as a promising criterion for when two (physical) theories are equivalent. On the one hand, philosophers have presented several examples of theories whose relationships seem to be clarified using these categorical methods. On the other hand, philosophers and logicians have studied the relationships, particularly in the first-order case, between categorical equivalence and other notions of equivalence of theories, including definitional equivalence and generalized definitional (aka Morita) equivalence. In this article, I will express some skepticism about this approach, both on technical grounds and conceptual ones. I will argue that category structure (alone) likely does not capture the structure of a theory, and discuss some recent work in light of this claim.

History and Philosophy of Physics

Why Halley did not discover proper motion and why Cassini did

Frank Verbunt, Marc van der Sluys 2019

In 1717 Halley compared contemporaneous measurements of the latitudes of four stars with earlier measurements by ancient Greek astronomers and by Brahe, and from the differences concluded that these four stars showed proper motion. An analysis with modern methods shows that the data used by Halley do not contain significant evidence for proper motion. What Halley found are the measurement errors of Ptolemaios and Brahe. Halley further argued that the occultation of Aldebaran by the Moon on 11 March 509 in Athens confirmed the change in latitude of Aldebaran. In fact, however, the relevant observation was almost certainly made in Alexandria, where Aldebaran was not occulted. By carefully considering measurement errors, Jacques Cassini showed that Halley's results from comparison with earlier astronomers were spurious, a conclusion partially confirmed by various later authors. Cassini's careful study of the measurements of the latitude of Arcturus provides the first significant evidence for proper motion.

History and Philosophy of Physics Solar and Stellar Astrophysics