ترغب بنشر مسار تعليمي؟ اضغط هنا

Database Queries that Explain their Work

173   0   0.0 ( 0 )
 نشر من قبل James Cheney
 تاريخ النشر 2014
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to recompute the results on different inputs, possibly specialized to a part of the output. Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing techniques that support explanation. We state and prove correctness properties for these techniques and present a proof-of-concept implementation in Haskell.



قيم البحث

اقرأ أيضاً

Program slicing provides explanations that illustrate how program outputs were produced from inputs. We build on an approach introduced in prior work by Perera et al., where dynamic slicing was defined for pure higher-order functional programs as a G alois connection between lattices of partial inputs and partial outputs. We extend this approach to imperative functional programs that combine higher-order programming with references and exceptions. We present proofs of correctness and optimality of our approach and a proof-of-concept implementation and experimental evaluation.
We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions) of conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizatio ns of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks. A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances. Following a probabilistic view of the attention via the Gaussian mixture model, we find empirical evidence that the Transformer attention tends to explain away certain input neurons. To compensate for this, we propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the explaining away effect without introducing significant computational or memory cost. Empirically, we show that the new attention schemes result in improved performance on several well-known benchmarks.
Language-integrated query is a popular and powerful programming construct allowing database queries and ordinary program code to interoperate seamlessly and safely. Language-integrated query techniques rely on classical results about the nested relat ional calculus stating that its queries can be algorithmically translated to SQL, as long as their result type is a flat relation. Cooper and others advocated higher-order nested relational calculi as a basis for language-integrated queries in functional languages such as Links and F#. However, the translation of higher-order relational queries to SQL relies on a rewrite system for which no strong normalization proof has been published: a previous proof attempt does not deal correctly with rewrite rules that duplicate subterms. This paper fills the gap in the literature, explaining the difficulty with the previous attempt, and showing how to extend the $toptop$-lifting approach of Lindley and Stark to accommodate duplicating rewrites. We also show how to extend the proof to a recently-introduced calculus for heterogeneous queries mixing set and multiset semantics.
Modern database systems are growing increasingly distributed and struggle to reduce query completion time with a large volume of data. In this paper, we leverage programmable switches in the network to partially offload query computation to the switc h. While switches provide high performance, they have resource and programming constraints that make implementing diverse queries difficult. To fit in these constraints, we introduce the concept of data emph{pruning} -- filtering out entries that are guaranteed not to affect output. The database system then runs the same query but on the pruned data, which significantly reduces processing time. We propose pruning algorithms for a variety of queries. We implement our system, Cheetah, on a Barefoot Tofino switch and Spark. Our evaluation on multiple workloads shows $40 - 200%$ improvement in the query completion time compared to Spark.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا