Research papers and master's and doctoral theses published by James Cheney

Database Queries that Explain their Work

133 - James Cheney , Amal Ahmed , Umut A. Acar 2014

Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to recompute the results on different inputs, possibly specialized to a part of the output. Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing techniques that support explanation. We state and prove correctness properties for these techniques and present a proof-of-concept implementation in Haskell.

Programming Languages, Databases

An Analytical Survey of Provenance Sanitization

51 - James Cheney , Roly Perera 2014

Security is likely becoming a critical factor in the future adoption of provenance technology, because of the risk of inadvertent disclosure of sensitive information. In this survey paper we review the state of the art in secure provenance, considering mechanisms for controlling access, and the extent to which these mechanisms preserve provenance integrity. We examine seven systems or approaches, comparing features and identifying areas for future work.

Databases, Cryptography and Security

A Core Calculus for Provenance

66 - Umut A. Acar , Amal Ahmed , James Cheney 2013

Provenance is an increasing concern due to the ongoing revolution in sharing and processing scientific data on the Web and in other computer systems. It is proposed that many computer systems will need to become provenance-aware in order to provide satisfactory accountability, reproducibility, and trust for scientific or other high-value data. To date, there is not a consensus concerning appropriate formal models or security properties for provenance. In previous work, we introduced a formal framework for provenance security and proposed formal definitions of properties called disclosure and obfuscation. In this article, we study refined notions of positive and negative disclosure and obfuscation in a concrete setting, that of a general-purpose programming language. Previous models of provenance have focused on special-purpose languages such as workflows and database queries. We consider a higher-order, functional language with sums, products, and recursive types and functions, and equip it with a tracing semantics in which traces themselves can be replayed as computations. We present an annotation-propagation framework that supports many provenance views over traces, including standard forms of provenance studied previously. We investigate some relationships among provenance views and develop some partial solutions to the disclosure and obfuscation problems, including correct algorithms for disclosure and positive obfuscation based on trace slicing.

Programming Languages

Mechanizing the Metatheory of LF

67 - Christian Urban , James Cheney , Stefan Berghofer 2010

LF is a dependent type theory in which many other formal systems can be conveniently embedded. However, correct use of LF relies on nontrivial metatheoretic developments such as proofs of correctness of decision procedures for LF's judgments. Although detailed informal proofs of these properties have been published, they have not been formally verified in a theorem prover. We have formalized these properties within Isabelle/HOL using the Nominal Datatype Package, closely following a recent article by Harper and Pfenning. In the process, we identified and resolved a gap in one of the proofs and a small number of minor lacunae in others. We also formally derive a version of the type checking algorithm from which Isabelle/HOL can generate executable code. Besides its intrinsic interest, our formalization provides a foundation for studying the adequacy of LF encodings, the correctness of Twelf-style metatheoretic reasoning, and the metatheory of extensions to LF.

Logic in Computer Science

Provenance as Dependency Analysis

87 - James Cheney , Amal Ahmed , 2009

Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.

Databases, Programming Languages

Provenance Traces

69 - James Cheney , Umut Acar , Amal Ahmed 2008

Provenance is information about the origin, derivation, ownership, or history of an object. It has recently been studied extensively in scientific databases and other settings due to its importance in helping scientists judge data validity, quality and integrity. However, most models of provenance have been stated as ad hoc definitions motivated by informal concepts such as comes from, influences, produces, or depends on. These models lack clear formalizations describing in what sense the definitions capture these intuitive concepts. This makes it difficult to compare approaches, evaluate their effectiveness, or argue about their validity. We introduce provenance traces, a general form of provenance for the nested relational calculus (NRC), a core database query language. Provenance traces can be thought of as concrete data structures representing the operational semantics derivation of a computation; they are related to the traces that have been used in self-adjusting computation, but differ in important respects. We define a tracing operational semantics for NRC queries that produces both an ordinary result and a trace of the execution. We show that three pre-existing forms of provenance for the NRC can be extracted from provenance traces. Moreover, traces satisfy two semantic guarantees: consistency, meaning that the traces describe what actually happened during execution, and fidelity, meaning that the traces explain how the expression would behave if the input were changed. These guarantees are much stronger than those contemplated for previous approaches to provenance; thus, provenance traces provide a general semantic foundation for comparing and unifying models of provenance in databases.

Programming Languages, Databases
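
The idea in the Provenance Traces abstract above, an evaluator that returns both an ordinary result and a trace of the execution, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy expression language (`Num`, `Var`, `Add`, `IfPos`), the tuple encoding of traces, and the `eval_traced` and `replay` helpers are hypothetical and are not the paper's NRC, its trace syntax, or its formal definitions of consistency and fidelity.

```python
from dataclasses import dataclass
from typing import Union


# Toy expression language (hypothetical; NOT the nested relational calculus).
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class IfPos:
    """Evaluate `then` if `cond` is positive, otherwise `other`."""
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Num, Var, Add, IfPos]


def eval_traced(expr, env):
    """Return (value, trace). The trace is a nested tuple mirroring the steps taken."""
    if isinstance(expr, Num):
        return expr.value, ("num", expr.value)
    if isinstance(expr, Var):
        return env[expr.name], ("var", expr.name)
    if isinstance(expr, Add):
        lv, lt = eval_traced(expr.left, env)
        rv, rt = eval_traced(expr.right, env)
        return lv + rv, ("add", lt, rt)
    if isinstance(expr, IfPos):
        cv, ct = eval_traced(expr.cond, env)
        taken = cv > 0
        bv, bt = eval_traced(expr.then if taken else expr.other, env)
        # Only the branch that actually ran is recorded in the trace.
        return bv, ("if", ct, taken, bt)
    raise TypeError(f"unknown expression: {expr!r}")


def replay(trace, env):
    """Re-run a recorded trace on a (possibly changed) environment.

    Raises ValueError when the new input would take a branch the trace
    did not record, since the trace says nothing about that branch.
    """
    tag = trace[0]
    if tag == "num":
        return trace[1]
    if tag == "var":
        return env[trace[1]]
    if tag == "add":
        return replay(trace[1], env) + replay(trace[2], env)
    if tag == "if":
        _, cond_trace, taken, branch_trace = trace
        if (replay(cond_trace, env) > 0) != taken:
            raise ValueError("trace does not cover the branch taken on this input")
        return replay(branch_trace, env)
    raise ValueError(f"unknown trace node: {tag!r}")


# Usage: evaluate once, then replay the trace on a changed input.
query = IfPos(Var("x"), Add(Var("x"), Num(1)), Num(0))
value, trace = eval_traced(query, {"x": 2})
replayed = replay(trace, {"x": 5})
```

In this toy setting, replaying on an input that keeps the same control flow agrees with re-evaluating the query, which loosely echoes the abstract's notion of a trace explaining behaviour under changed input; replaying on an input that flips the branch is rejected rather than guessed.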

Flux: FunctionaL Updates for XML (extended report)

83 - James Cheney 2008

XML database query languages have been studied extensively, but XML database updates have received relatively little attention, and pose many challenges to language design. We are developing an XML update language called Flux, which stands for FunctionaL Updates for XML, drawing upon ideas from functional programming languages. In prior work, we have introduced a core language for Flux with a clear operational semantics and a sound, decidable static type system based on regular expression types. Our initial proposal had several limitations. First, it lacked support for recursive types or update procedures. Second, although a high-level source language can easily be translated to the core language, it is difficult to propagate meaningful type errors from the core language back to the source. Third, certain updates are well-formed yet contain path errors, or "dead" subexpressions which never do any useful work. It would be useful to detect path errors, since they often represent errors or optimization opportunities. In this paper, we address all three limitations. Specifically, we present an improved, sound type system that handles recursion. We also formalize a source update language and give a translation to the core language that preserves and reflects typability. We also develop a path-error analysis (a form of dead-code analysis) for updates.

Programming Languages, Databases

Regular Expression Subtyping for XML Query and Update Languages

90 - James Cheney 2008

XML database query languages such as XQuery employ regular expression types with structural subtyping. Subtyping systems typically have two presentations, which should be equivalent: a declarative version in which the subsumption rule may be used anywhere, and an algorithmic version in which the use of subsumption is limited in order to make typechecking syntax-directed and decidable. However, the XQuery standard type system circumvents this issue by using imprecise typing rules for iteration constructs and defining only algorithmic typechecking, and another extant proposal provides more precise types for iteration constructs but ignores subtyping. In this paper, we consider a core XQuery-like language with a subsumption rule and prove the completeness of algorithmic typechecking; this is straightforward for XQuery proper but requires some care in the presence of more precise iteration typing disciplines. We extend this result to an XML update language we have introduced in earlier work.

Programming Languages, Databases

Repairing Inconsistent XML Write-Access Control Policies

106 - Loreto Bravo , James Cheney , Irini Fundulaki 2007

XML access control policies involving updates may contain security flaws, here called inconsistencies, in which a forbidden operation may be simulated by performing a sequence of allowed operations. This paper investigates the problem of deciding whether a policy is consistent, and if not, how its inconsistencies can be repaired. We consider policies expressed in terms of annotated DTDs defining which operations are allowed or denied for the XML trees that are instances of the DTD. We show that consistency is decidable in PTIME for such policies and that consistent partial policies can be extended to unique least-privilege consistent total policies. We also consider repair problems based on deleting privileges to restore consistency, show that finding minimal repairs is NP-complete, and give heuristics for finding repairs.

Databases
