Provenance Views for Module Privacy

505 0 0.0 ( 0 )

Download Cite

Added by Sudeepa Roy

Publication date 2010

fields Informatics Engineering

and research's language is English

Authors Susan B. Davidson - Sanjeev Khanna - Tova Milo

Databases Data Structures and Algorithms

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Scientific workflow systems increasingly store provenance information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions. However, authors/owners of workflows may wish to keep some of this information confidential. In particular, a module may be proprietary, and users should not be able to infer its behavior by seeing mappings between all data inputs and outputs. The problem we address in this paper is the following: Given a workflow, abstractly modeled by a relation R, a privacy requirement Gamma and costs associated with data. The owner of the workflow decides which data (attributes) to hide, and provides the user with a view R which is the projection of R over attributes which have not been hidden. The goal is to minimize the cost of hidden data while guaranteeing that individual modules are Gamma -private. We call this the secureview problem. We formally define the problem, study its complexity, and offer algorithmic solutions.

rate research

Aggregation by Provenance Types: A Technique for Summarising Provenance Graphs

523 - Luc Moreau 2015

As users become confronted with a deluge of provenance data, dedicated techniques are required to make sense of this kind of information. We present Aggregation by Provenance Types, a provenance graph analysis that is capable of generating provenance graph summaries. It proceeds by converting provenance paths up to some length k to attributes, referred to as provenance types, and by grouping nodes that have the same provenance types. The summary also includes numeric values representing the frequency of nodes and edges in the original graph. A quantitative evaluation and a complexity analysis show that this technique is tractable; with small values of k, it can produce useful summaries and can help detect outliers. We illustrate how the generated summaries can further be used for conformance checking and visualization.

Databases Data Structures and Algorithms

A Propagation Model for Provenance Views of Public/Private Workflows

341 - Susan B. Davidson , Tova Milo , Sudeepa Roy 2012

We study the problem of concealing functionality of a proprietary or private module when provenance information is shown over repeated executions of a workflow which contains both `public and `private modules. Our approach is to use `provenance views to hide carefully chosen subsets of data over all executions of the workflow to ensure G-privacy: for each private module and each input x, the modules output f(x) is indistinguishable from G -1 other possible values given the visible data in the workflow executions. We show that G-privacy cannot be achieved simply by combining solutions for individual private modules; data hiding must also be `propagated through public modules. We then examine how much additional data must be hidden and when it is safe to stop propagating data hiding. The answer depends strongly on the workflow topology as well as the behavior of public modules on the visible data. In particular, for a class of workflows (which include the common tree and chain workflows), taking private solutions for each private module, augmented with a `public closure that is `upstream-downstream safe, ensures G-privacy. We define these notions formally and show that the restrictions are necessary. We also study the related optimization problems of minimizing the amount of hidden data.

Databases

Provenance as Dependency Analysis

363 - James Cheney , Amal Ahmed , 2009

Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.

Databases Programming Languages

Research Traceability using Provenance Services for Biomedical Analysis

428 - Ashiq Anjum , Peter Bloodsworth , Andrew Branson 2014

We outline the approach being developed in the neuGRID project to use provenance management techniques for the purposes of capturing and preserving the provenance data that emerges in the specification and execution of workflows in biomedical analyses. In the neuGRID project a provenance service has been designed and implemented that is intended to capture, store, retrieve and reconstruct the workflow information needed to facilitate users in conducting user analyses. We describe the architecture of the neuGRID provenance service and discuss how the CRISTAL system from CERN is being adapted to address the requirements of the project and then consider how a generalised approach for provenance management could emerge for more generic application to the (Health)Grid community.

Databases

Research Traceability using Provenance Services for Biomedical Analysis