Query evaluation in tuple-independent probabilistic databases is the problem of computing the probability of an answer to a query, given independent probabilities for the individual tuples in a database instance. There are two main approaches to this problem: (1) in "grounded inference," one first obtains the lineage of the query on the database instance as a Boolean formula and then performs weighted model counting on that lineage (i.e., computes the probability of the lineage given the probabilities of its independent Boolean variables); (2) in methods known as "lifted inference" or "extensional query evaluation," one exploits the high-level structure of the query as a first-order formula. Although it is widely believed that lifted inference is strictly more powerful than grounded inference on the lineage alone, no formal separation had previously been shown for query evaluation. In this paper we show such a formal separation for the first time. We exhibit a class of queries for which model counting can be done in polynomial time using extensional query evaluation, whereas the algorithms used in state-of-the-art exact model counters on the corresponding lineages provably require exponential time. Our lower bounds on the running times of these exact model counters follow from new exponential lower bounds on the sizes of the d-DNNF (deterministic decomposable negation normal form) representations of the lineages that these model counters, explicitly or implicitly, produce. Although some of these queries have been studied before, no non-trivial lower bounds on the sizes of these representations were previously known.
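As a concrete illustration of the grounded-inference pipeline described above (the lineage formula, variable names, and probabilities below are hypothetical examples, not taken from the paper), the following Python sketch performs weighted model counting on a toy lineage by brute-force enumeration of its independent tuple variables:

```python
from itertools import product

# Hypothetical lineage of a Boolean query over a tiny database:
# each tuple variable x_i is independently true with probability p[x_i].
p = {"x1": 0.3, "x2": 0.5, "x3": 0.9}

def lineage(a):
    # Example lineage formula: (x1 AND x2) OR (x2 AND x3)
    return (a["x1"] and a["x2"]) or (a["x2"] and a["x3"])

def weighted_model_count(variables, prob, formula):
    """Sum, over all satisfying assignments, the product of the
    marginal probabilities of each variable's chosen value."""
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if formula(a):
            weight = 1.0
            for v in variables:
                weight *= prob[v] if a[v] else 1.0 - prob[v]
            total += weight
    return total

print(weighted_model_count(list(p), p, lineage))  # 0.465
```

Exhaustive enumeration is exponential in the number of tuples; exact model counters avoid it by compiling the lineage into a compact representation such as d-DNNF, which is exactly where the paper's size lower bounds bite.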
We study the problem of concealing the functionality of a proprietary or private module when provenance information is shown over repeated executions of a workflow that contains both "public" and "private" modules. Our approach is to use "provenance views" to hide carefully chosen subsets of data over all executions of the workflow so as to ensure Γ-privacy: for each private module and each input x, the module's output f(x) is indistinguishable from Γ − 1 other possible values, given the visible data in the workflow executions. We show that Γ-privacy cannot be achieved simply by combining solutions for the individual private modules; data hiding must also be "propagated" through public modules. We then examine how much additional data must be hidden and when it is safe to stop propagating the data hiding. The answer depends strongly on the workflow topology as well as on the behavior of the public modules on the visible data. In particular, for a class of workflows (which includes the common tree and chain workflows), taking private solutions for each private module, augmented with a "public closure" that is "upstream-downstream safe," ensures Γ-privacy. We define these notions formally and show that the restrictions are necessary. We also study the related optimization problem of minimizing the amount of hidden data.
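To make the Γ-privacy condition concrete (the toy module, the Boolean domains, and the simplified "hide some output attributes" view below are all illustrative assumptions, not the paper's construction), the following Python sketch checks whether hiding a given set of output attributes of a standalone private module leaves at least Γ candidate outputs for every input:

```python
from itertools import product

# Toy private module f: 3 Boolean inputs -> 2 Boolean outputs.
# (Illustrative truth table; any deterministic module works here.)
def f(x):
    a, b, c = x
    return (a ^ b, b and c)

INPUTS = list(product([0, 1], repeat=3))
OUTPUT_DOMAIN = list(product([0, 1], repeat=2))

def visible(y, hidden):
    """Projection of an output tuple onto its non-hidden positions."""
    return tuple(v for i, v in enumerate(y) if i not in hidden)

def is_gamma_private(hidden, gamma):
    """Standalone check: for every input x, at least `gamma` outputs
    in the domain agree with f(x) on the visible attributes, so f(x)
    is indistinguishable among them given the provenance view."""
    for x in INPUTS:
        candidates = {y for y in OUTPUT_DOMAIN
                      if visible(y, hidden) == visible(f(x), hidden)}
        if len(candidates) < gamma:
            return False
    return True

print(is_gamma_private(hidden={0}, gamma=2))    # hide first output bit -> True
print(is_gamma_private(hidden=set(), gamma=2))  # hide nothing -> False
```

The full problem studied in the abstract is harder than this sketch suggests: once the module sits inside a workflow, public modules can re-expose hidden values, which is why the hiding must be propagated through a public closure.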