Generalized Deletion Propagation on Counting Conjunctive Query Answers


Abstract in English

We investigate the computational complexity of minimizing the source side-effect in order to remove a given number of tuples from the output of a conjunctive query. In particular, given a multi-relational database $D$, a conjunctive query $Q$, and a positive integer $k$ as input, the goal is to find a minimum subset of input tuples to remove from D that would eliminate at least $k$ output tuples from $Q(D)$. This problem generalizes the well-studied deletion propagation problem in databases. In addition, it encapsulates the notion of intervention for aggregate queries used in data analysis with applications to explaining interesting observations on the output. We show a dichotomy in the complexity of this problem for the class of full conjunctive queries without self-joins by giving a characterization on the structure of $Q$ that makes the problem either polynomial-time solvable or NP-hard. Our proof of this dichotomy result already gives an exact algorithm in the easy cases; we complement this by giving an approximation algorithm for the hard cases of the problem.

Download