ﻻ يوجد ملخص باللغة العربية
In this work, we track the lineage of tuples throughout their database lifetime. That is, we consider a scenario in which tuples (records) that are produced by a query may affect other tuple insertions into the DB, as part of a normal workflow. As time goes on, exact provenance explanations for such tuples become deeply nested, increasingly consuming space, and resulting in decreased clarity and readability. We present a novel approach for approximating lineage tracking, using a Machine Learning (ML) and Natural Language Processing (NLP) technique; namely, word embedding. The basic idea is summarizing (and approximating) the lineage of each tuple via a small set of constant-size vectors (the number of vectors per-tuple is a hyperparameter). Therefore, our solution does not suffer from space complexity blow-up over time, and it naturally ranks explanations to the existence of a tuple. We devise an alternative and improved lineage tracking mechanism, that of keeping track of and querying lineage at the column level; thereby, we manage to better distinguish between the provenance features and the textual characteristics of a tuple. We integrate our lineage computations into the PostgreSQL system via an extension (ProvSQL) and experimentally exhibit useful results in terms of accuracy against exact, semiring-based, justifications. In the experiments, we focus on tuples with multiple generations of tuples in their lifelong lineage and analyze them in terms of direct and distant lineage. The experiments suggest a high usefulness potential for the proposed approximate lineage methods and the further suggested enhancements. This especially holds for the column-based vectors method which exhibits high precision and high per-level recall.
We study here the impact of priorities on conflict resolution in inconsistent relational databases. We extend the framework of repairs and consistent query answers. We propose a set of postulates that an extended framework should satisfy and consider
The broadening adoption of machine learning in the enterprise is increasing the pressure for strict governance and cost-effective performance, in particular for the common and consequential steps of model storage and inference. The RDBMS provides a n
One of the most important aspects of security organization is to establish a framework to identify security significant points where policies and procedures are declared. The (information) security infrastructure comprises entities, processes, and te
Variability inherently exists in databases in various contexts which creates database variants. For example, variants of a database could have different schemas/content (database evolution problem), variants of a database could root from different so
A new type of logs, the command log, is being employed to replace the traditional data log (e.g., ARIES log) in the in-memory databases. Instead of recording how the tuples are updated, a command log only tracks the transactions being executed, there