In this paper, we propose to fuse stored data (tables) and their functional dependencies (FDs) inside a DBMS. We aim to make FDs first-class citizens: objects that can be queried and used to query data. Our idea is to allow analysts to explore both data and functional dependencies through the database interface. For example, an analyst may be interested in tasks such as: finding all rows that prevent a given functional dependency from holding; finding, for a given table, all functional dependencies that involve a given attribute; or projecting all attributes that functionally determine a specified attribute. For this purpose, we propose: (1) an SQL-based query language for querying a collection of functional dependencies; (2) an extension of the SQL SELECT clause that supports FD-based predicates, including approximate ones; and (3) a special data structure that holds mined FDs and acts as a mediator between user queries and the underlying data. We describe the proposed extensions, demonstrate their use cases, and finally discuss implementation details and their impact on query processing.
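To make the first task concrete (finding the rows that prevent an FD from holding), here is a minimal Python sketch. It is only an illustration and not the query language or data structure proposed in the paper: the function name `fd_violations`, the dictionary-based row format, and the sample data are all assumptions introduced for this example.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the rows that prevent the FD lhs -> rhs from holding.

    Hypothetical helper: rows is a list of dicts, lhs and rhs are lists
    of attribute names. A row is reported when its LHS group contains
    more than one distinct RHS value.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)

    violating = []
    for group in groups.values():
        rhs_values = {tuple(r[a] for a in rhs) for r in group}
        if len(rhs_values) > 1:
            violating.extend(group)
    return violating


# Illustrative data (made up for this sketch).
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "NYC", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Cy"},
]
bad = fd_violations(rows, ["zip"], ["city"])
```

In this sketch, every row of a conflicting group is reported, because the data alone does not say which of the disagreeing rows is the erroneous one.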
Functional Dependencies (FDs) define attribute relationships based on syntactic equality, and, when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms and is-a hierarchies defined by an ontology. We study the theoretical foundations for OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Towards enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets, and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional FDs.
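The difference between syntactic and semantic equality can be illustrated with a small Python sketch. It assumes one plausible reading of a synonym-based dependency: within each group of rows that agree on the left-hand side, the right-hand-side values must share at least one common sense in the ontology. The function name `synonym_ofd_violations`, the `senses` mapping, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def synonym_ofd_violations(rows, lhs, rhs, senses):
    """Return rows violating lhs -> rhs under an assumed synonym semantics.

    senses maps a value to the set of ontology senses it can denote
    (a hypothetical toy ontology). A group is consistent when all of its
    RHS values share at least one sense.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)

    violating = []
    for group in groups.values():
        common = None
        for row in group:
            value = row[rhs]
            # Add a literal sense so identical values always agree,
            # even when the ontology has no entry for them.
            value_senses = senses.get(value, set()) | {("literal", value)}
            common = value_senses if common is None else common & value_senses
        if not common:
            violating.extend(group)
    return violating


# Illustrative data and ontology (made up for this sketch).
senses = {"New York": {"nyc"}, "NYC": {"nyc"}}
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
clean = synonym_ofd_violations(rows, ["zip"], "city", senses)
dirty = synonym_ofd_violations(
    rows + [{"zip": "10001", "city": "Boston"}], ["zip"], "city", senses
)
```

A traditional FD check would flag the two rows with zip 10001 as errors, whereas this synonym-aware check accepts them and only reports the group once a value with no shared sense, such as the made-up "Boston" row, is added.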
We propose a class of functional dependencies for temporal graphs, called TGFDs. TGFDs capture both attribute-value dependencies and topological structures of entities over a valid period of time in a temporal graph. They subsume graph functional dependencies (GFDs) and conditional functional dependencies (CFDs) as special cases. We study the foundations of TGFDs, including satisfiability, implication and validation. We show that the satisfiability and validation problems are coNP-complete and the implication problem is NP-complete. We also present an axiomatization of TGFDs and prove that it is sound and complete.
Individuals and organizations tend to migrate their data to clouds, especially in a DataBase as a Service (DBaaS) pattern. The major obstacle is the conflict between secrecy and utilization of the relational database to be outsourced. We address this obstacle with a Transparent DataBase (T-DB) system that strictly follows the unmodified DBaaS framework. A database owner outsources an encrypted database to a cloud platform, needing only to store the secret keys for encryption and an empty table header for the database; the database users can make almost all types of queries on the encrypted database as usual; and the cloud can process ciphertext queries as if the database were not encrypted. Experiments in realistic cloud environments demonstrate that T-DB has perfect query answer precision and outstanding performance.
As most users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. A database management system (DBMS) may interact with users and use their feedback on the returned results to learn the information needs behind their queries. Current query interfaces assume that users do not learn and modify the way they express their information needs in the form of queries during their interaction with the DBMS. Using a real-world interaction workload, we show that users learn and modify how they express their information needs during their interactions with the DBMS, and that their learning is accurately modeled by a well-known reinforcement learning mechanism. As current data interaction systems assume that users do not modify their strategies, they cannot discover the information needs behind users' queries effectively. We model the interaction between users and the DBMS as a game with identical interest between two rational agents whose goal is to establish a common language for representing information needs in the form of queries. We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to changes in users' strategies, and we prove that it improves the effectiveness of answering queries stochastically speaking. We propose two efficient implementations of this method over large relational databases. Our extensive empirical studies over real-world query workloads indicate that our algorithms are efficient and effective.
Graph query languages feature mainly two kinds of queries when applied to a graph database: those inspired by relational databases, which return tables, such as SELECT queries, and those which return graphs, such as CONSTRUCT queries in SPARQL. The latter are the object of study in the present paper. For this purpose, a core graph query language GrAL is defined with a focus on CONSTRUCT queries. Queries in GrAL form the final step of a recursive process involving so-called GrAL patterns. By evaluating a query over a graph one gets a graph, while by evaluating a pattern over a graph one gets a set of matches which involves both a graph and a table. CONSTRUCT queries are based on CONSTRUCT patterns, and sub-CONSTRUCT patterns come for free from the recursive definition of patterns. The semantics of GrAL is based on RDF graphs with a slight modification, which consists in accepting isolated nodes. Such an extension of RDF graphs eases the definition of the evaluation semantics, which is mainly captured by a unique operation called Merge. Besides, we define aggregations as part of GrAL expressions, which leads to an original local processing of aggregations.