No Arabic abstract
One of the distinctive features of Information Retrieval systems comparing to Database Management systems, is that they offer better compression for posting lists, resulting in better I/O performance and thus faster query evaluation. In this paper, we introduce database representations of the index that reduce the size (and thus the disk I/Os) of the posting lists. This is not achieved by redesigning the DBMS, but by exploiting the non 1NF features that existing Object-Relational DBM systems (ORDBMS) already offer. Specifically, four different database representations are described and detailed experimental results for one million pages are reported. Three of these representations are one order of magnitude more space efficient and faster (in query evaluation) than the plain relational representation.
In this paper, we address the problem of learning low dimension representation of entities on relational databases consisting of multiple tables. Embeddings help to capture semantics encoded in the database and can be used in a variety of settings like auto-completion of tables, fully-neural query processing of relational joins queries, seamlessly handling missing values, and more. Current work is restricted to working with just single table, or using pretrained embeddings over an external corpus making them unsuitable for use in real-world databases. In this work, we look into ways of using these attention-based model to learn embeddings for entities in the relational database. We are inspired by BERT style pretraining methods and are interested in observing how they can be extended for representation learning on structured databases. We evaluate our approach of the autocompletion of relational databases and achieve improvement over standard baselines.
As the amount of online text increases, the demand for text categorization to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic categorization of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Text categorization using Association Rule and Naive Bayes Classifier is proposed here. Instead of using words word relation i.e association rules from these words is used to derive feature set from pre-classified text documents. Naive Bayes Classifier is then used on derived features for final categorization.
Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be prohibitive, as the data matrix may be (much) larger than the total input relation size. This paper introduces Rk-means, or relational k -means algorithm, for clustering relational data tuples without having to access the full data matrix. As such, we avoid having to run the expensive feature extraction query and storing its output. Our algorithm leverages the underlying structures in relational data. It involves construction of a small {it grid coreset} of the data matrix for subsequent cluster construction. This gives a constant approximation for the k -means objective, while having asymptotic runtime improvements over standard approaches of first running the database query and then clustering. Empirical results show orders-of-magnitude speedup, and Rk-means can run faster on the database than even just computing the data matrix.
The binary relation framework has been shown to be applicable to many real-life preference handling scenarios. Here we study preference contraction: the problem of discarding selected preferences. We argue that the property of minimality and the preservation of strict partial orders are crucial for contractions. Contractions can be further constrained by specifying which preferences should be protected. We consider two classes of preference relations: finite and finitely representable. We present algorithms for computing minimal and preference-protecting minimal contractions for finite as well as finitely representable preference relations. We study relationships between preference change in the binary relation framework and belief change in the belief revision theory. We also introduce some preference query optimization techniques which can be used in the presence of contraction. We evaluate the proposed algorithms experimentally and present the results.
Lenses are a popular approach to bidirectional transformations, a generalisation of the view update problem in databases, in which we wish to make changes to source tables to effect a desired change on a view. However, perhaps surprisingly, lenses have seldom actually been used to implement updatable views in databases. Bohannon, Pierce and Vaughan proposed an approach to updatable views called relational lenses, but to the best of our knowledge this proposal has not been implemented or evaluated to date. We propose incremental relational lenses, that equip relational lenses with change-propagating semantics that map small changes to the view to (potentially) small changes to the source tables. We also present a language-integrated implementation of relational lenses and a detailed experimental evaluation, showing orders of magnitude improvement over the non-incremental approach. Our work shows that relational lenses can be used to support expressive and efficient view updates at the language level, without relying on updatable view support from the underlying database.