Quantum-Inspired Keyword Search on Multi-Model Databases

83 0 0.0 ( 0 )

Download Cite

Added by Gongsheng Yuan

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Gongsheng Yuan - Jiaheng Lu - Peifeng Su

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

With the rising applications implemented in different domains, it is inevitable to require databases to adopt corresponding appropriate data models to store and exchange data derived from various sources. To handle these data models in a single platform, the community of databases introduces a multi-model database. And many vendors are improving their products from supporting a single data model to being multi-model databases. Although this brings benefits, spending lots of enthusiasm to master one of the multi-model query languages for exploring a database is unfriendly to most users. Therefore, we study using keyword searches as an alternative way to explore and query multi-model databases. In this paper, we attempt to utilize quantum physicss probabilistic formalism to bring the problem into vector spaces and represent events (e.g., words) as subspaces. Then we employ a density matrix to encapsulate all the information over these subspaces and use density matrices to measure the divergence between query and candidate answers for finding top-textit{k} the most relevant results. In this process, we propose using pattern mining to identify compounds for improving accuracy and using dimensionality reduction for reducing complexity. Finally, empirical experiments demonstrate the performance superiority of our approaches over the state-of-the-art approaches.

rate research

Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

420 - Ye Yuan , Guoren Wang , Lei Chen 2012

Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase,we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.

Databases

Anomaly Detection in Large Labeled Multi-Graph Databases

99 - Hung T. Nguyen , Pierre J. Liang , Leman Akoglu 2020

Within a large database G containing graphs with labeled nodes and directed, multi-edges; how can we detect the anomalous graphs? Most existing work are designed for plain (unlabeled) and/or simple (unweighted) graphs. We introduce CODETECT, the first approach that addresses the anomaly detection task for graph databases with such complex nature. To this end, it identifies a small representative set S of structural patterns (i.e., node-labeled network motifs) that losslessly compress database G as concisely as possible. Graphs that do not compress well are flagged as anomalous. CODETECT exhibits two novel building blocks: (i) a motif-based lossless graph encoding scheme, and (ii) fast memory-efficient search algorithms for S. We show the effectiveness of CODETECT on transaction graph databases from three different corporations, where existing baselines adjusted for the task fall behind significantly, across different types of anomalies and performance metrics.

Databases Artificial Intelligence Machine Learning

On Demand Memory Specialization for Distributed Graph Databases

587 - Xavier Martinez-Palau , David Dominguez-Sal , Reza Akbarinia 2013

In this paper, we propose the DN-tree that is a data structure to build lossy summaries of the frequent data access patterns of the queries in a distributed graph data management system. These compact representations allow us an efficient communication of the data structure in distributed systems. We exploit this data structure with a new textit{Dynamic Data Partitioning} strategy (DYDAP) that assigns the portions of the graph according to historical data access patterns, and guarantees a small network communication and a computational load balance in distributed graph queries. This method is able to adapt dynamically to new workloads and evolve when the query distribution changes. Our experiments show that DYDAP yields a throughput up to an order of magnitude higher than previous methods based on cache specialization, in a variety of scenarios, and the average response time of the system is divided by two.

Databases Distributed Parallel and Cluster Computing

Personal Information Databases

641 - Sabah S. Al-Fedaghi , Bernhard Thalheim 2009

One of the most important aspects of security organization is to establish a framework to identify security significant points where policies and procedures are declared. The (information) security infrastructure comprises entities, processes, and technology. All are participants in handling information, which is the item that needs to be protected. Privacy and security information technology is a critical and unmet need in the management of personal information. This paper proposes concepts and technologies for management of personal information. Two different types of information can be distinguished: personal information and nonpersonal information. Personal information can be either personal identifiable information (PII), or nonidentifiable information (NII). Security, policy, and technical requirements can be based on this distinction. At the conceptual level, PII is defined and formalized by propositions over infons (discrete pieces of information) that specify transformations in PII and NII. PII is categorized into simple infons that reflect the proprietor s aspects, relationships with objects, and relationships with other proprietors. The proprietor is the identified person about whom the information is communicated. The paper proposes a database organization that focuses on the PII spheres of proprietors. At the design level, the paper describes databases of personal identifiable information built exclusively for this type of information, with their own conceptual scheme, system management, and physical structure.

Databases

Multi-keyword multi-click advertisement option contracts for sponsored search

620 - Bowei Chen , Jun Wang , Ingemar J. Cox 2013

In sponsored search, advertisement (abbreviated ad) slots are usually sold by a search engine to an advertiser through an auction mechanism in which advertisers bid on keywords. In theory, auction mechanisms have many desirable economic properties. However, keyword auctions have a number of limitations including: the uncertainty in payment prices for advertisers; the volatility in the search engines revenue; and the weak loyalty between advertiser and search engine. In this paper we propose a special ad option that alleviates these problems. In our proposal, an advertiser can purchase an option from a search engine in advance by paying an upfront fee, known as the option price. He then has the right, but no obligation, to purchase among the pre-specified set of keywords at the fixed cost-per-clicks (CPCs) for a specified number of clicks in a specified period of time. The proposed option is closely related to a special exotic option in finance that contains multiple underlying assets (multi-keyword) and is also multi-exercisable (multi-click). This novel structure has many benefits: advertisers can have reduced uncertainty in advertising; the search engine can improve the advertisers loyalty as well as obtain a stable and increased expected revenue over time. Since the proposed ad option can be implemented in conjunction with the existing keyword auctions, the option price and corresponding fixed CPCs must be set such that there is no arbitrage between the two markets. Option pricing methods are discussed and our experimental results validate the development. Compared to keyword auctions, a search engine can have an increased expected revenue by selling an ad option.

Computer Science and Game Theory Information Retrieval