ﻻ يوجد ملخص باللغة العربية
We study the similarity search problem which aims to find the similar query results according to a set of given data and a query string. To balance the result number and result quality, we combine query result diversity with query relaxation. Relaxation guarantees the number of the query results, returning more relevant elements to the query if the results are too few, while the diversity tries to reduce the similarity among the returned results. By making a trade-off of similarity and diversity, we improve the user experience. To achieve this goal, we define a novel goal function combining similarity and diversity. Aiming at this goal, we propose three algorithms. Among them, algorithms genGreedy and genCluster perform relaxation first and select part of the candidates to diversify. The third algorithm CB2S splits the dataset into smaller pieces using the clustering algorithm of k-means and processes queries in several small sets to retrieve more diverse results. The balance of similarity and diversity is determined through setting a threshold, which has a default value and can be adjusted according to users preference. The performance and efficiency of our system are demonstrated through extensive experiments based on various datasets.
Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing solution of com
Traditional relational data interfaces require precise structured queries over potentially complex schemas. These rigid data retrieval mechanisms pose hurdles for non-expert users, who typically lack language expertise and are unfamiliar with the det
Graph similarity search algorithms usually leverage the structural properties of a database. Hence, these algorithms are effective only on some structural variations of the data and are ineffective on other forms, which makes them hard to use. Ideall
We present SLASH (Sketched LocAlity Sensitive Hashing), an MPI (Message Passing Interface) based distributed system for approximate similarity search over terabyte scale datasets. SLASH provides a multi-node implementation of the popular LSH (localit
Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Descript