Top-k query processing finds a list of k results that have the largest scores with respect to a user-given query, under the assumption that the k results are independent of each other. In practice, some of the returned top-k results can be very similar to one another and are therefore redundant. Diversified top-k search has been studied in the literature to return k results that take both score and diversity into consideration. Most existing solutions assume that the scores of all search results are given, and some address diversity only for a specific problem and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that only considers the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by applying three new functions: a sufficient stop condition sufficient(), a necessary stop condition necessary(), and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, which are searched independently using div-astar and combined using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds even for k as large as 2,000.
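To make the problem statement concrete, here is a minimal sketch of what a diversified top-k search on a fixed set of results might compute. It rests on an assumption the abstract does not state: that the goal is a set of at most k pairwise-dissimilar results with the largest total score. The function name `diversified_top_k` and the toy data are hypothetical. The search is a plain exhaustive enumeration, not div-astar, div-dp, or div-cut.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustively pick at most k pairwise-dissimilar results.

    Assumed objective (not taken from the abstract): maximize the
    total score subject to no two chosen results being similar.
    """
    # Store similarity as unordered pairs so lookups ignore direction.
    similar = {frozenset(pair) for pair in similar_pairs}
    items = sorted(scores)
    best_set, best_score = (), 0

    for size in range(1, k + 1):
        for subset in combinations(items, size):
            # Skip any subset that contains a similar pair.
            if any(frozenset(p) in similar for p in combinations(subset, 2)):
                continue
            total = sum(scores[item] for item in subset)
            if total > best_score:
                best_set, best_score = subset, total

    return set(best_set), best_score


# Hypothetical toy input: "a" is similar to both "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 3}
similar_pairs = [("a", "b"), ("a", "c")]

chosen, total = diversified_top_k(scores, similar_pairs, k=2)
print(sorted(chosen), total)  # ['b', 'c'] 17
```

In this toy input the plain top-2 by score would be "a" and "b", which are similar. A greedy pass that keeps "a" first can only add "d", for a total of 13, whereas the exhaustive search returns "b" and "c" with a total of 17. Enumeration is exponential in k, which suggests why specialized algorithms are needed at larger scales.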
Betweenness centrality, measured by the number of times a vertex occurs on all shortest paths of a graph, has been recognized as a key indicator for the importance of a vertex in the network. However, the betweenness of a vertex is often very hard to
Utility-driven itemset mining is widely applied in many real-world scenarios. However, most algorithms do not work for itemsets with negative utilities. Several efficient algorithms for high-utility itemset (HUI) mining with negative utilities have b
Episode discovery from an event is a popular framework for data mining tasks and has many real-world applications. An episode is a partially ordered set of objects (e.g., item, node), and each object is associated with an event type. This episode can
High-utility sequential pattern mining (HUSPM) has recently emerged as a focus of intense research interest. The main task of HUSPM is to find all subsequences, within a quantitative sequential database, that have high utility with respect to a user-
With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data has been generated in a stream fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In thi