ﻻ يوجد ملخص باللغة العربية
Selecting the best items in a dataset is a common task in data exploration. However, the concept of best lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove dominated items and create a representative subset of the data set, comprising the best items in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data. Representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users regret. Existing work defines regret as the loss in score by limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, alternatively, we consider the position of the items in the ranked list for defining the regret and propose the {em rank-regret representative} as the minimal subset of the data containing at least one of the top-$k$ of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions and to utilize combinatorial geometry notions for developing effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.
The k-regret query aims to return a size-k subset S of a database D such that, for any query user that selects a data object from this size-k subset S rather than from database D, her regret ratio is minimized. The regret ratio here is modeled by the
The insights revealed from process mining heavily rely on the quality of event logs. Activities extracted from healthcare information systems with the free-text nature may lead to inconsistent labels. Such inconsistency would then lead to redundancy
Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of todays commercial solutions are highly manua
A population of voters must elect representatives among themselves to decide on a sequence of possibly unforeseen binary issues. Voters care only about the final decision, not the elected representatives. The disutility of a voter is proportional to
Non-local operation is widely explored to model the long-range dependencies. However, the redundant computation in this operation leads to a prohibitive complexity. In this paper, we present a Representative Graph (RepGraph) layer to dynamically samp