We consider the problem of ranking $n$ players from partial pairwise comparison data under the Bradley-Terry-Luce model. For the first time in the literature, the minimax rate of this ranking problem is derived with respect to the Kendalls tau distan
ce that measures the difference between two rank vectors by counting the number of
Given a binary prediction problem, which performance metric should the classifier optimize? We address this question by formalizing the problem of Metric Elicitation. The goal of metric elicitation is to discover the performance metric of a practitio
ner, which reflects her innate rewards (costs) for correct (incorrect) classification. In particular, we focus on eliciting binary classification performance metrics from pairwise feedback, where a practitioner is queried to provide relative preference between two classifiers. By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics. We further show that our method is robust to feedback and finite sample noise.
As pairwise ranking becomes broadly employed for elections, sports competitions, recommendations, and so on, attackers have strong motivation and incentives to manipulate the ranking list. They could inject malicious comparisons into the training dat
a to fool the victim. Such a technique is called poisoning attack in regression and classification tasks. In this paper, to the best of our knowledge, we initiate the first systematic investigation of data poisoning attacks on pairwise ranking algorithms, which can be formalized as the dynamic and static games between the ranker and the attacker and can be modeled as certain kinds of integer programming problems. To break the computational hurdle of the underlying integer programming problems, we reformulate them into the distributionally robust optimization (DRO) problems, which are computationally tractable. Based on such DRO formulations, we propose two efficient poisoning attack algorithms and establish the associated theoretical guarantees. The effectiveness of the suggested poisoning attack strategies is demonstrated by a series of toy simulations and several real data experiments. These experimental results show that the proposed methods can significantly reduce the performance of the ranker in the sense that the correlation between the true ranking list and the aggregated results can be decreased dramatically.
To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed. Among them, some consider using pairwise but not pointwise labels, when pointwise label
s are not accessible due to privacy, confidentiality, or security reasons. However, as a pairwise label denotes whether or not two data points share a pointwise label, it cannot be easily collected if either point is equally likely to be positive or negative. Thus, in this paper, we propose a novel setting called pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other. Firstly, we give a Pcomp data generation process, derive an unbiased risk estimator (URE) with theoretical guarantee, and further improve URE using correction functions. Secondly, we link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization. Finally, we demonstrate by experiments the effectiveness of our methods, which suggests Pcomp is a valuable and practically useful type of pairwise supervision besides the pairwise label.
We propose a novel parameterized family of Mixed Membership Mallows Models (M4) to account for variability in pairwise comparisons generated by a heterogeneous population of noisy and inconsistent users. M4 models individual preferences as a user-spe
cific probabilistic mixture of shared latent Mallows components. Our key algorithmic insight for estimation is to establish a statistical connection between M4 and topic models by viewing pairwise comparisons as words, and users as documents. This key insight leads us to explore Mallows components with a separable structure and leverage recent advances in separable topic discovery. While separability appears to be overly restrictive, we nevertheless show that it is an inevitable outcome of a relatively small number of latent Mallows components in a world of large number of items. We then develop an algorithm based on robust extreme-point identification of convex polygons to learn the reference rankings, and is provably consistent with polynomial sample complexity guarantees. We demonstrate that our new model is empirically competitive with the current state-of-the-art approaches in predicting real-world preferences.