No Arabic abstract
Recent advances in multi-task peer prediction have greatly expanded our knowledge about the power of multi-task peer prediction mechanisms. Various mechanisms have been proposed in different settings to elicit different types of information. But we still lack understanding about when desirable mechanisms will exist for a multi-task peer prediction problem. In this work, we study the elicitability of multi-task peer prediction problems. We consider a designer who has certain knowledge about the underlying information structure and wants to elicit certain information from a group of participants. Our goal is to infer the possibility of having a desirable mechanism based on the primitives of the problem. Our contribution is twofold. First, we provide a characterization of the elicitable multi-task peer prediction problems, assuming that the designer only uses scoring mechanisms. Scoring mechanisms are the mechanisms that reward participants reports for different tasks separately. The characterization uses a geometric approach based on the power diagram characterization in the single-task setting ([Lambert and Shoham, 2009, Frongillo and Witkowski, 2017]). For general mechanisms, we also give a necessary condition for a multi-task problem to be elicitable. Second, we consider the case when the designer aims to elicit some properties that are linear in the participants posterior about the state of the world. We first show that in some cases, the designer basically can only elicit the posterior itself. We then look into the case when the designer aims to elicit the participants posteriors. We give a necessary condition for the posterior to be elicitable. This condition implies that the mechanisms proposed by Kong and Schoenebeck are already the best we can hope for in their setting, in the sense that their mechanisms can solve any problem instance that can possibly be elicitable.
In the setting where we ask participants multiple similar possibly subjective multi-choice questions (e.g. Do you like Bulbasaur? Y/N; do you like Squirtle? Y/N), peer prediction aims to design mechanisms that encourage honest feedback without verification. A series of works have successfully designed multi-task peer prediction mechanisms where reporting truthfully is better than any other strategy (dominantly truthful), while they require an infinite number of tasks. A recent work proposes the first multi-task peer prediction mechanism, Determinant Mutual Information (DMI)-Mechanism, where not only is dominantly truthful but also works for a finite number of tasks (practical). However, few works consider how to optimize the multi-task peer prediction mechanisms. In addition to the definition of optimization goal, the biggest challenge is we do not have space for optimization since there is only a single practical and dominantly truthful mechanism. This work addresses this problem by proposing a tractable effort incentive optimization goal and generalizing DMI-Mechanism to a new family of practical, dominantly truthful mechanisms, Volume Mutual Information (VMI)-Mechanisms. We show that DMI-Mechanism may not be optimal. But we can construct a sequence of VMI-Mechanisms that are approximately optimal. The main technical tool is a novel family of mutual information measures, Volume Mutual Information, which generalizes Determinant Mutual Information. We construct VMI by a simple geometric idea: we measure how informative a distribution is by measuring the volume of distributions that is less informative than it (inappropriately, its similar to measuring how clever a person is by counting the number of people that are less clever than he/she).
Peer prediction mechanisms incentivize agents to truthfully report their signals even in the absence of verification by comparing agents reports with those of their peers. In the detail-free multi-task setting, agents respond to multiple independent and identically distributed tasks, and the mechanism does not know the prior distribution of agents signals. The goal is to provide an $epsilon$-strongly truthful mechanism where truth-telling rewards agents strictly more than any other strategy profile (with $epsilon$ additive error), and to do so while requiring as few tasks as possible. We design a family of mechanisms with a scoring function that maps a pair of reports to a score. The mechanism is strongly truthful if the scoring function is prior ideal, and $epsilon$-strongly truthful as long as the scoring function is sufficiently close to the ideal one. This reduces the above mechanism design problem to a learning problem -- specifically learning an ideal scoring function. We leverage this reduction to obtain the following three results. 1) We show how to derive good bounds on the number of tasks required for different types of priors. Our reduction applies to myriad continuous signal space settings. This is the first peer-prediction mechanism on continuous signals designed for the multi-task setting. 2) We show how to turn a soft-predictor of an agents signals (given the other agents signals) into a mechanism. This allows the practical use of machine learning algorithms that give good results even when many agents provide noisy information. 3) For finite signal spaces, we obtain $epsilon$-strongly truthful mechanisms on any stochastically relevant prior, which is the maximal possible prior. In contrast, prior work only achieves a weaker notion of truthfulness (informed truthfulness) or requires stronger assumptions on the prior.
We propose measurement integrity, a property related to ex post reward fairness, as a novel desideratum for peer prediction mechanisms in many applications, including peer assessment. We operationalize this notion to evaluate the measurement integrity of different mechanisms in computational experiments. Our evaluations simulate the application of peer prediction mechanisms to peer assessment---a setting in which realistic models have been validated on real data and in which ex post fairness concerns are quite salient. We find that peer prediction mechanisms, as proposed in the literature, largely fail to demonstrate measurement integrity in our experiments. However, we also find that certain mechanisms can be supplemented with realistic parametric statistical models to improve their measurement integrity. In the same setting, we also evaluate an empirical notion of robustness against strategic behavior to complement the theoretical analyses of robustness against strategic behavior that have been the main focus of the peer prediction literature. In this dimension of analysis, we again find that supplementing certain mechanisms with parametric statistical models can improve their empirical performance. Even so, though, we find that theoretical guarantees of robustness against strategic behavior are somewhat noisy predictors of empirical robustness. As a whole, our empirical methodology for quantifying desirable mechanism properties facilitates a more nuanced comparison between mechanisms than theoretical analysis alone. Ultimately, we find there is a trade-off between our two dimensions of analysis. The best performing mechanisms for measurement integrity are highly susceptible to strategic behavior. On the other hand, certain parametric peer prediction mechanisms are robust against all the strategic manipulations we consider while still achieving reasonable measurement integrity.
We consider the problem of purchasing data for machine learning or statistical estimation. The data analyst has a budget to purchase datasets from multiple data providers. She does not have any test data that can be used to evaluate the collected data and can assign payments to data providers solely based on the collected datasets. We consider the problem in the standard Bayesian paradigm and in two settings: (1) data are only collected once; (2) data are collected repeatedly and each days data are drawn independently from the same distribution. For both settings, our mechanisms guarantee that truthfully reporting ones dataset is always an equilibrium by adopting techniques from peer prediction: pay each provider the mutual information between his reported data and other providers reported data. Depending on the data distribution, the mechanisms can also discourage misreports that would lead to inaccurate predictions. Our mechanisms also guarantee individual rationality and budget feasibility for certain underlying distributions in the first setting and for all distributions in the second setting.
In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct such evaluations carefully and to report them honestly, particularly when doing so is costly. Existing approaches, notably peer-prediction mechanisms, can incentivize truth telling in equilibrium. However, they also give rise to equilibria in which agents do not pay the costs required to evaluate accurately, and hence fail to elicit useful information. We show that this problem is unavoidable whenever agents are able to coordinate using low-cost signals about the items being evaluated (e.g., text labels or pictures). We then consider ways of circumventing this problem by comparing agents reports to ground truth, which is available in practice when there exist trusted evaluators---such as teaching assistants in the peer grading scenario---who can perform a limited number of unbiased (but noisy) evaluations. Of course, when such ground truth is available, a simpler approach is also possible: rewarding each agent based on agreement with ground truth with some probability, and unconditionally rewarding the agent otherwise. Surprisingly, we show that the simpler mechanism achieves stronger incentive guarantees given less access to ground truth than a large set of peer-prediction mechanisms.