Optimal Stopping and Worker Selection in Crowdsourcing: an Adaptive Sequential Probability Ratio Test Framework


Abstract in English

In this paper, we address a class of multiple testing problems under the Bayesian sequential decision framework. Our motivating application comes from binary labeling tasks in crowdsourcing, where the requester must simultaneously decide which worker to ask for a label and when to stop collecting labels, subject to a budget constraint. We start with the binary hypothesis testing problem of determining the true label of a single object, and provide an optimal solution by casting it in the adaptive sequential probability ratio test (Ada-SPRT) framework. We characterize the structure of the optimal solution, i.e., the optimal adaptive sequential design, which minimizes the Bayes risk through the log-likelihood ratio statistic. We also develop a dynamic programming algorithm that efficiently approximates the optimal solution. For the multiple testing problem, we further propose an empirical Bayes approach for estimating class priors and show that our method has an averaged loss that converges to the minimal Bayes risk under the true model. Experiments on both simulated and real data show the robustness of our method and its superior labeling accuracy compared with several other recently proposed approaches.
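To make the sequential idea concrete, the following is a minimal sketch of a threshold rule on the posterior log-odds for a single binary label. It is not the Ada-SPRT procedure of the paper: worker selection is not optimized (responses are consumed in the order given), and the symmetric stopping threshold, the one-parameter worker accuracy model, and the names `sequential_label`, `threshold`, and `budget` are all assumptions made for illustration.

```python
import math


def sequential_label(responses, prior=0.5, threshold=3.0, budget=20):
    """Sequentially aggregate binary labels until the evidence is strong enough.

    responses: iterable of (label, accuracy) pairs, where label is 0 or 1 and
        accuracy in (0.5, 1) is the assumed probability that the worker
        reports the true label (hypothetical one-parameter worker model).
    prior: assumed prior probability that the true label is 1.
    threshold: assumed symmetric stopping boundary on the posterior log-odds.
    budget: maximum number of labels to collect.

    Returns (decision, labels_used, final_log_odds).
    """
    # Start from the prior log-odds of "true label = 1" versus "true label = 0".
    log_odds = math.log(prior / (1.0 - prior))
    used = 0
    for label, accuracy in responses:
        # Stop when the budget is exhausted or the evidence crosses a boundary.
        if used >= budget or abs(log_odds) >= threshold:
            break
        # Log-likelihood ratio contribution of one label from this worker.
        step = math.log(accuracy / (1.0 - accuracy))
        log_odds += step if label == 1 else -step
        used += 1
    # Ties (log_odds == 0) are resolved in favor of label 0 in this sketch.
    decision = 1 if log_odds > 0 else 0
    return decision, used, log_odds


if __name__ == "__main__":
    stream = [(1, 0.8), (1, 0.8), (0, 0.6), (1, 0.9)]
    print(sequential_label(stream, threshold=2.5))
```

A lower threshold stops earlier and spends less of the budget at the cost of a higher error probability. In the paper's setting, both the stopping boundaries and the choice of the next worker are derived from the Bayes risk rather than fixed by hand as here.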
