ﻻ يوجد ملخص باللغة العربية
Large-scale labeled datasets are the indispensable fuel that ignites the AI revolution as we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk which provides noisy labels from non-experts at a fair price. The sheer size of such datasets mandates that it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score in an ideal world where all workers label all items. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of estimation can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection, we often need as low as 0.1 labels per data point to achieve nearly the same accuracy as in the ideal world where all workers label all data points.
Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask pr
We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a s
Modern machine learning is migrating to the era of complex models, which requires a plethora of well-annotated data. While crowdsourcing is a promising tool to achieve this goal, existing crowdsourcing approaches barely acquire a sufficient amount of
The main goal of this paper is to discuss how to integrate the possibilities of crowdsourcing platforms with systems supporting workflow to enable the engagement and interaction with business tasks of a wider group of people. Thus, this work is an at