ترغب بنشر مسار تعليمي؟ اضغط هنا

Multi-Label Annotation Aggregation in Crowdsourcing

69   0   0.0 ( 0 )
 نشر من قبل Junming Yin
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous annotators. Another challenge stems from the difficulty in evaluating the annotator reliability without even knowing the ground truth, which can be used to build incentive mechanisms in crowdsourcing platforms. When each instance is associated with many possible labels simultaneously, the problem becomes even harder because of its combinatorial nature. In this paper, we present new flexible Bayesian models and efficient inference algorithms for multi-label annotation aggregation by taking both annotator reliability and label dependency into account. Extensive experiments on real-world datasets confirm that the proposed methods outperform other competitive alternatives, and the model can recover the type of the annotators with high accuracy.



قيم البحث

اقرأ أيضاً

Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laboursome annotatio n tasks is through crowdsourcing, where images are annotated by volunteers or paid workers online (e.g., workers of Amazon Mechanical Turk) from scratch. However, the quality of crowdsourcing image annotations cannot be guaranteed, and incompleteness and incorrectness are two major concerns for crowdsourcing annotations. To address such concerns, we have a rethinking of crowdsourcing annotations: Our simple hypothesis is that if the annotators only partially annotate multi-label images with salient labels they are confident in, there will be fewer annotation errors and annotators will spend less time on uncertain labels. As a pleasant surprise, with the same annotation budget, we show a multi-label image classifier supervised by images with salient annotations can outperform models supervised by fully annotated images. Our method contributions are 2-fold: An active learning way is proposed to acquire salient labels for multi-label images; and a novel Adaptive Temperature Associated Model (ATAM) specifically using partial annotations is proposed for multi-label image classification. We conduct experiments on practical crowdsourcing data, the Open Street Map (OSM) dataset and benchmark dataset COCO 2014. When compared with state-of-the-art classification methods trained on fully annotated images, the proposed ATAM can achieve higher accuracy. The proposed idea is promising for crowdsourcing data annotation. Our code will be publicly available.
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly.
Disaster monitoring is challenging due to the lake of infrastructures in monitoring areas. Based on the theory of Game-With-A-Purpose (GWAP), this paper contributes to a novel large-scale crowdsourcing disaster monitoring system. The system analyzes tagged satellite pictures from anonymous players, and then reports aggregated and evaluated monitoring results to its stakeholders. An algorithm based on directed graph centralities is presented to address the core issues of malicious user detection and disaster level calculation. Our method can be easily applied in other human computation systems. In the end, some issues with possible solutions are discussed for our future work.
In this paper, we present empirical analysis on basic and depression specific multi-emotion mining in Tweets with the help of state of the art multi-label classifiers. We choose our basic emotions from a hybrid emotion model consisting of the common emotions from four highly regarded psychological models of emotions. Moreover, we augment that emotion model with new emotion categories because of their importance in the analysis of depression. Most of those additional emotions have not been used in previous emotion mining research. Our experimental analyses show that a cost sensitive RankSVM algorithm and a Deep Learning model are both robust, measured by both Macro F-measures and Micro F-measures. This suggests that these algorithms are superior in addressing the widely known data imbalance problem in multi-label learning. Moreover, our application of Deep Learning performs the best, giving it an edge in modeling deep semantic features of our extended emotional categories.
One of the key problems in multi-label text classification is how to take advantage of the correlation among labels. However, it is very challenging to directly model the correlations among labels in a complex and unknown label space. In this paper, we propose a Label Mask multi-label text classification model (LM-MTC), which is inspired by the idea of cloze questions of language model. LM-MTC is able to capture implicit relationships among labels through the powerful ability of pre-train language models. On the basis, we assign a different token to each potential label, and randomly mask the token with a certain probability to build a label based Masked Language Model (MLM). We train the MTC and MLM together, further improving the generalization ability of the model. A large number of experiments on multiple datasets demonstrate the effectiveness of our method.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا