ﻻ يوجد ملخص باللغة العربية
Crowdsourcing employs human workers to solve computer-hard problems, such as data cleaning, entity resolution, and sentiment analysis. When crowdsourcing tabular data, e.g., the attribute values of an entity set, a workers answers on the different attributes (e.g., the nationality and age of a celebrity star) are often treated independently. This assumption is not always true and can lead to suboptimal crowdsourcing performance. In this paper, we present the T-Crowd system, which takes into consideration the intricate relationships among tasks, in order to converge faster to their true values. Particularly, T-Crowd integrates each workers answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers. Finally, T-Crowd seamlessly supports categorical and continuous attributes, which are the two main datatypes found in typical databases. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods in terms of truth inference and reducing the cost of crowdsourcing.
Companies and individuals produce numerous tabular data. The objective of this position paper is to draw up the challenges posed by the automatic integration of data in the form of tables so that they can be cross-analyzed. We provide a first automat
Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are impli
In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both the industry and academia, the
Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask pr
A common workflow to perform a continuous human task stream is to divide workers into groups, have one group perform the newly-arrived task, and rotate the groups. We call this type of workflow the group rotation. This paper addresses the problem of