T-Crowd: Effective Crowdsourcing for Tabular Data


Added by Caihua Shan

Publication date 2017

Field: Informatics Engineering

Research language: English

Authors Caihua Shan - Nikos Mamoulis - Guoliang Li

Databases


Crowdsourcing employs human workers to solve computer-hard problems, such as data cleaning, entity resolution, and sentiment analysis. When crowdsourcing tabular data, e.g., the attribute values of an entity set, a worker's answers on the different attributes (e.g., the nationality and age of a celebrity) are often treated independently. This assumption is not always true and can lead to suboptimal crowdsourcing performance. In this paper, we present the T-Crowd system, which takes into consideration the intricate relationships among tasks, in order to converge faster to their true values. In particular, T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers. Finally, T-Crowd seamlessly supports categorical and continuous attributes, which are the two main datatypes found in typical databases. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods in terms of truth inference and reducing the cost of crowdsourcing.
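To make the idea of learning one trustworthiness score per worker across all attributes more concrete, here is a minimal sketch of iterative truth inference over a mix of categorical and continuous attributes. It is not the T-Crowd model from the paper: the weighted vote, the weighted mean, the range-normalized error, and every name in the code (`infer_truth`, `answers`, `attr_types`) are assumptions chosen for illustration only.

```python
from collections import defaultdict


def infer_truth(answers, attr_types, iterations=10):
    """Illustrative truth inference (hypothetical helper, not T-Crowd itself).

    answers:    list of (worker, entity, attribute, value) tuples
    attr_types: dict mapping attribute -> "categorical" or "continuous"
    Returns (truth, weight): estimated cell values and per-worker weights.
    """
    workers = {w for w, _, _, _ in answers}
    weight = {w: 1.0 for w in workers}  # start by trusting everyone equally

    by_cell = defaultdict(list)
    for w, e, a, v in answers:
        by_cell[(e, a)].append((w, v))

    # Range of each continuous attribute, used to scale errors into [0, 1].
    spread = {}
    for a, kind in attr_types.items():
        vals = [v for _, _, a2, v in answers if a2 == a]
        if kind == "continuous" and vals:
            spread[a] = (max(vals) - min(vals)) or 1.0

    truth = {}
    for _ in range(iterations):
        # Step 1: estimate each cell from the currently weighted answers.
        for (e, a), votes in by_cell.items():
            if attr_types[a] == "categorical":
                score = defaultdict(float)
                for w, v in votes:
                    score[v] += weight[w]
                truth[(e, a)] = max(score, key=score.get)
            else:
                total = sum(weight[w] for w, _ in votes)
                truth[(e, a)] = sum(weight[w] * v for w, v in votes) / total

        # Step 2: one weight per worker, pooled over ALL attributes,
        # rather than a separate score per attribute.
        errors = defaultdict(list)
        for w, e, a, v in answers:
            t = truth[(e, a)]
            if attr_types[a] == "categorical":
                errors[w].append(0.0 if v == t else 1.0)
            else:
                errors[w].append(min(1.0, abs(v - t) / spread[a]))
        for w in workers:
            mean_err = sum(errors[w]) / len(errors[w])
            weight[w] = max(1e-3, 1.0 - mean_err)

    return truth, weight


if __name__ == "__main__":
    demo = [
        ("alice", "e1", "nationality", "US"), ("alice", "e1", "age", 40),
        ("carol", "e1", "nationality", "US"), ("carol", "e1", "age", 41),
        ("bob", "e1", "nationality", "FR"), ("bob", "e1", "age", 70),
    ]
    types = {"nationality": "categorical", "age": "continuous"}
    print(infer_truth(demo, types))
```

Because the weight is pooled, a worker who is wrong on the categorical attribute also loses influence on the continuous one, which is the intuition behind not treating attributes independently.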


Automatic Integration Issues of Tabular Data for On-Line Analysis Processing

127 - Yuzhao Yang 2020

Companies and individuals produce numerous tabular data. The objective of this position paper is to draw up the challenges posed by the automatic integration of data in the form of tables so that they can be cross-analyzed. We provide a first automatic solution for the integration of such tabular data to allow On-Line Analysis Processing. To fulfil this task, features of tabular data should be analyzed and the challenge of automatic multidimensional schema generation should be addressed. Hence, we propose a typology of tabular data and discuss our idea of an automatic solution.

Databases

RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

86 - Honghui Mei , Wei Chen , Yating Wei 2019

Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Because datasets are often extremely large, response time is usually accelerated by precomputing predefined data cubes. However, the queries are then limited to the predefined binning schema of the preprocessed data cubes. This limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to capture the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.

Databases Human-Computer Interaction
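The summed area table mentioned in the RSATree abstract above can be illustrated in its textbook form: a prefix-sum grid from which the aggregate over any axis-aligned rectangle of bins is read with four lookups. This is a sketch of the general technique only; how RSATree combines it with the R-tree partitioning and hashing is not described on this page, and the function names `build_sat` and `rect_sum` are hypothetical.

```python
def build_sat(grid):
    """Build a summed area table with one extra row and column of zeros.

    sat[r][c] holds the sum of grid[0:r][0:c], so sat is (rows+1) x (cols+1).
    """
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    return sat


def rect_sum(sat, r0, c0, r1, c1):
    """Sum of grid cells with r0 <= row < r1 and c0 <= col < c1."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


if __name__ == "__main__":
    # Hypothetical 3x3 grid of bin counts.
    counts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    sat = build_sat(counts)
    print(rect_sum(sat, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Re-binning a histogram or binned scatterplot at a coarser granularity then amounts to issuing one `rect_sum` per output bin instead of rescanning the data.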

Joint Management and Analysis of Textual Documents and Tabular Data within the AUDAL Data Lake

117 - Pegdwende Sawadogo , Camille Noûs 2021

In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both industry and academia, the concept of data lake is still maturing, and there are still few methodological approaches to data lake design. Thus, we introduce a new approach to design a data lake and propose an extensive metadata system to activate richer features than those usually supported in data lake approaches. We implement our approach in the AUDAL data lake, where we jointly exploit both textual documents and tabular data, in contrast with structured and/or semi-structured data typically processed in data lakes from the literature. Furthermore, we also innovate by leveraging metadata to activate both data retrieval and content analysis, including Text-OLAP and SQL querying. Finally, we show the feasibility of our approach using a real-world use case on the one hand, and a benchmark on the other hand.

Databases

Efficient crowdsourcing of crowd-generated microtasks

53 - Abigail Hotaling , James P. Bagrow 2019

Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask proposal leads to a growing set of tasks that may overwhelm limited crowdsourcer resources. Crowdsourcers can employ methods to utilize their resources efficiently, but algorithmic approaches to efficient crowdsourcing generally require a fixed task set of known size. In this paper, we introduce *cost forecasting* as a means for a crowdsourcer to use efficient crowdsourcing algorithms with a growing set of microtasks. Cost forecasting allows the crowdsourcer to decide between eliciting new tasks from the crowd or receiving responses to existing tasks based on whether or not new tasks will cost less to complete than existing tasks, efficiently balancing resources as crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data show that cost forecasting leads to improved accuracy. Accuracy and efficiency gains for crowd-generated microtasks hold the promise to further leverage the creativity and wisdom of the crowd, with applications such as generating more informative and diverse training data for machine learning applications and improving the performance of user-generated content and question-answering platforms.

Human-Computer Interaction Machine Learning Applications

Group Rotation Type Crowdsourcing

104 - Katsumi Kumai , Yuhki Shiraishi , Jianwei Zhang 2016

A common workflow for handling a continuous stream of human tasks is to divide workers into groups, have one group perform each newly arrived task, and rotate the groups. We call this type of workflow group rotation. This paper addresses the problem of how to manage Group Rotation Type Crowdsourcing, that is, group rotation in a crowdsourcing setting. In group rotation type crowdsourcing, the group structure must change dynamically because workers join and leave frequently. This paper proposes an approach to explore a design space of methods for group restructuring in group rotation type crowdsourcing.

Databases
