Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive microtasks. We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships and introduce a method for independent crowd workers to explore large networks. The core of the method, Iterative Pathway Refinement, is a theoretically principled mechanism for efficient exploration via microtasks. We evaluate the method using synthetic networks and apply it on AMT to extract a large-scale causal attribution network, then investigate the structure of this network as well as the activity patterns and efficiency of the workers who constructed this network. Worker interactions reveal important characteristics of causal perception and the network data they generate can improve our understanding of causality and causal inference.
Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask proposal leads to a growing set of tasks that may overwhelm limited crowdsourcer resources. Crowdsourcers can employ methods to utilize their resources efficiently, but algorithmic approaches to efficient crowdsourcing generally require a fixed task set of known size. In this paper, we introduce *cost forecasting* as a means for a crowdsourcer to use efficient crowdsourcing algorithms with a growing set of microtasks. Cost forecasting allows the crowdsourcer to decide between eliciting new tasks from the crowd or receiving responses to existing tasks based on whether or not new tasks will cost less to complete than existing tasks, efficiently balancing resources as crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data show that cost forecasting leads to improved accuracy. Accuracy and efficiency gains for crowd-generated microtasks hold the promise to further leverage the creativity and wisdom of the crowd, with applications such as generating more informative and diverse training data for machine learning applications and improving the performance of user-generated content and question-answering platforms.
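The decision rule described in this abstract can be illustrated with a minimal sketch. The abstract does not say how costs are forecast, so the two forecasts are treated here as given inputs, and the function name, parameter names, and return labels are hypothetical.

```python
def choose_next_action(forecast_new_task_cost: float,
                       forecast_existing_task_cost: float) -> str:
    """Pick the cheaper of the two options open to the crowdsourcer.

    Both arguments are assumed forecasts of the cost to complete a task;
    how they are produced is outside the scope of this sketch.
    """
    # Elicit a new task only when it is forecast to cost less to complete
    # than the tasks already in the set.
    if forecast_new_task_cost < forecast_existing_task_cost:
        return "elicit_new_task"
    # Otherwise spend the next request on a response to an existing task.
    return "request_response"
```

Calling `choose_next_action(2.0, 5.0)` selects task elicitation, while `choose_next_action(5.0, 2.0)` selects a response to an existing task.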
Cause-and-effect reasoning, the attribution of effects to causes, is one of the most powerful and unique skills humans possess. Multiple surveys are mapping out causal attributions as networks, but it is unclear how well these efforts can be combined. Further, the total size of the collective causal attribution network held by humans is currently unknown, making it challenging to assess the progress of these surveys. Here we study three causal attribution networks to determine how well they can be combined into a single network. Combining these networks requires dealing with ambiguous nodes, as nodes represent written descriptions of causes and effects and different descriptions may exist for the same concept. We introduce NetFUSES, a method for combining networks with ambiguous nodes. Crucially, treating the different causal attribution networks as independent samples allows us to use their overlap to estimate the total size of the collective causal attribution network. We find that existing surveys capture 5.77% $\pm$ 0.781% of the $\approx$293 000 causes and effects estimated to exist, and 0.198% $\pm$ 0.174% of the $\approx$10 200 000 attributed cause-effect relationships.
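The idea of using the overlap between independent samples to estimate a total population size can be sketched with the classical two-sample Lincoln-Petersen (capture-recapture) estimator. The abstract does not name the estimator the authors used, so this is an illustrative assumption rather than their method; the function name and the counts in the usage note are hypothetical.

```python
def lincoln_petersen(n_first: int, n_second: int, n_overlap: int) -> float:
    """Estimate total population size from two independent samples.

    n_first and n_second are the numbers of distinct items (for example,
    nodes) seen in each sample; n_overlap is the number seen in both.
    """
    if n_overlap <= 0:
        # With no shared items the estimator is undefined.
        raise ValueError("the two samples must share at least one item")
    # If the samples are independent, the fraction of the second sample
    # that also appears in the first approximates n_first / N.
    return n_first * n_second / n_overlap
```

For example, two samples of 100 and 50 items that share 10 items give an estimated population of `lincoln_petersen(100, 50, 10)`, which is 500.0.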
Understanding demand-side energy behaviour is critical for making efficient responses in energy demand management. We worked closely with energy experts and identified the key elements of the energy demand problem, including temporal and spatial demand and shifts in spatiotemporal demand. To our knowledge, no previous research has investigated the shifts in spatiotemporal demand. To fill this research gap, we propose a unified visual analytics approach to support exploratory demand analysis; we developed E3, a highly interactive tool that supports users in making and verifying hypotheses through human-client-server interactions. A novel potential-flow-based approach was formalized to model shifts in energy demand and integrated into a server-side engine. Experts then evaluated and affirmed the usefulness of this approach through case studies of real-world electricity data. In the future, we will improve the modelling algorithm, enhance visualisation, and expand the process to support more forms of energy data.
Despite the increasingly important role played by image memes, we do not yet have a solid understanding of the elements that might make a meme go viral on social media. In this paper, we investigate what visual elements distinguish image memes that are highly viral on social media from those that do not get re-shared, across three dimensions: composition, subjects, and target audience. Drawing from research in art theory, psychology, marketing, and neuroscience, we develop a codebook to characterize image memes, and use it to annotate a set of 100 image memes collected from 4chan's Politically Incorrect Board (/pol/). On the one hand, we find that highly viral memes are more likely to use a close-up scale, contain characters, and include positive or negative emotions. On the other hand, image memes that do not present a clear subject the viewer can focus attention on, or that include long text, are not likely to be re-shared by users. We train machine learning models to distinguish between image memes that are likely to go viral and those that are unlikely to be re-shared, obtaining an AUC of 0.866 on our dataset. We also show that the indicators of virality identified by our model can help characterize the most viral memes posted on mainstream online social networks too, as our classifiers are able to predict 19 out of the 20 most popular image memes posted on Twitter and Reddit between 2016 and 2018. Overall, our analysis sheds light on what indicators characterize viral and non-viral visual content online, and sets the basis for developing better techniques to create or moderate content that is more likely to catch the viewer's attention.
Data credibility is a crucial issue in mobile crowd sensing (MCS) and, more generally, people-centric Internet of Things (IoT). Prior work takes approaches such as incentive mechanism design and data mining to address this issue, while overlooking the power of the crowd itself, which we exploit in this paper. In particular, we propose a cross validation approach which seeks a validating crowd to verify the data credibility of the original sensing crowd, and uses the verification result to reshape the original sensing dataset into a more credible posterior belief of the ground truth. Following this approach, we design a specific cross validation mechanism, which integrates four sampling techniques with a privacy-aware competency-adaptive push (PACAP) algorithm and is applicable to time-sensitive and quality-critical MCS applications. It does not require redesigning the MCS system but rather functions as a lightweight plug-in, making practical adoption easier. Our results demonstrate that the proposed mechanism substantially improves data credibility in terms of both reinforcing obscure truths and scavenging hidden truths.