Optimizing the Human-Machine Partnership with Zooniverse


Abstract in English

Over the past decade, Citizen Science has become a proven method of distributed data analysis, enabling research teams from diverse domains to solve problems involving large quantities of data with complexity levels which require human pattern recognition capabilities. With over 120 projects built reaching nearly 1.7 million volunteers, the Zooniverse.org platform has led the way in the application of Citizen Science as a method for closing the Big Data analysis gap. Since the launch in 2007 of the Galaxy Zoo project, the Zooniverse platform has enabled significant contributions across many disciplines; e.g., in ecology, humanities, and astronomy. Citizen science as an approach to Big Data combines the twin advantages of the ability to scale analysis to the size of modern datasets with the ability of humans to make serendipitous discoveries. To cope with the larger datasets looming on the horizon such as astronomys Large Synoptic Survey Telescope (LSST) or the 100s of TB from ecology projects annually, Zooniverse has been researching a system design that is optimized for efficiency in task assignment and incorporating human and machine classifiers into the classification engine. By making efficient use of smart task assignment and the combination of human and machine classifiers, we can achieve greater accuracy and flexibility than has been possible to date. We note that creating the most efficient system must consider how best to engage and retain volunteers as well as make the most efficient use of their classifications. Our work thus focuses on understanding the factors that optimize efficiency of the combined human-machine system. This paper summarizes some of our research to date on integration of machine learning with Zooniverse, while also describing new infrastructure developed on the Zooniverse platform to carry out this research.

Download