No Arabic abstract
Although cancer patients survive years after oncologic therapy, they are plagued with long-lasting or permanent residual symptoms, whose severity, rate of development, and resolution after treatment vary largely between survivors. The analysis and interpretation of symptoms is complicated by their partial co-occurrence, variability across populations and across time, and, in the case of cancers that use radiotherapy, by further symptom dependency on the tumor location and prescribed treatment. We describe THALIS, an environment for visual analysis and knowledge discovery from cancer therapy symptom data, developed in close collaboration with oncology experts. Our approach leverages unsupervised machine learning methodology over cohorts of patients, and, in conjunction with custom visual encodings and interactions, provides context for new patients based on patients with similar diagnostic features and symptom evolution. We evaluate this approach on data collected from a cohort of head and neck cancer patients. Feedback from our clinician collaborators indicates that THALIS supports knowledge discovery beyond the limits of machines or humans alone, and that it serves as a valuable tool in both the clinic and symptom research.
The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set visualization to facilitate the exploration and interpretation of discriminatory itemsets. A user study shows that users can interpret the visually encoded information in DiscriLens quickly and accurately. Use cases demonstrate that DiscriLens provides informative guidance in understanding and reducing algorithmic discrimination.
Understanding what graph layout human prefer and why they prefer is significant and challenging due to the highly complex visual perception and cognition system in human brain. In this paper, we present the first machine learning approach for predicting human preference for graph layouts. In general, the data sets with human preference labels are limited and insufficient for training deep networks. To address this, we train our deep learning model by employing the transfer learning method, e.g., exploiting the quality metrics, such as shape-based metrics, edge crossing and stress, which are shown to be correlated to human preference on graph layouts. Experimental results using the ground truth human preference data sets show that our model can successfully predict human preference for graph layouts. To our best knowledge, this is the first approach for predicting qualitative evaluation of graph layouts using human preference experiment data.
An important step towards enabling English language learners to improve their conversational speaking proficiency involves automated scoring of multiple aspects of interactional competence and subsequent targeted feedback. This paper builds on previous work in this direction to investigate multiple neural architectures -- recurrent, attention and memory based -- along with feature-engineered models for the automated scoring of interactional and topic development aspects of text dialog data. We conducted experiments on a conversational database of text dialogs from human learners interacting with a cloud-based dialog system, which were triple-scored along multiple dimensions of conversational proficiency. We find that fusion of multiple architectures performs competently on our automated scoring task relative to expert inter-rater agreements, with (i) hand-engineered features passed to a support vector learner and (ii) transformer-based architectures contributing most prominently to the fusion.
When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with off-the-shelf machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease cost under this setting. First, we show that reordering tasks presented to the human can create a significant accuracy improvement. Further, we show that greedily choosing parameters to maximize machine accuracy is sub-optimal, and joint optimization of the combined system improves performance.
Over the past decade, Citizen Science has become a proven method of distributed data analysis, enabling research teams from diverse domains to solve problems involving large quantities of data with complexity levels which require human pattern recognition capabilities. With over 120 projects built reaching nearly 1.7 million volunteers, the Zooniverse.org platform has led the way in the application of Citizen Science as a method for closing the Big Data analysis gap. Since the launch in 2007 of the Galaxy Zoo project, the Zooniverse platform has enabled significant contributions across many disciplines; e.g., in ecology, humanities, and astronomy. Citizen science as an approach to Big Data combines the twin advantages of the ability to scale analysis to the size of modern datasets with the ability of humans to make serendipitous discoveries. To cope with the larger datasets looming on the horizon such as astronomys Large Synoptic Survey Telescope (LSST) or the 100s of TB from ecology projects annually, Zooniverse has been researching a system design that is optimized for efficiency in task assignment and incorporating human and machine classifiers into the classification engine. By making efficient use of smart task assignment and the combination of human and machine classifiers, we can achieve greater accuracy and flexibility than has been possible to date. We note that creating the most efficient system must consider how best to engage and retain volunteers as well as make the most efficient use of their classifications. Our work thus focuses on understanding the factors that optimize efficiency of the combined human-machine system. This paper summarizes some of our research to date on integration of machine learning with Zooniverse, while also describing new infrastructure developed on the Zooniverse platform to carry out this research.