Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions; across the corpus, we find that 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques and for developing benchmark models and algorithms to automate visual analysis. To demonstrate VizNet's utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.
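As an illustration of how corpus-level statistics like the categorical/quantitative/temporal shares above could be computed, here is a minimal sketch that tallies column types over an iterable of tables with pandas. The `column_kind` heuristics and the `corpus_type_shares` helper are assumptions for illustration, not VizNet's actual profiling pipeline.

```python
# Sketch: tally column-type shares across a corpus of tabular datasets.
# Type-detection heuristics here are simplistic, illustrative assumptions.
from collections import Counter
import pandas as pd

def column_kind(series: pd.Series) -> str:
    """Classify a column as temporal, quantitative, or categorical."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "temporal"
    if pd.api.types.is_numeric_dtype(series):
        return "quantitative"
    return "categorical"

def corpus_type_shares(tables) -> dict:
    """Return the fraction of columns of each kind across a corpus.

    `tables` is any iterable of pandas DataFrames.
    """
    counts = Counter()
    for df in tables:
        for col in df.columns:
            counts[column_kind(df[col])] += 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}
```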
Background: Plant breeders are utilising an increasingly diverse range of data types to identify lines with desirable characteristics that are suitable to be taken forward in plant breeding programmes. Key morphological and physiological traits, such as disease resistance and yield, must be maintained and improved upon if a commercial variety is to be successful. Computational tools that can pull these data together, and integrate them with pedigree structure, will enable breeders to make better decisions on which plant lines to use in crossings, meeting both critical demands for increased yield/production and adaptation to climate change. Results: We have used a large and unique set of experimental barley (H. vulgare) data to develop a prototype pedigree visualization system, and performed a subjective user evaluation with domain experts to guide the development of an interactive pedigree visualization tool which we have called Helium. Conclusions: We show that Helium allows users to easily integrate a number of data types with large plant pedigrees, offering an integrated environment in which they can explore pedigree data. We have also verified that users were happy with the abstract representation of pedigrees used in our visualization tool.
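To make the pedigree data model concrete, the sketch below represents a pedigree as a directed parent-to-offspring graph annotated with per-line trait values. The line names are real barley cultivars used purely as examples, the trait values are invented, and this is not Helium's internal representation.

```python
# Sketch: a pedigree as a directed graph with phenotype annotations.
# Trait values below are invented for illustration.
import networkx as nx

pedigree = nx.DiGraph()
# Each node is a plant line carrying phenotype data (hypothetical values).
pedigree.add_node("Bowman", yield_t_ha=5.2, mildew_resistant=False)
pedigree.add_node("Morex", yield_t_ha=4.8, mildew_resistant=True)
pedigree.add_node("Cross-01", yield_t_ha=5.6, mildew_resistant=True)
# Edges point from parent to offspring.
pedigree.add_edge("Bowman", "Cross-01")
pedigree.add_edge("Morex", "Cross-01")

# Tracing a line's ancestry is then a simple graph traversal, which is
# the kind of query an interactive pedigree view lets breeders explore.
print(nx.ancestors(pedigree, "Cross-01"))  # {'Bowman', 'Morex'}
```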
Data visualization should be accessible for all analysts with data, not just the few with technical expertise. Visualization recommender systems aim to lower the barrier to exploring basic visualizations by automatically generating results for analysts to search and select, rather than manually specify. Here, we demonstrate a novel machine learning-based approach to visualization recommendation that learns visualization design choices from a large corpus of datasets and associated visualizations. First, we identify five key design choices made by analysts while creating visualizations, such as selecting a visualization type and choosing to encode a column along the X- or Y-axis. We train models to predict these design choices using one million dataset-visualization pairs collected from a popular online visualization platform. Neural networks predict these design choices with high accuracy compared to baseline models. We report and interpret feature importances from one of these baseline models. To evaluate the generalizability and uncertainty of our approach, we benchmark with a crowdsourced test set, and show that the performance of our model is comparable to human performance when predicting consensus visualization type, and exceeds that of other ML-based systems.
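The learning setup described above can be sketched in a few lines: extract features from a dataset and train a classifier to predict a design choice such as visualization type. The feature names, labels, and model below are toy assumptions; the actual system trains neural networks on one million dataset-visualization pairs with a far richer feature space.

```python
# Sketch: predict a visualization design choice from dataset features.
# Features, labels, and model size are illustrative toy assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Each row: features of one dataset, e.g.
# [num_columns, frac_numeric_cols, mean_cardinality, has_temporal].
X = np.array([
    [2, 1.00, 12.0, 0],
    [3, 0.33,  4.0, 1],
    [2, 0.50,  8.0, 0],
    [4, 0.75, 20.0, 0],
])
y = np.array(["scatter", "line", "bar", "scatter"])  # chosen chart types

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```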
Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than worrying about how to operate visualization tools. In the past two decades, and especially in recent years, numerous V-NLI systems leveraging advanced natural language processing technologies have been developed in both academic research and commercial software. In this article, we conduct a comprehensive review of the existing V-NLIs. To classify each paper, we develop categorical dimensions based on the classic information visualization pipeline, extended with a V-NLI layer. The following seven stages are used: query understanding, data transformation, visual mapping, view transformation, human interaction, context management, and presentation. Finally, we also shed light on several promising directions for future work in the community.
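One way to operationalize this classification scheme is to tag each surveyed system by the pipeline stages it addresses, as in the sketch below. The stage names come from the survey; the `classify_paper` helper and the example system are hypothetical.

```python
# Sketch: tag a surveyed V-NLI system by the pipeline stages it covers.
# Stage names follow the survey; the helper below is hypothetical.
STAGES: list[str] = [
    "query understanding",
    "data transformation",
    "visual mapping",
    "view transformation",
    "human interaction",
    "context management",
    "presentation",
]

def classify_paper(implemented: set[str]) -> dict[str, bool]:
    """Mark which of the seven pipeline stages a paper addresses."""
    return {stage: stage in implemented for stage in STAGES}

# Example: a system that only parses queries and renders a chart.
print(classify_paper({"query understanding", "visual mapping", "presentation"}))
```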
The effort for combating the COVID-19 pandemic around the world has resulted in a huge amount of data, e.g., from testing, contact tracing, modelling, treatment, vaccine trials, and more. In addition to numerous challenges in epidemiology, healthcare, biosciences, and social sciences, there has been an urgent need to develop and provide visualisation and visual analytics (VIS) capacities to support emergency responses under difficult operational conditions. In this paper, we report the experience of a group of VIS volunteers who have been working in a large research and development consortium and providing VIS support to various observational, analytical, model-developmental and disseminative tasks. In particular, we describe our approaches to the challenges that we have encountered in requirements analysis, data acquisition, visual design, software design, system development, team organisation, and resource planning. By reflecting on our experience, we propose a set of recommendations as the first step towards a methodology for developing and providing rapid VIS capacities to support emergency responses.
Inspired by the great success of machine learning (ML), researchers have applied ML techniques to visualizations to achieve better design, development, and evaluation of visualizations. This branch of studies, known as ML4VIS, has gained increasing research attention in recent years. To successfully adapt ML techniques for visualizations, a structured understanding of ML4VIS is needed. In this paper, we systematically survey 88 ML4VIS studies, aiming to answer two motivating questions: what visualization processes can be assisted by ML, and how can ML techniques be used to solve visualization problems? This survey reveals seven main processes where the employment of ML techniques can benefit visualizations: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, and User Profiling. The seven processes are related to existing visualization theoretical models in an ML4VIS pipeline, aiming to illuminate the role of ML-assisted visualization in general visualizations. Meanwhile, the seven processes are mapped onto the main learning tasks in ML to align the capabilities of ML with the needs of visualization. Current practices and future opportunities of ML4VIS are discussed in the context of the ML4VIS pipeline and the ML-VIS mapping. While more studies are still needed in the area of ML4VIS, we hope this paper can provide a stepping-stone for future exploration. A web-based interactive browser of this survey is available at https://ml4vis.github.io
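The process-to-task mapping can be sketched as a simple lookup from each ML4VIS process to candidate ML task families. The process names are the survey's; the task assignments and the `candidate_tasks` helper below are illustrative guesses, not the survey's exact mapping.

```python
# Sketch: align ML4VIS processes with ML task families.
# Process names follow the survey; task assignments are illustrative guesses.
ML4VIS_PROCESSES: dict[str, list[str]] = {
    "Data Processing4VIS":   ["representation learning", "clustering"],
    "Data-VIS Mapping":      ["classification", "recommendation"],
    "Insight Communication": ["generation"],
    "Style Imitation":       ["generation", "transfer learning"],
    "VIS Interaction":       ["reinforcement learning"],
    "VIS Reading":           ["classification", "regression"],
    "User Profiling":        ["clustering", "classification"],
}

def candidate_tasks(process: str) -> list[str]:
    """Look up which ML task families might serve a given process."""
    return ML4VIS_PROCESSES.get(process, [])

print(candidate_tasks("VIS Reading"))  # ['classification', 'regression']
```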