No Arabic abstract
Many visual analytics systems allow users to interact with machine learning models towards the goals of data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating of diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well-known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.
Many processes, from gene interaction in biology to computer networks to social media, can be modeled more precisely as temporal hypergraphs than by regular graphs. This is because hypergraphs generalize graphs by extending edges to connect any number of vertices, allowing complex relationships to be described more accurately and predict their behavior over time. However, the interactive exploration and seamless refinement of such hypergraph-based prediction models still pose a major challenge. We contribute Hyper-Matrix, a novel visual analytics technique that addresses this challenge through a tight coupling between machine-learning and interactive visualizations. In particular, the technique incorporates a geometric deep learning model as a blueprint for problem-specific models while integrating visualizations for graph-based and category-based data with a novel combination of interactions for an effective user-driven exploration of hypergraph models. To eliminate demanding context switches and ensure scalability, our matrix-based visualization provides drill-down capabilities across multiple levels of semantic zoom, from an overview of model predictions down to the content. We facilitate a focused analysis of relevant connections and groups based on interactive user-steering for filtering and search tasks, a dynamically modifiable partition hierarchy, various matrix reordering techniques, and interactive model feedback. We evaluate our technique in a case study and through formative evaluation with law enforcement experts using real-world internet forum communication data. The results show that our approach surpasses existing solutions in terms of scalability and applicability, enables the incorporation of domain knowledge, and allows for fast search-space traversal. With the technique, we pave the way for the visual analytics of temporal hypergraphs in a wide variety of domains.
Communication consists of both meta-information as well as content. Currently, the automated analysis of such data often focuses either on the network aspects via social network analysis or on the content, utilizing methods from text-mining. However, the first category of approaches does not leverage the rich content information, while the latter ignores the conversation environment and the temporal evolution, as evident in the meta-information. In contradiction to communication research, which stresses the importance of a holistic approach, both aspects are rarely applied simultaneously, and consequently, their combination has not yet received enough attention in automated analysis systems. In this work, we aim to address this challenge by discussing the difficulties and design decisions of such a path as well as contribute CommAID, a blueprint for a holistic strategy to communication analysis. It features an integrated visual analytics design to analyze communication networks through dynamics modeling, semantic pattern retrieval, and a user-adaptable and problem-specific machine learning-based retrieval system. An interactive multi-level matrix-based visualization facilitates a focused analysis of both network and content using inline visuals supporting cross-checks and reducing context switches. We evaluate our approach in both a case study and through formative evaluation with eight law enforcement experts using a real-world communication corpus. Results show that our solution surpasses existing techniques in terms of integration level and applicability. With this contribution, we aim to pave the path for a more holistic approach to communication analysis.
Visual analytics systems enable highly interactive exploratory data analysis. Across a range of fields, these technologies have been successfully employed to help users learn from complex data. However, these same exploratory visualization techniques make it easy for users to discover spurious findings. This paper proposes new methods to monitor a users analytic focus during visual analysis of structured datasets and use it to surface relevant articles that contextualize the visualized findings. Motivated by interactive analyses of electronic health data, this paper introduces a formal model of analytic focus, a computational approach to dynamically update the focus model at the time of user interaction, and a prototype application that leverages this model to surface relevant medical publications to users during visual analysis of a large corpus of medical records. Evaluation results with 24 users show that the modeling approach has high levels of accuracy and is able to surface highly relevant medical abstracts.
Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis tool, GNNVis, to assist model developers and users in understanding and analyzing GNNs. Specifically, Parallel Sets View and Projection View enable users to quickly identify and validate error patterns in the set of wrong predictions; Graph View and Feature Matrix View offer a detailed analysis of individual nodes to assist users in forming hypotheses about the error patterns. Since GNNs jointly model the graph structure and the node features, we reveal the relative influences of the two types of information by comparing the predictions of three models: GNN, Multi-Layer Perceptron (MLP), and GNN Without Using Features (GNNWUF). Two case studies and interviews with domain experts demonstrate the effectiveness of GNNVis in facilitating the understanding of GNN models and their errors.
Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are promising and to learn how to apply relevant techniques in visual analytics, we systematically review 259 papers published in the last ten years together with representative works before 2010. We build a taxonomy, which includes three first-level categories: techniques before model building, techniques during model building, and techniques after model building. Each category is further characterized by representative analysis tasks, and each task is exemplified by a set of recent influential works. We also discuss and highlight research challenges and promising potential future research opportunities useful for visual analytics researchers.