Assessing the Impact of Automated Suggestions on Decision Making: Domain Experts Mediate Model Errors but Take Less Initiative

61 0 0.0 ( 0 )

Download Cite

Added by Monica Agrawal

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Ariel Levy - Monica Agrawal - Arvind Satyanarayan

Human-Computer Interaction

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Automated decision support can accelerate tedious tasks as users can focus their attention where it is needed most. However, a key concern is whether users overly trust or cede agency to automation. In this paper, we investigate the effects of introducing automation to annotating clinical texts--a multi-step, error-prone task of identifying clinical concepts (e.g., procedures) in medical notes, and mapping them to labels in a large ontology. We consider two forms of decision aid: recommending which labels to map concepts to, and pre-populating annotation suggestions. Through laboratory studies, we find that 18 clinicians generally build intuition of when to rely on automation and when to exercise their own judgement. However, when presented with fully pre-populated suggestions, these expert users exhibit less agency: accepting improper mentions, and taking less initiative in creating additional annotations. Our findings inform how systems and algorithms should be designed to mitigate the observed issues.

rate research

The Impact of Speed and Bias on the Cognitive Processes of Experts and Novices in Medical Image Decision-making

57 - Jennifer S. Trueblood , William R. Holmes , Adam C. Seegmiller 2017

Training individuals to make accurate decisions from medical images is a critical component of education in diagnostic pathology. We describe a joint experimental and computational modeling approach to examine the similarities and differences in the cognitive processes of novice participants and experienced participants (pathology residents and pathology faculty) in cancer cell image identification. For this study we collected a bank of hundreds of digital images that were identified by cell type and classified by difficulty by a panel of expert hematopathologists. The key manipulations in our study included examining the speed-accuracy tradeoff as well as the impact of prior expectations on decisions. In addition, our study examined individual differences in decision-making by comparing task performance to domain general visual ability (as measured using the Novel Object Memory Test (NOMT) (Richler et al., 2017). Using Signal Detection Theory (SDT) and the Diffusion Decision Model (DDM), we found many similarities between expert and novices in our task. While experts tended to have better discriminability, the two groups responded similarly to time pressure (i.e., reduced caution under speed instructions in the DDM) and to the introduction of a probabilistic cue (i.e., increased response bias in the DDM). These results have important implications for training in this area as well as using novice participants in research on medical image perception and decision-making.

Neurons and Cognition

Unremarkable AI: Fitting Intelligent Decision Support into Critical, Clinical Decision-Making Processes

95 - Qian Yang , Aaron Steinfeld , John Zimmerman 2019

Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of Unremarkable Computing, that by augmenting the users routines technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience.

Human-Computer Interaction Artificial Intelligence Computers and Society

Modelling Cyber-Security Experts Decision Making Processes using Aggregation Operators

155 - Simon Miller , Christian Wagner , Uwe Aickelin 2016

An important role carried out by cyber-security experts is the assessment of proposed computer systems, during their design stage. This task is fraught with difficulties and uncertainty, making the knowledge provided by human experts essential for successful assessment. Today, the increasing number of progressively complex systems has led to an urgent need to produce tools that support the expert-led process of system-security assessment. In this research, we use weighted averages (WAs) and ordered weighted averages (OWAs) with evolutionary algorithms (EAs) to create aggregation operators that model parts of the assessment process. We show how individual overall ratings for security components can be produced from ratings of their characteristics, and how these individual overall ratings can be aggregated to produce overall rankings of potential attacks on a system. As well as the identification of salient attacks and weak points in a prospective system, the proposed method also highlights which factors and security components contribute most to a components difficulty and attack ranking respectively. A real world scenario is used in which experts were asked to rank a set of technical attacks, and to answer a series of questions about the security components that are the subject of the attacks. The work shows how finding good aggregation operators, and identifying important components and factors of a cyber-security problem can be automated. The resulting operators have the potential for use as decision aids for systems designers and cyber-security experts, increasing the amount of assessment that can be achieved with the limited resources available.

Artificial Intelligence Cryptography and Security

Quantifying the Impact of Making and Breaking Interface Habits

47 - Diego Garaialde , Christopher P. Bowers , Charlie Pinder 2020

The frequency with which people interact with technology means that users may develop interface habits, i.e. fast, automatic responses to stable interface cues. Design guidelines often assume that interface habits are beneficial. However, we lack quantitative evidence of how the development of habits actually affect user performance and an understanding of how changes in the interface design may affect habit development. Our work quantifies the effect of habit formation and disruption on user performance in interaction. Through a forced choice lab study task (n=19) and in the wild deployment (n=18) of a notificationdialog experiment on smartphones, we show that people become more accurate and faster at option selection as they develop an interface habit. Crucially this performance gain is entirely eliminated once the habit is disrupted. We discuss reasons for this performance shift and analyse some disadvantages of interface habits, outlining general design patterns on how to both support and disrupt them.Keywords: Interface habits, user behaviour, breaking habit, interaction science, quantitative research.

Human-Computer Interaction

The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers

224 - Daniel Buschek , Martin Zurn , Malin Eiband 2021

We present an in-depth analysis of the impact of multi-word suggestion choices from a neural language model on user behaviour regarding input and text composition in email writing. Our study for the first time compares different numbers of parallel suggestions, and use by native and non-native English writers, to explore a trade-off of efficiency vs ideation, emerging from recent literature. We built a text editor prototype with a neural language model (GPT-2), refined in a prestudy with 30 people. In an online study (N=156), people composed emails in four conditions (0/1/3/6 parallel suggestions). Our results reveal (1) benefits for ideation, and costs for efficiency, when suggesting multiple phrases; (2) that non-native speakers benefit more from more suggestions; and (3) further insights into behaviour patterns. We discuss implications for research, the design of interactive suggestion systems, and the vision of supporting writers with AI instead of replacing them.

Human-Computer Interaction Computation and Language