While decision makers have begun to employ machine learning, machine learning models may make predictions that are biased against certain demographic groups. Semi-automated bias detection tools often present reports of automatically detected biases using a recommendation list or visual cues. However, there is a lack of guidance concerning which presentation style to use in which scenarios. We conducted a small lab study with 16 participants to investigate how presentation style might affect user behaviors in reviewing bias reports. Participants used both a prototype with a recommendation list and a prototype with visual cues for bias detection. We found that participants often wanted to investigate performance measures that were not automatically detected as biases. Yet, when using the prototype with a recommendation list, they tended to give less consideration to such measures. Grounded in the findings, we propose information load and comprehensiveness as two axes for characterizing bias detection tasks and illustrate how the two axes could be adopted to reason about when to use a recommendation list or visual cues.
The use of automatic grading tools has become nearly ubiquitous in large undergraduate programming courses, and recent work has focused on improving the quality of automatically generated feedback. However, there is a relative lack of data directly comparing student outcomes when receiving computer-generated feedback and human-written feedback. This paper addresses this gap by splitting one 90-student class into two feedback groups and analyzing differences in the two cohorts' performance. The class is an introductory AI course with programming homework assignments. One group of students received detailed computer-generated feedback on their programming assignments describing which parts of the algorithm's logic were missing; the other group additionally received human-written feedback describing how their program's syntax relates to issues with their logic, and qualitative (style) recommendations for improving their code. Results on quizzes and exam questions suggest that human feedback helps students obtain a better conceptual understanding, but analyses found no difference between the groups' ability to collaborate on the final project. The course grade distribution revealed that students who received human-written feedback performed better overall; this effect was most pronounced in the middle two quartiles of each group. These results suggest that feedback about the syntax-logic relation may be a primary mechanism by which human feedback improves student outcomes.
Data-driven decision-making consequential to individuals raises important questions of accountability and justice. Indeed, European law provides individuals limited rights to meaningful information about the logic behind significant, autonomous decisions such as loan approvals, insurance quotes, and CV filtering. We undertake three experimental studies examining people's perceptions of justice in algorithmic decision-making under different scenarios and explanation styles. Dimensions of justice previously observed in response to human decision-making appear similarly engaged in response to algorithmic decisions. Qualitative analysis identified several concerns and heuristics involved in justice perceptions including arbitrariness, generalisation, and (in)dignity. Quantitative analysis indicates that explanation styles primarily matter to justice perceptions only when subjects are exposed to multiple different styles; under repeated exposure to one style, scenario effects obscure any explanation effects. Our results suggest there may be no best approach to explaining algorithmic decisions, and that reflection on their automated nature both implicates and mitigates justice dimensions.
Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an on-line manner. Results in two classic domains verify its effectiveness from both simulated and real human studies, shedding some light on the role and impact of human factors in human-robot collaborative learning.
The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populate the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on the scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under criticism for how they may strengthen existing biases or even generate new kinds of bias. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, which reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of three different designs of Recoin's user interface, each exhibiting a different degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not yet conclusive, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication of further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.
User beliefs about algorithmic systems are constantly co-produced through user interaction and the complex socio-technical systems that generate recommendations. Identifying these beliefs is crucial because they influence how users interact with recommendation algorithms. With no prior work on user beliefs about algorithmic video recommendations, practitioners lack relevant knowledge to improve the user experience of such systems. To address this problem, we conducted semi-structured interviews with middle-aged YouTube video consumers to analyze their beliefs about the video recommendation system. Our analysis revealed different factors that users believe influence their recommendations. Based on these factors, we identified four groups of user beliefs: Previous Actions, Social Media, Recommender System, and Company Policy. Additionally, we propose a framework to distinguish the four main actors that users believe influence their video recommendations: the current user, other users, the algorithm, and the organization. This framework provides a new lens to explore design suggestions based on the agency of these four actors. It also exposes a previously unexplored aspect: the effect of corporate decisions on the interaction with algorithmic recommendations. While we found that users are aware of the existence of the recommendation system on YouTube, we show that their understanding of this system is limited.