Multisource AI Scorecard Table for System Evaluation

95 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Erik Blasch

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Erik Blasch - James Sung - Tao Nguyen

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist focused on the principles of good analysis adopted by the intelligence community (IC) to help promote the development of more understandable systems and engender trust in AI outputs. Such a scorecard enables a transparent, consistent, and meaningful understanding of AI tools applied for commercial and government use. A standard is built on compliance and agreement through policy, which requires buy-in from the stakeholders. While consistency for testing might only exist across a standard data set, the community requires discussion on verification and validation approaches which can lead to interpretability, explainability, and proper use. The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system supporting various operational needs. These include sourcing, uncertainty, consistency, accuracy, and visualization. Three use cases are presented as notional examples that support security for comparative analysis.

قيم البحث

78 - Ho Chit Siu , Jaime D. Pena , Kimberlee C. Chang 2021

Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve o bjective team performance or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the humans perceived performance, teamwork, interpretability, trust, and overall preference of AI teammate. We find that humans have a clear preference toward a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistical difference in the game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than a singular focus on objective task performance.

الذكاء الاصطناعي تفاعل الإنسان والحاسوب

Explanation Ontology: A Model of Explanations for User-Centered AI

187 - Shruthi Chari , Oshani Seneviratne , Daniel M. Gruen 2020

Explainability has been a goal for Artificial Intelligence (AI) systems since their conception, with the need for explainability growing as more complex AI models are increasingly used in critical, high-stakes settings such as healthcare. Explanation s have often added to an AI system in a non-principled, post-hoc manner. With greater adoption of these systems and emphasis on user-centric explainability, there is a need for a structured representation that treats explainability as a primary consideration, mapping end user needs to specific explanation types and the systems AI capabilities. We design an explanation ontology to model both the role of explanations, accounting for the system and user attributes in the process, and the range of different literature-derived explanation types. We indicate how the ontology can support user requirements for explanations in the domain of healthcare. We evaluate our ontology with a set of competency questions geared towards a system designer who might use our ontology to decide which explanation types to include, given a combination of users needs and a systems capabilities, both in system design settings and in real-time operations. Through the use of this ontology, system designers will be able to make informed choices on which explanations AI systems can and should provide.

الذكاء الاصطناعي تفاعل الإنسان والحاسوب التعلم الآلي

LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing

248 - Yu Li , Josh Arnold , Feifan Yan 2021

We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsource platform, Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures easily with our built-in pages as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API to review collected data.

الذكاء الاصطناعي تفاعل الإنسان والحاسوب

Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI

90 - Katya Kudashkina , Patrick M. Pilarski , Richard S. Sutton 2020

Intelligent assistants that follow commands or answer simple questions, such as Siri and Google search, are among the most economically important applications of AI. Future conversational AI assistants promise even greater capabilities and a better u ser experience through a deeper understanding of the domain, the user, or the users purposes. But what domain and what methods are best suited to researching and realizing this promise? In this article we argue for the domain of voice document editing and for the methods of model-based reinforcement learning. The primary advantages of voice document editing are that the domain is tightly scoped and that it provides something for the conversation to be about (the document) that is delimited and fully accessible to the intelligent assistant. The advantages of reinforcement learning in general are that its methods are designed to learn from interaction without explicit instruction and that it formalizes the purposes of the assistant. Model-based reinforcement learning is needed in order to genuinely understand the domain of discourse and thereby work efficiently with the user to achieve their goals. Together, voice document editing and model-based reinforcement learning comprise a promising research direction for achieving conversational AI.

الذكاء الاصطناعي تفاعل الإنسان والحاسوب التعلم الآلي

Player-Centered AI for Automatic Game Personalization: Open Problems

341 - Jichen Zhu , Santiago Onta~non 2021

Computer games represent an ideal research domain for the next generation of personalized digital applications. This paper presents a player-centered framework of AI for game personalization, complementary to the commonly used system-centered approac hes. Built on the Structure of Actions theory, the paper maps out the current landscape of game personalization research and identifies eight open problems that need further investigation. These problems require deep collaboration between technological advancement and player experience design.

الذكاء الاصطناعي تفاعل الإنسان والحاسوب