
Evaluating the Robustness of Collaborative Agents

Submitted by Paul Knott PhD MPhys BSc
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge: if we cannot rely on the average training or validation reward as a metric, then how can we effectively evaluate robustness? We take inspiration from the practice of unit testing in software engineering. Specifically, we suggest that when designing AI agents that collaborate with humans, designers should search for potential edge cases in possible partner behavior and possible states encountered, and write tests which check that the behavior of the agent in these edge cases is reasonable. We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness. We find that the test suite provides significant insight into the effects of these proposals that were generally not revealed by looking solely at the average validation reward.
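
To make the proposed methodology more concrete, the following is a minimal sketch, in Python, of what one such edge-case unit test could look like. It is illustrative only: the environment factory make_two_player_env, the loader load_trained_agent, and the ScriptedPartner class are hypothetical placeholders, not the actual Overcooked-AI or test-suite API.

import unittest

class EdgeCasePartnerTests(unittest.TestCase):
    """Check that a trained agent behaves reasonably with an unusual partner."""

    def rollout(self, env, agent, partner, max_steps=400):
        # Roll out one episode with a fixed, scripted partner and return the
        # total team reward collected.
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            joint_action = (agent.act(obs[0]), partner.act(obs[1]))
            obs, reward, done, _ = env.step(joint_action)
            total_reward += reward
            if done:
                break
        return total_reward

    def test_idle_partner(self):
        # Edge case in possible partner behavior: the partner never moves.
        env = make_two_player_env(layout="cramped_room")   # hypothetical factory
        agent = load_trained_agent("checkpoint.pt")        # hypothetical loader
        partner = ScriptedPartner(policy="stay_idle")      # hypothetical partner
        # "Reasonable" here means the agent completes at least one delivery on
        # its own rather than waiting indefinitely for the idle partner.
        self.assertGreater(self.rollout(env, agent, partner), 0.0)

if __name__ == "__main__":
    unittest.main()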


Read also

Reinforcement learning has enabled agents to solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D game Minecraft. We find that all three intrinsic objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.
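
As a rough illustration of this retrospective-evaluation idea, the sketch below computes one simple intrinsic objective, a histogram-based input-entropy estimate, on pre-collected episodes and checks its rank correlation with episode return. The synthetic data, the per-dimension binning scheme, and the use of Spearman correlation are assumptions made for illustration, not the paper's actual agents, games, or estimators.

import numpy as np
from scipy.stats import spearmanr

def input_entropy(observations, bins=16):
    # Crude entropy estimate: average the histogram entropy of each
    # observation dimension over the episode.
    flat = observations.reshape(len(observations), -1)
    entropies = []
    for dim in flat.T:
        hist, _ = np.histogram(dim, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(float(-(p * np.log(p)).sum()))
    return float(np.mean(entropies))

# Hypothetical pre-collected data: per-episode observations and returns.
rng = np.random.default_rng(0)
episode_obs = [rng.normal(size=(100, 8)) * (i + 1) for i in range(20)]
episode_returns = np.arange(20) + rng.normal(scale=2.0, size=20)

objective_values = [input_entropy(obs) for obs in episode_obs]
rho, _ = spearmanr(objective_values, episode_returns)
print(f"Spearman correlation between input entropy and return: {rho:.2f}")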
For machine learning models to be most useful in numerous sociotechnical systems, many have argued that they must be human-interpretable. However, despite increasing interest in interpretability, there remains no firm consensus on how to measure it. This is especially true in representation learning, where interpretability research has focused on disentanglement measures only applicable to synthetic datasets and not grounded in human factors. We introduce a task to quantify the human-interpretability of generative model representations, where users interactively modify representations to reconstruct target instances. On synthetic datasets, we find performance on this task much more reliably differentiates entangled and disentangled models than baseline approaches. On a real dataset, we find it differentiates between representation learning methods widely believed but never shown to produce more or less interpretable models. In both cases, we ran small-scale think-aloud studies and large-scale experiments on Amazon Mechanical Turk to confirm that our qualitative and quantitative results agreed.
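
The scoring side of such an interactive task can be pictured with the toy sketch below: interpretability is proxied by how closely a participant-adjusted latent code reconstructs the target instance through the model's decoder. The linear "decoder" and the simulated user edit are stand-ins chosen for illustration, not the study's actual interface or the generative models evaluated.

import numpy as np

def reconstruction_distance(decoder, user_latent, target_instance):
    # Lower distance means the user could steer the representation onto the
    # target, which the task treats as evidence of interpretability.
    return float(np.linalg.norm(decoder(user_latent) - target_instance))

# Toy linear "decoder" and a simulated user edit, for illustration only.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 8))

def decoder(z):
    return W @ z

target_instance = decoder(rng.normal(size=8))
user_latent = rng.normal(size=8)   # what a study participant might end up with
print(reconstruction_distance(decoder, user_latent, target_instance))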
To improve the policy robustness of deep reinforcement learning agents, a line of recent work focuses on producing disturbances of the environment. Existing approaches in the literature for generating meaningful environment disturbances are adversarial reinforcement learning methods. These methods frame the problem as a two-player game between the protagonist agent, which learns to perform a task in an environment, and the adversary agent, which learns to disturb the protagonist via modifications of the considered environment. Both protagonist and adversary are trained with deep reinforcement learning algorithms. Alternatively, we propose in this paper to build on gradient-based adversarial attacks, usually used for classification tasks for instance, which we apply to the critic network of the protagonist to identify efficient disturbances of the environment. Rather than learning an attacker policy, which usually proves very complex and unstable, we leverage the knowledge of the protagonist's critic network to dynamically complexify the task at each step of the learning process. We show that our method, while being faster and lighter, leads to significantly better improvements in policy robustness than existing methods in the literature.
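
The core idea can be pictured with the hedged PyTorch sketch below: take the gradient of the protagonist's critic with respect to the state and step against it, FGSM-style, to obtain a small state disturbance instead of training a separate adversary policy. The toy critic, the perturbation budget, and how the disturbed state would be injected back into the environment are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

# Toy value network standing in for the protagonist's critic V(s).
critic = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def critic_fgsm_disturbance(critic, state, epsilon=0.05):
    # Perturb the state in the direction that most decreases the critic's
    # value estimate: the "worst" small change from the agent's perspective.
    state = state.clone().detach().requires_grad_(True)
    critic(state).sum().backward()
    return (state - epsilon * state.grad.sign()).detach()

state = torch.randn(1, 8)
disturbed_state = critic_fgsm_disturbance(critic, state)
print(disturbed_state - state)   # the applied disturbance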
Recently, adversarial deception has become one of the most considerable threats to deep neural networks. However, compared to the extensive research into new designs of adversarial attacks and defenses, the intrinsic robustness property of neural networks still lacks thorough investigation. This work aims to qualitatively interpret the adversarial attack and defense mechanism through loss visualization, and to establish a quantitative metric to evaluate a neural network model's intrinsic robustness. The proposed robustness metric identifies the upper bound of a model's prediction divergence in the given domain and thus indicates whether the model can maintain a stable prediction. With extensive experiments, our metric demonstrates several advantages over conventional robustness estimation based on adversarial testing accuracy: (1) it provides a uniform evaluation for models with different structures and parameter scales; (2) it outperforms conventional accuracy-based robustness estimation and provides a more reliable evaluation that is invariant to different test settings; (3) it can be generated quickly without considerable testing cost.
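
As a simplified, sampling-based proxy for the idea of measuring prediction divergence over a given input domain (the paper's metric is an upper bound, not a sampled estimate), the sketch below records the largest symmetric KL divergence between a model's output distribution on an input and on random perturbations of it. The toy model, the epsilon-ball domain, and the sample count are illustrative assumptions.

import torch
import torch.nn.functional as F

def sampled_prediction_divergence(model, x, epsilon=0.1, num_samples=64):
    # Largest symmetric KL between f(x) and f(x + delta) over sampled deltas;
    # a smaller value suggests more stable predictions inside the domain.
    with torch.no_grad():
        p = F.log_softmax(model(x), dim=-1)
        worst = 0.0
        for _ in range(num_samples):
            delta = (torch.rand_like(x) * 2 - 1) * epsilon
            q = F.log_softmax(model(x + delta), dim=-1)
            kl = 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                        + F.kl_div(p, q, log_target=True, reduction="batchmean"))
            worst = max(worst, float(kl))
    return worst

# Toy usage with a randomly initialised linear classifier.
model = torch.nn.Linear(16, 10)
x = torch.randn(1, 16)
print(sampled_prediction_divergence(model, x))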
In this technical report, we evaluate the adversarial robustness of a very recent method called Geometry-aware Instance-reweighted Adversarial Training (GAIRAT) [7]. GAIRAT reports state-of-the-art results on defenses to adversarial attacks on the CIFAR-10 dataset. In fact, we find that a network trained with this method, while showing an improvement over regular adversarial training (AT), is biasing the model towards certain samples by re-scaling the loss. Indeed, this leads the model to be susceptible to attacks that scale the logits. The original model shows an accuracy of 59% under AutoAttack when trained with additional data with pseudo-labels. We provide an analysis that shows the opposite. In particular, we craft a PGD attack that multiplies the logits by a positive scalar, which decreases the GAIRAT accuracy from 55% to 44% when trained solely on CIFAR-10. In this report, we rigorously evaluate the model and provide insights into the reasons behind the vulnerability of GAIRAT to this adversarial attack. The code to reproduce our evaluation is made available at https://github.com/giuxhub/GAIRAT-LSA
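
The logit-scaling attack described above can be pictured with the hedged PGD sketch below, in which the cross-entropy loss is computed on logits multiplied by a positive scalar before the gradient step. The epsilon, step size, iteration count, and scaling factor are illustrative defaults, not the report's exact configuration; the actual evaluation code is at the linked repository.

import torch
import torch.nn.functional as F

def pgd_with_logit_scaling(model, x, y, alpha_scale=10.0,
                           epsilon=8 / 255, step_size=2 / 255, steps=10):
    # L-infinity PGD where the loss is taken on scaled logits, which can undo
    # defences that rely on re-weighting the loss by prediction margin.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(alpha_scale * model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()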
