Reinforcement Learning with Supervision from Noisy Demonstrations

89 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sheng-Jun Huang

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Kun-Peng Ning - Sheng-Jun Huang

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Reinforcement learning has achieved great success in various applications. To learn an effective policy for the agent, it usually requires a huge amount of data by interacting with the environment, which could be computational costly and time consuming. To overcome this challenge, the framework called Reinforcement Learning with Expert Demonstrations (RLED) was proposed to exploit the supervision from expert demonstrations. Although the RLED methods can reduce the number of learning iterations, they usually assume the demonstrations are perfect, and thus may be seriously misled by the noisy demonstrations in real applications. In this paper, we propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations. Specifically, for each step of the demonstration trajectory, we form an instance, and define a joint loss function to simultaneously maximize the expected reward and minimize the difference between agent behaviors and demonstrations. Most importantly, by calculating the expected gain of the value function, we assign each instance with a weight to estimate its potential utility, and thus can emphasize the more helpful demonstrations while filter out noisy ones. Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations, and achieve higher performance in fewer iterations.

قيم البحث

89 - Christian Scheller , Yanick Schraner , Manfred Vogel 2020

Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond wi th only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.

التعلم الآلي التعلم الالي

Residual Reinforcement Learning from Demonstrations

365 - Minttu Alakuijala , Gabriel Dulac-Arnold 2021

Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visua l inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks out of reach for RL from scratch.

التعلم الآلي

Co-learning: Learning from Noisy Labels with Self-supervision

71 - Cheng Tan , Jun Xia , Lirong Wu 2021

Noisy labels, resulting from mistakes in manual labeling or webly data collecting for supervised learning, can cause neural networks to overfit the misleading information and degrade the generalization performance. Self-supervised learning works in t he absence of labels and thus eliminates the negative impact of noisy labels. Motivated by co-training with both supervised learning view and self-supervised learning view, we propose a simple yet effective method called Co-learning for learning with noisy labels. Co-learning performs supervised learning and self-supervised learning in a cooperative way. The constraints of intrinsic similarity with the self-supervised module and the structural similarity with the noisily-supervised module are imposed on a shared common feature encoder to regularize the network to maximize the agreement between the two constraints. Co-learning is compared with peer methods on corrupted data from benchmark datasets fairly, and extensive results are provided which demonstrate that Co-learning is superior to many state-of-the-art approaches.

التعلم الآلي

Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations

445 - Ozsel Kilinc , Giovanni Montana 2018

Multi-agent reinforcement learning systems aim to provide interacting agents with the ability to collaboratively learn and adapt to the behaviour of other agents. In many real-world applications, the agents can only acquire a partial view of the worl d. Here we consider a setting whereby most agents observations are also extremely noisy, hence only weakly correlated to the true state of the environment. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case that an agents policy can be made conditional upon all other agents observations. To overcome these difficulties, we propose a multi-agent deep deterministic policy gradient algorithm enhanced by a communication medium (MADDPG-M), which implements a two-level, concurrent learning mechanism. An agents policy depends on its own private observations as well as those explicitly shared by others through a communication medium. At any given point in time, an agent must decide whether its private observations are sufficiently informative to be shared with others. However, our environments provide no explicit feedback informing an agent whether a communication action is beneficial, rather the communication policies must also be learned through experience concurrently to the main policies. Our experimental results demonstrate that the algorithm performs well in six highly non-stationary environments of progressively higher complexity, and offers substantial performance gains compared to the baselines.

التعلم الآلي التعلم الالي

Cooperative Learning for Noisy Supervision

73 - Hao Wu , Jiangchao Yao , Ya Zhang 2021

Learning with noisy labels has gained the enormous interest in the robust deep learning area. Recent studies have empirically disclosed that utilizing dual networks can enhance the performance of single network but without theoretic proof. In this pa per, we propose Cooperative Learning (CooL) framework for noisy supervision that analytically explains the effects of leveraging dual or multiple networks. Specifically, the simple but efficient combination in CooL yields a more reliable risk minimization for unseen clean data. A range of experiments have been conducted on several benchmarks with both synthetic and real-world settings. Extensive results indicate that CooL outperforms several state-of-the-art methods.

التعلم الآلي