ترغب بنشر مسار تعليمي؟ اضغط هنا

Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains. This results in methods that RL practitioners can use in almost any domain. However, recent studies often lack the engineering steps (tric ks) which may be needed to effectively use RL, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, to achieve state-of-the-art results and win RL competitions. To ease the engineering efforts, we distill descriptions of tricks from state-of-the-art results and study how well these tricks can improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights in multiple domains.
Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond wi th only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا