
Langevin DQN

Submitted by Vikranth Dwaracherla
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Algorithms that tackle deep exploration -- an important challenge in reinforcement learning -- have relied on epistemic uncertainty representation through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without the additional complexity required to account for epistemic uncertainty. We answer this question in the affirmative. In particular, we develop Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, and demonstrate through a computational study that the presented algorithm achieves deep exploration. We also offer some intuition as to how Langevin DQN achieves deep exploration. In addition, we present a modification of the Langevin DQN algorithm to improve its computational efficiency.
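
As a rough illustration of the idea described in the abstract, the sketch below shows a single DQN training step whose parameter update is perturbed with Gaussian noise, in the spirit of stochastic gradient Langevin dynamics. The network size, learning rate, and temperature scaling are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: a DQN update whose parameter step is perturbed with Gaussian noise.
# All hyperparameters and the network architecture are assumptions for illustration.
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())

lr, gamma, temperature = 1e-3, 0.99, 0.01  # assumed values

def langevin_dqn_step(batch):
    obs, action, reward, next_obs, done = batch
    # Standard DQN temporal-difference loss.
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)

    q_net.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in q_net.parameters():
            # Gradient step plus a Gaussian perturbation of the update (Langevin noise).
            noise = torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
            p.add_(-lr * p.grad + noise)
    return loss.item()

# Dummy transitions, just to show the call.
batch = (torch.randn(8, obs_dim), torch.randint(0, n_actions, (8,)),
         torch.randn(8), torch.randn(8, obs_dim), torch.zeros(8))
print(langevin_dqn_step(batch))
```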


Read also

The concept of utilizing multi-step returns for updating value functions has been adopted in deep reinforcement learning (DRL) for a number of years. Updating value functions with different backup lengths provides advantages in different aspects, including bias and variance of value estimates, convergence speed, and exploration behavior of the agent. Conventional methods such as TD-lambda leverage these advantages by using a target value equivalent to an exponential average of different step returns. Nevertheless, integrating step returns into a single target sacrifices the diversity of the advantages offered by different step return targets. To address this issue, we propose Mixture Bootstrapped DQN (MB-DQN), which is built on top of bootstrapped DQN and uses different backup lengths for different bootstrapped heads. MB-DQN enables a heterogeneity of target values that is unavailable in approaches relying only on a single target value. As a result, it is able to maintain the advantages offered by different backup lengths. In this paper, we first discuss the motivational insights through a simple maze environment. In order to validate the effectiveness of MB-DQN, we perform experiments on the Atari 2600 benchmark environments and demonstrate the performance improvement of MB-DQN over a number of baseline methods. We further provide a set of ablation studies to examine the impacts of different design configurations of MB-DQN.
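
To make the heterogeneous-target idea concrete, here is a small sketch of how each bootstrapped head could be trained against an n-step return with its own backup length. The head count, backup lengths, and bootstrap value are assumptions for illustration, not the configuration used in MB-DQN.

```python
# Sketch: one n-step return target per bootstrapped head, each with its own backup length.
from typing import List

def n_step_return(rewards: List[float], bootstrap_value: float, n: int, gamma: float = 0.99) -> float:
    """G_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n})."""
    n = min(n, len(rewards))
    g = sum(gamma ** k * rewards[k] for k in range(n))
    return g + gamma ** n * bootstrap_value

backup_lengths = [1, 3, 5, 10]          # assumed backup length per head
rewards = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # toy trajectory
bootstrap_value = 0.5                   # in practice: max_a Q_head(s_{t+n}, a) from a target head

targets = {f"head_{i}": n_step_return(rewards, bootstrap_value, n)
           for i, n in enumerate(backup_lengths)}
print(targets)
```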
Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data-efficient deep Q-learning methodology for model-free learning of Nash equilibria for general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and, as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.
We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the o.d.e. (ordinary differential equation) approach and point out certain theoretical issues. We then propose a modified scheme called Full Gradient DQN (FG-DQN, for short) that has a sound theoretical basis and compare it with the original scheme on sample problems. We observe a better performance for FG-DQN.
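
The contrast drawn in this abstract can be illustrated roughly as follows: standard DQN treats the bootstrapped target as a constant (a semi-gradient update), while a full-gradient variant also differentiates through the target term of the squared Bellman error. The toy network and data below are assumptions, not the authors' setup.

```python
# Sketch: semi-gradient DQN loss (target detached) vs. a full-gradient Bellman error loss.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
gamma = 0.99

obs = torch.randn(8, 4)
action = torch.randint(0, 2, (8,))
reward = torch.randn(8)
next_obs = torch.randn(8, 4)

q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
next_q = q_net(next_obs).max(dim=1).values

# Standard DQN: gradient does not flow through the bootstrapped target.
semi_grad_loss = ((reward + gamma * next_q.detach() - q) ** 2).mean()

# Full-gradient variant: the target term also contributes to the gradient.
full_grad_loss = ((reward + gamma * next_q - q) ** 2).mean()

full_grad_loss.backward()
print(semi_grad_loss.item(), full_grad_loss.item())
```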
We took part in the City Brain Challenge competition and achieved 8th place. In this competition, the players are provided with a real-world city-scale road network and its traffic demand derived from real traffic data. The players are asked to coordinate the traffic signals with a self-designed agent to maximize the number of vehicles served while maintaining an acceptable delay. In this abstract paper, we present an overall analysis and our detailed solution to this competition. Our approach is mainly based on the adaptation of the deep Q-network (DQN) for real-time traffic signal control. From our perspective, the major challenge of this competition is how to extend the classical DQN framework to traffic signal control in a real-world complex road network and traffic flow situation. After trying and implementing several classical reward functions, we finally chose to apply our newly designed reward in our agent. By applying our newly proposed reward function and carefully tuning the control scheme, an agent based on a single DQN model can rank among the top 15 teams. We hope this paper can serve, to some extent, as a baseline solution for traffic signal control of real-world road networks and inspire further attempts and research.
