Goal directed molecule generation using Monte Carlo Tree Search

116 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Anand A. Rajasekar

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Anand A. Rajasekar - Karthik Raman - Balaraman Ravindran

التعلم الآلي الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

One challenging and essential task in biochemistry is the generation of novel molecules with desired properties. Novel molecule generation remains a challenge since the molecule space is difficult to navigate through, and the generated molecules should obey the rules of chemical valency. Through this work, we propose a novel method, which we call unitMCTS, to perform molecule generation by making a unit change to the molecule at every step using Monte Carlo Tree Search. We show that this method outperforms the recently published techniques on benchmark molecular optimization tasks such as QED and penalized logP. We also demonstrate the usefulness of this method in improving molecule properties while being similar to the starting molecule. Given that there is no learning involved, our method finds desired molecules within a shorter amount of time.

قيم البحث

179 - Giambattista Parascandolo , Lars Buesing , Josh Merel 2020

Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partitions trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

On Effective Parallelization of Monte Carlo Tree Search

294 - Anji Liu , Yitao Liang , Ji Liu 2020

Despite its groundbreaking success in Go and computer games, Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree, which calls for effective parallelization. However, how to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. In this paper, we seek to lay its first theoretical foundation, by examining the potential performance loss caused by parallelization when achieving a desired speedup. In particular, we discover the necessary conditions of achieving a desirable parallelization performance, and highlight two of their practical benefits. First, by examining whether existing parallel MCTS algorithms satisfy these conditions, we identify key design principles that should be inherited by future algorithms, for example tracking the unobserved samples (used in WU-UCT (Liu et al., 2020)). We theoretically establish this essential design facilitates $mathcal{O} ( ln n + M / sqrt{ln n} )$ cumulative regret when the maximum tree depth is 2, where $n$ is the number of rollouts and $M$ is the number of workers. A regret of this form is highly desirable, as compared to $mathcal{O} ( ln n )$ regret incurred by a sequential counterpart, its excess part approaches zero as $n$ increases. Second, and more importantly, we demonstrate how the proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms. To illustrate this, we propose a new parallel MCTS algorithm, called BU-UCT, by following our theoretical guidelines. The newly proposed algorithm, albeit preliminary, out-performs four competitive baselines on 11 out of 15 Atari games. We hope our theoretical results could inspire future work of more effective parallel MCTS.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Parallelization of Monte Carlo Tree Search in Continuous Domains

198 - Karl Kurzer , Christoph Hortnagl , J. Marius Zollner 2020

Monte Carlo Tree Search (MCTS) has proven to be capable of solving challenging tasks in domains such as Go, chess and Atari. Previous research has developed parall

التعلم الآلي الذكاء الاصطناعي التعلم الالي

StarCraft II Build Order Optimization using Deep Reinforcement Learning and Monte-Carlo Tree Search

138 - Islam Elnabarawy , Kristijana Arroyo , Donald C. Wunsch II 2020

The real-time strategy game of StarCraft II has been posed as a challenge for reinforcement learning by Googles DeepMind. This study examines the use of an agent based on the Monte-Carlo Tree Search algorithm for optimizing the build order in StarCra ft II, and discusses how its performance can be improved even further by combining it with a deep reinforcement learning neural network. The experimental results accomplished using Monte-Carlo Tree Search achieves a score similar to a novice human player by only using very limited time and computational resources, which paves the way to achieving scores comparable to those of a human expert by combining it with the use of deep reinforcement learning.

التعلم الآلي الذكاء الاصطناعي

Active Reinforcement Learning with Monte-Carlo Tree Search

332 - Sebastian Schulze , Owain Evans 2018

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.

التعلم الآلي التعلم الالي