Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces

Published by Ricardo Quinteiro in 2021. Language of the paper: English.

Abstract (English)

This paper addresses the problem of optimal control using search trees. We start by considering multi-armed bandit problems with continuous action spaces and propose LD-HOO, a limited depth variant of the hierarchical optimistic optimization (HOO) algorithm. We provide a regret analysis for LD-HOO and show that, asymptotically, our algorithm exhibits the same cumulative regret as the original HOO while being faster and more memory efficient. We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach's application in several optimal control problems.
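The abstract does not spell out how LD-HOO works. As a rough illustration only, here is a minimal Python sketch of the standard HOO recursion on the action interval [0, 1] with binary splitting: each node gets U = mu_hat + sqrt(2 ln n / T) + nu * rho^h, and B = min(U, max of the children's B), where nodes not yet in the tree have B = infinity. The "limited depth" part is modeled here by one assumption: a node at depth max_depth is never split, and reaching it means sampling an action from its interval. That rule, the class name `LDHOO`, and the parameters `nu`, `rho`, `max_depth` are hypothetical and not taken from the paper.

```python
import math
import random


class LDHOO:
    """Sketch of HOO on [0, 1] with a hard depth cap (hypothetical class).

    Assumption: a node at depth `max_depth` is never split; when the
    descent reaches such a node, the action is sampled from its interval.
    Node (h, i) covers [i / 2**h, (i + 1) / 2**h).
    """

    def __init__(self, nu=1.0, rho=0.5, max_depth=10, seed=0):
        self.nu, self.rho, self.max_depth = nu, rho, max_depth
        self.rng = random.Random(seed)
        self.stats = {}  # node -> (visit count, empirical mean reward)
        self.B = {}      # node -> B-value
        self.n = 0       # rounds played so far

    def _b(self, node):
        # Nodes not (yet) in the tree have an infinite B-value.
        return self.B.get(node, math.inf)

    def select(self):
        """Descend by larger B-value; return a sampled action and the path."""
        node, path = (0, 0), [(0, 0)]
        while node in self.stats and node[0] < self.max_depth:
            h, i = node
            left, right = (h + 1, 2 * i), (h + 1, 2 * i + 1)
            bl, br = self._b(left), self._b(right)
            if bl > br or (bl == br and self.rng.random() < 0.5):
                node = left
            else:
                node = right
            path.append(node)
        h, i = node
        return (i + self.rng.random()) / 2 ** h, path

    def update(self, path, reward):
        """Update statistics along the path, then recompute all B-values."""
        self.n += 1
        for node in path:
            t, mean = self.stats.get(node, (0, 0.0))
            self.stats[node] = (t + 1, mean + (reward - mean) / (t + 1))
        # Deepest nodes first, so children are refreshed before parents.
        # Children of depth-capped nodes are never added, so their B is inf
        # and such a node's B reduces to its own U-value.
        for node in sorted(self.stats, key=lambda nd: -nd[0]):
            h, i = node
            t, mean = self.stats[node]
            u = mean + math.sqrt(2 * math.log(self.n) / t) + self.nu * self.rho ** h
            kids = max(self._b((h + 1, 2 * i)), self._b((h + 1, 2 * i + 1)))
            self.B[node] = min(u, kids)


# Usage: a deterministic step reward that is 1 on [0, 0.5) and 0 elsewhere.
agent = LDHOO(max_depth=6)
samples = []
for _ in range(500):
    x, path = agent.select()
    agent.update(path, 1.0 if x < 0.5 else 0.0)
    samples.append(x)
print(agent.stats[(1, 0)][0], agent.stats[(1, 1)][0])
```

With the step reward, visits should concentrate on the left half: every B-value in the left subtree stays above 1, so the right child is only chosen while its exploration bonus sqrt(2 ln n / T) exceeds 0.5. Recomputing every B-value costs time linear in the tree size, and the depth cap bounds that size at 2^(max_depth + 1) - 1 nodes, which gives one intuition for why a limited-depth variant can be faster and lighter on memory.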
