No abstract is available in Arabic.
Optimally solving a multi-armed bandit problem suffers from the curse of dimensionality. Indeed, resorting to dynamic programming leads to an exponential growth of computing time as the number of arms and the horizon increase. We introduce a decomposition-coordination heuristic, DeCo, that turns the initial problem into one-armed bandit problems that are solved in parallel and coordinated. As a consequence, we obtain a computing time which is essentially linear in the number of arms. In addition, the decomposition provides a theoretical lower bound on the regret. For the two-armed bandit case, dynamic programming provides the exact solution, which is almost matched by the DeCo heuristic. Moreover, in numerical simulations with up to 100 rounds and 20 arms, DeCo outperforms classic algorithms (Thompson sampling and Kullback-Leibler upper-confidence bound) and almost matches the theoretical lower bound on the regret for 20 arms.
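To make the dynamic programming baseline mentioned above concrete, here is a minimal Python sketch of an exact Bayesian recursion for a small bandit. It is not the DeCo heuristic and is not taken from the paper. It assumes Bernoulli rewards, independent uniform Beta(1, 1) priors on each arm, and an undiscounted finite horizon, none of which the abstract states. The function name `optimal_value` and its parameters are hypothetical.

```python
from functools import lru_cache


def optimal_value(horizon, n_arms=2):
    """Bayes-optimal expected total reward and number of cached DP states.

    Assumptions (not from the abstract): Bernoulli arms, independent
    Beta(1, 1) priors, undiscounted finite horizon.
    """

    @lru_cache(maxsize=None)
    def value(state, remaining):
        # state = (successes_0, failures_0, successes_1, failures_1, ...)
        if remaining == 0:
            return 0.0
        best = 0.0
        for i in range(n_arms):
            s, f = state[2 * i], state[2 * i + 1]
            p = (s + 1) / (s + f + 2)  # posterior mean under Beta(1, 1)
            win = list(state)
            win[2 * i] += 1
            lose = list(state)
            lose[2 * i + 1] += 1
            q = p * (1.0 + value(tuple(win), remaining - 1)) \
                + (1.0 - p) * value(tuple(lose), remaining - 1)
            best = max(best, q)
        return best

    total = value((0,) * (2 * n_arms), horizon)
    return total, value.cache_info().currsize


if __name__ == "__main__":
    for arms in (2, 3, 4):
        total, n_states = optimal_value(horizon=10, n_arms=arms)
        print(arms, round(total, 4), n_states)
```

The second return value counts the memoized states. Running the loop with more arms or a longer horizon shows how quickly the state space grows, which is the difficulty that motivates a decomposition approach.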
Restless Multi-Armed Bandits (RMABs) are widely used to model limited resource allocation problems. Recently, they have been employed for health monitoring and intervention planning problems. However, the existing approaches fail to account
In this paper we study a class of risk-sensitive Markovian control problems in discrete time subject to model uncertainty. We consider a risk-sensitive discounted cost criterion with a finite time horizon. The methodology used is adaptive robust control combined with machine learning.
Optimal control problems with a very large time horizon can be tackled with the Receding Horizon Control (RHC) method, which consists in solving a sequence of optimal control problems with small prediction horizon. The main result of this article is
This paper is devoted to analysing the explicit slow decay rate and turnpike in the infinite-horizon linear quadratic optimal control problems for hyperbolic systems. Assume that some weak observability or controllability are satisfied, by which, the
We propose a semi-proximal augmented Lagrangian based decomposition method for convex composite quadratic conic programming problems with primal block angular structures. Using our algorithmic framework, we are able to naturally derive several well k