Decomposition-Coordination Method for Finite Horizon Bandit Problems




Publication date 2021


Authors: Michel de Lara, Benjamin Heymann, Jean-Philippe Chancelier

Optimization and Control


Abstract

Optimally solving a multi-armed bandit problem suffers from the curse of dimensionality. Indeed, resorting to dynamic programming leads to an exponential growth of computing time as the number of arms and the horizon increase. We introduce a decomposition-coordination heuristic, DeCo, that turns the initial problem into one-armed bandit problems that are solved in parallel and coordinated. As a consequence, we obtain a computing time that is essentially linear in the number of arms. In addition, the decomposition provides a theoretical lower bound on the regret. For the two-armed bandit case, dynamic programming provides the exact solution, which the DeCo heuristic almost matches. Moreover, in numerical simulations with up to 100 rounds and 20 arms, DeCo outperforms classic algorithms (Thompson sampling and Kullback-Leibler upper-confidence bound) and almost matches the theoretical lower bound on the regret for 20 arms.
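The exponential cost of exact dynamic programming can be seen in the minimal sketch below. It assumes Bernoulli arms with independent uniform Beta(1, 1) priors and a Bayesian (expected total reward) criterion. The abstract does not fix these modelling choices. The code is not the DeCo heuristic. It is only the kind of exact baseline that DeCo is compared with, and the name `bayes_optimal_value` is hypothetical. The state is the vector of per-arm success and failure counts. For K arms and horizon T, the number of reachable states therefore grows polynomially in T with degree 2K, which means exponentially in K.

```python
from functools import lru_cache


def bayes_optimal_value(n_arms, horizon):
    """Exact Bayes-optimal expected reward for a finite-horizon Bernoulli bandit.

    Assumption (not from the source): independent Beta(1, 1) priors on each arm.
    Returns (expected total reward, number of memoized DP states).
    """

    @lru_cache(maxsize=None)
    def value(counts, remaining):
        # counts: tuple of (successes, failures) for each arm
        if remaining == 0:
            return 0.0
        best = 0.0
        for arm, (s, f) in enumerate(counts):
            p = (s + 1) / (s + f + 2)  # posterior mean success probability
            win = counts[:arm] + ((s + 1, f),) + counts[arm + 1:]
            lose = counts[:arm] + ((s, f + 1),) + counts[arm + 1:]
            # Bellman equation: immediate expected reward plus expected cost-to-go
            q = p * (1.0 + value(win, remaining - 1)) + (1 - p) * value(lose, remaining - 1)
            best = max(best, q)
        return best

    start = tuple((0, 0) for _ in range(n_arms))
    total = value(start, horizon)
    return total, value.cache_info().currsize


if __name__ == "__main__":
    for k in (1, 2, 3):
        reward, n_states = bayes_optimal_value(k, 8)
        print(k, round(reward, 4), n_states)
```

With a single arm, every policy earns T/2 in expectation, so `bayes_optimal_value(1, 5)` returns 2.5. With two arms and T = 2, the optimal policy earns 13/12 by staying on a first success and switching after a first failure. The state count returned as the second value grows quickly as arms are added.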


Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

Aditya Mate, Arpita Biswas, Christoph Siebenbrunner (2021)

Restless Multi-Armed Bandits (RMABs) have been widely used to model limited resource allocation problems. Recently, they have been employed for health monitoring and intervention planning problems. However, existing approaches fail to account for the arrival of new patients and the departure of enrolled patients from a treatment program. To address this challenge, we formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs in which heterogeneous arms arrive and leave under possibly random streams. We propose a new and scalable approach to computing index-based solutions. We start by proving that index values decrease for short residual lifetimes, a phenomenon that we call index decay. We then provide algorithms designed to capture index decay without having to solve the costly finite horizon problem, thereby lowering the computational complexity compared to existing methods. We evaluate our approach via simulations run on real-world data obtained from a tuberculosis intervention planning task, as well as on multiple other synthetic domains. Our algorithms achieve a speed-up of over 150x relative to existing methods in these tasks without loss in performance. These findings are robust across multiple domains.

Machine Learning; Artificial Intelligence

Risk-sensitive Markov decision problems under model uncertainty: finite time horizon case

Tomasz R. Bielecki, Tao Chen, Igor Cialenco (2021)

In this paper we study a class of risk-sensitive Markovian control problems in discrete time subject to model uncertainty. We consider a risk-sensitive discounted cost criterion with a finite time horizon. The methodology used is adaptive robust control combined with machine learning.

Optimization and Control

On the Turnpike Property and the Receding-Horizon Method for Linear-Quadratic Optimal Control Problems

Tobias Breiten, Laurent Pfeiffer (2018)

Optimal control problems with a very large time horizon can be tackled with the Receding Horizon Control (RHC) method, which consists of solving a sequence of optimal control problems with a small prediction horizon. The main result of this article is a proof of the exponential convergence (with respect to the prediction horizon) of the control generated by the RHC method towards the exact solution of the problem. The result is established for a class of infinite-dimensional linear-quadratic optimal control problems with time-independent dynamics and integral cost. Such problems satisfy the turnpike property: most of the time, the optimal trajectory remains very close to the solution of the associated static optimization problem. Specific terminal cost functions, derived from the Lagrange multiplier associated with the static optimization problem, are employed in the implementation of the RHC method.
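The receding-horizon idea can be illustrated on a much simpler problem than the infinite-dimensional one studied in the paper. The sketch below uses a scalar, discrete-time LQ problem x_{k+1} = a x_k + b u_k with stage cost q x_k^2 + r u_k^2. This is an assumed toy setting. It uses a zero terminal cost rather than the Lagrange-multiplier-based terminal cost mentioned in the abstract, and the function names are hypothetical. At each step, an N-step problem is solved by backward Riccati recursion and only its first control is applied. Because the dynamics are time-invariant, that first-step gain is identical at every step.

```python
def first_step_gain(a, b, q, r, horizon, terminal_weight=0.0):
    """Gain K (u = -K x) of the first control of an N-step scalar LQ problem.

    Computed by backward Riccati recursion from the terminal weight.
    Toy setting assumed for illustration; not the paper's PDE framework.
    """
    p = terminal_weight  # cost-to-go weight at the end of the prediction horizon
    gain = 0.0
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain  # one backward Riccati step
    return gain


def receding_horizon_trajectory(a, b, q, r, horizon, x0, steps):
    """Closed-loop states when the N-step problem is re-solved at every step
    and only its first control is applied."""
    xs = [x0]
    for _ in range(steps):
        # Re-solving yields the same gain each time (time-invariant dynamics);
        # it is recomputed here to mirror the RHC loop literally.
        gain = first_step_gain(a, b, q, r, horizon)
        xs.append(a * xs[-1] + b * (-gain * xs[-1]))
    return xs


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open loop is unstable since |a| > 1
    reference = first_step_gain(a, b, q, r, 200)  # proxy for the infinite-horizon gain
    for n in (1, 2, 4, 8, 16):
        print(n, abs(first_step_gain(a, b, q, r, n) - reference))
```

In this toy example, `horizon=1` gives a zero gain, so the unstable open loop diverges. Longer prediction horizons give gains whose gap to the infinite-horizon gain shrinks rapidly with N, a loose echo of the convergence result stated above.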

Optimization and Control

Slow decay and turnpike for infinite-horizon hyperbolic LQ problems

Zhong-Jie Han, Enrique Zuazua (2021)

This paper is devoted to analysing the explicit slow decay rate and the turnpike property in infinite-horizon linear-quadratic optimal control problems for hyperbolic systems. Assuming that suitable weak observability or controllability conditions are satisfied, we estimate lower and upper bounds of the corresponding algebraic Riccati operator. Based on these two bounds, we obtain the explicit slow decay rate of the closed-loop system under the Riccati-based optimal feedback control. The averaged turnpike property for this problem is also discussed. We then apply these results to LQ optimal control problems constrained to networks of one-dimensional wave equations, and to some multi-dimensional ones with local controls that do not satisfy the Geometric Control Condition (GCC).

Optimization and Control

A semi-proximal augmented Lagrangian based decomposition method for primal block angular convex composite quadratic conic programming problems

Xin-Yee Lam, Defeng Sun, Kim-Chuan Toh (2018)

We propose a semi-proximal augmented Lagrangian based decomposition method for convex composite quadratic conic programming problems with primal block angular structures. Using our algorithmic framework, we are able to naturally derive several well-known augmented Lagrangian based decomposition methods for stochastic programming, such as the diagonal quadratic approximation method of Mulvey and Ruszczyński. Moreover, we are able to derive novel enhancements and generalizations of these well-known methods. We also propose a semi-proximal symmetric Gauss-Seidel based alternating direction method of multipliers for solving the corresponding dual problem. Numerical results show that our algorithms can perform well even for very large instances of primal block angular convex QP problems. For example, one instance with more than 300,000 linear constraints and 12,500,000 nonnegative variables is solved in less than a minute, whereas Gurobi took more than 3 hours. Another instance, qp-gridgen1, with more than 331,000 linear constraints and 986,000 nonnegative variables, is solved in about 5 minutes, whereas Gurobi took more than 35 minutes.

Optimization and Control