
Decomposition-Coordination Method for Finite Horizon Bandit Problems

Published by Michel de Lara
Publication date: 2021
Research language: English





Optimally solving a multi-armed bandit problem suffers from the curse of dimensionality. Indeed, resorting to dynamic programming leads to an exponential growth of computing time as the number of arms and the horizon increase. We introduce a decomposition-coordination heuristic, DeCo, that turns the initial problem into one-armed bandit problems that are coordinated and solved in parallel. As a consequence, we obtain a computing time which is essentially linear in the number of arms. In addition, the decomposition provides a theoretical lower bound on the regret. For the two-armed bandit case, dynamic programming provides the exact solution, which is almost matched by the DeCo heuristic. Moreover, in numerical simulations with up to 100 rounds and 20 arms, DeCo outperforms classic algorithms (Thompson sampling and Kullback-Leibler upper-confidence bound) and almost matches the theoretical lower bound on the regret for 20 arms.
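
To illustrate the curse of dimensionality mentioned in the abstract, here is a minimal sketch of exact dynamic programming for a small finite-horizon bandit. It is not the DeCo heuristic and not the authors' code. It assumes Bernoulli arms with independent uniform Beta(1,1) priors and maximizes expected total reward; the abstract does not specify these details, and the name `optimal_value` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def optimal_value(horizon, n_arms=2):
    """Return (Bayes-optimal expected total reward, number of states visited).

    Assumption (not from the source): Bernoulli arms, Beta(1,1) priors.
    A state is a tuple of (successes, failures) pairs, one pair per arm.
    """

    @lru_cache(maxsize=None)
    def value(state):
        pulls_so_far = sum(s + f for s, f in state)
        if pulls_so_far == horizon:
            return Fraction(0)  # no rounds left
        best = Fraction(0)
        for i, (s, f) in enumerate(state):
            # Posterior mean of arm i under a Beta(1 + s, 1 + f) posterior.
            p = Fraction(s + 1, s + f + 2)
            win = state[:i] + ((s + 1, f),) + state[i + 1:]
            lose = state[:i] + ((s, f + 1),) + state[i + 1:]
            # Bellman equation: immediate reward plus value of the next state.
            q = p * (1 + value(win)) + (1 - p) * value(lose)
            best = max(best, q)
        return best

    start = tuple((0, 0) for _ in range(n_arms))
    result = value(start)
    return result, value.cache_info().currsize


if __name__ == "__main__":
    for arms in (2, 3, 4):
        reward, states = optimal_value(horizon=8, n_arms=arms)
        print(arms, float(reward), states)
```

The state is the joint tuple of per-arm counts, so the number of visited states grows quickly with both the horizon and the number of arms. That growth is what makes exact dynamic programming impractical beyond a few arms.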




Read also

Restless Multi-Armed Bandits (RMABs) are widely used to model limited resource allocation problems. Recently, they have been employed for health monitoring and intervention planning problems. However, existing approaches fail to account for the arrival of new patients and the departure of enrolled patients from a treatment program. To address this challenge, we formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs where heterogeneous arms arrive and leave under possibly random streams. We propose a new and scalable approach to computing index-based solutions. We start by proving that index values decrease for short residual lifetimes, a phenomenon that we call index decay. We then provide algorithms designed to capture index decay without having to solve the costly finite horizon problem, thereby lowering the computational complexity compared to existing methods. We evaluate our approach via simulations run on real-world data obtained from a tuberculosis intervention planning task as well as multiple other synthetic domains. Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance. These findings are robust across multiple domains.
In this paper we study a class of risk-sensitive Markovian control problems in discrete time subject to model uncertainty. We consider a risk-sensitive discounted cost criterion with finite time horizon. The methodology used is adaptive robust control combined with machine learning.
Optimal control problems with a very large time horizon can be tackled with the Receding Horizon Control (RHC) method, which consists in solving a sequence of optimal control problems with small prediction horizon. The main result of this article is the proof of the exponential convergence (with respect to the prediction horizon) of the control generated by the RHC method towards the exact solution of the problem. The result is established for a class of infinite-dimensional linear-quadratic optimal control problems with time-independent dynamics and integral cost. Such problems satisfy the turnpike property: the optimal trajectory remains most of the time very close to the solution to the associated static optimization problem. Specific terminal cost functions, derived from the Lagrange multiplier associated with the static optimization problem, are employed in the implementation of the RHC method.
This paper is devoted to analysing the explicit slow decay rate and the turnpike property in infinite-horizon linear quadratic optimal control problems for hyperbolic systems. Assuming that some weak observability or controllability conditions are satisfied, we estimate the lower and upper bounds of the corresponding algebraic Riccati operator, respectively. Based on these two bounds, we obtain the explicit slow decay rate of the closed-loop system with Riccati-based optimal feedback control. The averaged turnpike property for this problem is also discussed. We then apply these results to LQ optimal control problems constrained to networks of one-dimensional wave equations, and to some multi-dimensional ones with local controls which lack the GCC (Geometric Control Condition).
We propose a semi-proximal augmented Lagrangian based decomposition method for convex composite quadratic conic programming problems with primal block angular structures. Using our algorithmic framework, we are able to naturally derive several well known augmented Lagrangian based decomposition methods for stochastic programming such as the diagonal quadratic approximation method of Mulvey and Ruszczyński. Moreover, we are able to derive novel enhancements and generalizations of these well known methods. We also propose a semi-proximal symmetric Gauss-Seidel based alternating direction method of multipliers for solving the corresponding dual problem. Numerical results show that our algorithms can perform well even for very large instances of primal block angular convex QP problems. For example, one instance with more than 300,000 linear constraints and 12,500,000 nonnegative variables is solved in less than a minute whereas Gurobi took more than 3 hours, and another instance, qp-gridgen1, with more than 331,000 linear constraints and 986,000 nonnegative variables is solved in about 5 minutes whereas Gurobi took more than 35 minutes.