In stochastic dynamic environments, team stochastic games have emerged as a versatile paradigm for studying sequential decision-making in fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown and must be estimated from noisy data in practice. To mitigate this sensitivity, we propose a model of robust team stochastic games, in which players make decisions via robust optimization. This model extends team stochastic games to the incomplete-information setting and provides an alternative solution concept, robust team optimality. To compute such a solution, we develop a learning algorithm in the form of Gauss-Seidel modified policy iteration and prove its convergence. Compared with robust dynamic programming, this algorithm not only converges faster but also admits approximate computation to alleviate the curse of dimensionality. Numerical simulations, which generalize the game models of social dilemmas to sequential robust scenarios, demonstrate the effectiveness of the algorithm.
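As a rough illustration of the Gauss-Seidel idea above (a value-update sketch, not the authors' algorithm), the following toy robust MDP refreshes each state's value in place while nature picks the worst transition kernel from a finite uncertainty set; all sizes and data are invented for the example.

    # Toy robust MDP: the team shares one reward; nature adversarially picks a
    # kernel from a finite uncertainty set. Sizes and data are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, n_models, gamma = 4, 2, 3, 0.9
    R = rng.uniform(0.0, 1.0, (n_states, n_actions))                       # shared team reward
    P = rng.dirichlet(np.ones(n_states), (n_models, n_states, n_actions))  # candidate kernels

    V = np.zeros(n_states)
    for sweep in range(1000):
        delta = 0.0
        for s in range(n_states):                    # Gauss-Seidel: reuse freshly updated V
            worst = (P[:, s, :, :] @ V).min(axis=0)  # adversarial kernel, per action
            v_new = (R[s] + gamma * worst).max()     # team's best joint action
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < 1e-10:
            break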
Optimal control of a stochastic dynamical system usually requires a good dynamical model with probability distributions, which is difficult to obtain due to limited measurements and/or complicated dynamics. To address this, this work proposes a data-driven distributionally robust control framework, based on the Wasserstein metric, via a constrained two-player zero-sum Markov game in which the adversarial player selects the probability distribution from a Wasserstein ball centered at an empirical distribution. The game is then approached through its penalized version, whose optimal stabilizing solution is derived explicitly, with a linear structure, via Riccati-type iterations. Moreover, we design a model-free Q-learning algorithm with global convergence to learn the optimal controller. Finally, numerical examples verify the effectiveness of the proposed learning algorithm and demonstrate its robustness to probability-distribution errors.
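As context for the Riccati-type iterations (a generic zero-sum LQ sketch, not the paper's Wasserstein-penalized construction), one can iterate the value recursion for a discrete-time game in which the controller u minimizes and a penalized adversary w maximizes; all matrices below are placeholders.

    # Generic zero-sum LQ value iteration: V(x) = x'Px with
    # P <- Q + A'P(A - G K),  K = (W + G'PG)^{-1} G'PA,
    # G = [B D],  W = blkdiag(R, -gam*I).
    import numpy as np

    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])        # control channel
    D = np.array([[0.05], [0.05]])      # adversary channel
    Q, R, gam = np.eye(2), np.eye(1), 5.0   # gam: penalty on the adversary

    G = np.hstack([B, D])
    W = np.block([[R, np.zeros((1, 1))], [np.zeros((1, 1)), -gam * np.eye(1)]])
    P = np.zeros((2, 2))
    for _ in range(2000):
        K = np.linalg.solve(W + G.T @ P @ G, G.T @ P @ A)  # stacked gains for [u; w]
        P_new = Q + A.T @ P @ (A - G @ K)
        if np.linalg.norm(P_new - P) < 1e-12:
            break
        P = P_new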
In this paper, we investigate sparse optimal control of continuous-time stochastic systems. We adopt the dynamic programming approach and analyze the optimal control via the value function. Due to the non-smoothness of the $L^0$ cost functional, the value function is in general not differentiable on the domain. We therefore characterize the value function as a viscosity solution to the associated Hamilton-Jacobi-Bellman (HJB) equation. Based on this result, we derive a necessary and sufficient condition for $L^0$ optimality, which immediately gives the optimal feedback map. In particular, for control-affine systems, we examine the relationship with the $L^1$ optimal control problem and prove an equivalence theorem.
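For orientation (a generic form whose dynamics, cost, and sign conventions may differ from the paper's), consider dynamics $dX_t = f(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t$ with cost $\mathbb{E}\big[\int_0^T \big(\ell(X_t,u_t) + \lambda\|u_t\|_0\big)\,dt + g(X_T)\big]$; the associated HJB equation then reads
\[
\partial_t V(t,x) + \inf_{u\in U}\Big\{\ell(x,u) + \lambda\|u\|_0 + f(x,u)^{\top}\nabla_x V(t,x) + \tfrac{1}{2}\operatorname{tr}\big(\sigma\sigma^{\top}(x,u)\,\nabla_x^2 V(t,x)\big)\Big\} = 0,\qquad V(T,x) = g(x),
\]
where the discontinuity of $\|u\|_0$ is precisely why $V$ must be understood in the viscosity sense.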
We study the problem of optimal inside control of an SPDE (a stochastic evolution equation) driven by a Brownian motion and a Poisson random measure. Our optimal control problem is new in two ways: (i) the controller has access to inside information, i.e. to information about a future state of the system; (ii) the integro-differential operator of the SPDE may depend on the control. In the first part of the paper, we formulate a sufficient and a necessary maximum principle for this type of control problem, in two cases: (1) when the control is allowed to depend both on time t and on the space variable x; (2) when the control is not allowed to depend on x. In the second part of the paper, we apply these results to the optimal control of an SDE system in which the insider controller has only noisy observations of the state. Using results from nonlinear filtering, we transform this partially observed insider SDE control problem into a fully observed insider SPDE control problem. The results are illustrated by explicit examples.
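For orientation, in the classical (finite-dimensional, non-inside) jump-diffusion setting the maximum principle is phrased through the Hamiltonian
\[
H(t,x,u,p,q,r) = f(t,x,u) + b(t,x,u)\,p + \sigma(t,x,u)\,q + \int_{\mathbb{R}} \gamma(t,x,u,\zeta)\,r(t,\zeta)\,\nu(d\zeta),
\]
where $(p,q,r)$ solve an adjoint backward equation and, under concavity, a control maximizing $H$ is optimal. The paper's SPDE and inside-information versions replace this with operator-valued analogues and anticipating (forward) integrals, so the display above is a reference point only.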
We approach the development of models and control strategies for susceptible-infected-susceptible (SIS) epidemic processes from the perspective of marked temporal point processes and stochastic optimal control of stochastic differential equations (SDEs) with jumps. In contrast to previous work, this perspective is particularly well suited to exploiting fine-grained data about disease outbreaks and lets us overcome the shortcomings of current control strategies. Our control strategy uses treatment intensities to determine whom to treat, and when, so as to minimize the number of infected individuals over time. Preliminary experiments with synthetic data show that our control strategy consistently outperforms several alternatives. Looking ahead, we believe our methodology is a promising step towards practical data-driven control strategies for epidemic processes.
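To make the setting concrete (a toy mean-field simulation with an invented constant treatment intensity, not the paper's data-driven policy), an SIS process with treatment can be sampled event by event:

    # Toy SIS simulation: infection events compete with recovery plus a constant
    # treatment intensity rho; all rates and sizes are illustrative only.
    import numpy as np

    rng = np.random.default_rng(1)
    n, beta, delta_rec, rho = 100, 0.15, 0.05, 0.2   # contact, recovery, treatment rates
    infected = np.zeros(n, dtype=bool); infected[:5] = True
    t, T = 0.0, 100.0
    while t < T and infected.any():
        n_inf = infected.sum()
        rate_inf = beta * n_inf * (n - n_inf) / n    # new infections (mean-field contact)
        rate_rec = (delta_rec + rho) * n_inf         # natural recovery + treatment
        total = rate_inf + rate_rec
        t += rng.exponential(1.0 / total)            # time to next event
        if rng.random() < rate_inf / total:          # infection event
            infected[rng.choice(np.flatnonzero(~infected))] = True
        else:                                        # recovery/treatment event
            infected[rng.choice(np.flatnonzero(infected))] = False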
This paper describes an optimization framework for controlling a distributed parameter system (DPS) using a team of mobile actuators. The framework simultaneously seeks optimal control of the DPS and optimal guidance of the mobile actuators such that a cost function associated with both is minimized subject to the dynamics of each. The cost incurred from controlling the DPS is linear-quadratic and is transformed into an equivalent quadratic term associated with an operator-valued Riccati equation. This equivalent form reduces the problem to seeking the guidance alone, because the optimal control can be recovered once the optimal guidance is obtained. We establish conditions for the existence of a solution to the proposed problem. Since computing an optimal solution requires approximation, we also establish conditions under which the approximate optimal solution converges to the exact one: when the two solutions are evaluated by the original cost function, their difference becomes arbitrarily small as the approximation is refined. Two numerical examples demonstrate the performance of the optimal control and guidance obtained from the proposed approach.
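As a finite-dimensional caricature of the linear-quadratic piece (a fixed rather than mobile actuator on a discretized 1D heat equation; all parameters invented), the approximating Riccati equation can be solved directly:

    # Discretize a 1D heat equation on (0, L) with Dirichlet boundaries, place one
    # static actuator near the midpoint, and solve the approximating Riccati
    # equation; dimensions and weights are illustrative, not the paper's setup.
    import numpy as np
    from scipy.linalg import solve_continuous_are

    N, L, kappa = 20, 1.0, 0.1
    h = L / (N + 1)
    A = kappa / h**2 * (np.diag(-2 * np.ones(N))
                        + np.diag(np.ones(N - 1), 1)
                        + np.diag(np.ones(N - 1), -1))   # discrete Laplacian
    b = np.zeros((N, 1)); b[N // 2] = 1.0 / h            # actuator influence
    Q, R = np.eye(N) * h, np.eye(1)                      # h-weighted state cost ~ L2 norm
    P = solve_continuous_are(A, b, Q, R)
    K = np.linalg.solve(R, b.T @ P)                      # optimal feedback u = -K x

As the grid is refined (larger N), this finite-dimensional solution plays the role of the approximate optimal solution whose convergence to the exact operator-valued one the paper analyzes.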