Optimal Deceptive and Reference Policies for Supervisory Control

Publication date 2019

Authors Mustafa O. Karabag - Melkior Ornik - Ufuk Topcu

Optimization and Control

The use of deceptive strategies is important for an agent that attempts not to reveal his intentions in an adversarial environment. We consider a setting in which a supervisor provides a reference policy and expects an agent to follow the reference policy and perform a task. The agent may instead follow a different, deceptive policy to achieve a different task. We model the environment and the behavior of the agent with a Markov decision process, represent the tasks of the agent and the supervisor with linear temporal logic formulae, and study the synthesis of optimal deceptive policies for such agents. We also study the synthesis of optimal reference policies that prevent deceptive strategies of the agent and achieve the supervisor's task with high probability. We show that the synthesis of deceptive policies has a convex optimization problem formulation, while the synthesis of reference policies requires solving a nonconvex optimization problem.

Learning Convex Optimization Control Policies

Akshay Agrawal, Shane Barratt, Stephen Boyd 2019

Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.

Optimization and Control Machine Learning

Optimal Lockdown for Pandemic Control

Qianqian Ma, Yang-Yu Liu, Alex Olshevsky 2020

As a common strategy of contagious disease containment, lockdown will inevitably weaken the economy. The ongoing COVID-19 pandemic underscores the trade-off between public health and economic cost. An optimal lockdown policy to resolve this trade-off is highly desired. Here we propose a mathematical framework of pandemic control through an optimal non-uniform lockdown, where our goal is to reduce the economic activity as little as possible while decreasing the number of infected individuals at a prescribed rate. This framework allows us to efficiently compute the optimal lockdown policy for general epidemic spread models, including both the classical SIS/SIR/SEIR models and a new model of COVID-19 transmissions. We demonstrate the power of this framework by using publicly available data of inter-county travel frequencies to analyze a model of COVID-19 spread in the 62 counties of New York State. We find that an optimal lockdown based on epidemic status in April 2020 would have reduced economic activity more stringently outside of New York City than within it, even though the epidemic was much more prevalent in New York City at that point. Such a counterintuitive result highlights the intricacies of pandemic control and sheds light on future lockdown policy design.

Optimization and Control Physics and Society

Designing Near-Optimal Policies for Energy Management in a Stochastic Environment

Chaitanya Poolla, Abraham K. Ishihara, Rodolfo Milito 2018

With the rapid growth in renewable energy and battery storage technologies, there exists significant opportunity to improve energy efficiency and reduce costs through optimization. However, optimization algorithms must take into account the underlying dynamics and uncertainties of the various interconnected subsystems in order to fully realize this potential. To this end, we formulate and solve an energy management optimization problem as a Markov Decision Process (MDP) consisting of battery storage dynamics, a stochastic demand model, a stochastic solar generation model, and an electricity pricing scheme. The stochastic model for predicting solar generation is constructed based on weather forecast data from the National Oceanic and Atmospheric Administration. A near-optimal policy design is proposed via stochastic dynamic programming. Simulation results are presented in the context of storage and solar-integrated residential and commercial building environments. Results indicate that the near-optimal policy significantly reduces the operating costs compared to several heuristic alternatives. The proposed framework facilitates the design and evaluation of energy management policies with configurable demand-supply-storage parameters in the presence of weather-induced uncertainties.
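
The stochastic dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the battery holds an integer number of energy units, each step it may charge one unit, idle, or discharge one unit, the net demand (demand minus solar generation) is drawn independently each step from a fixed distribution, and the only cost is the price paid for grid energy. The function name `solve_battery_dp` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def solve_battery_dp(
    prices: List[float],
    net_demand_dist: Dict[int, float],
    capacity: int,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Finite-horizon backward induction for a toy battery MDP.

    prices[t]        : grid price per energy unit at step t (assumed known)
    net_demand_dist  : {net demand in units: probability}, i.i.d. per step
    capacity         : maximum battery level in units
    Returns (value, policy), where value[t][level] is the minimal expected
    cost-to-go and policy[t][level] is the battery level change in {-1, 0, 1}.
    """
    horizon = len(prices)
    # Terminal values are zero: leftover stored energy has no worth here.
    value = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    policy = [[0] * (capacity + 1) for _ in range(horizon)]

    for t in reversed(range(horizon)):
        for level in range(capacity + 1):
            best_cost, best_action = float("inf"), 0
            for action in (-1, 0, 1):  # discharge, idle, charge
                next_level = level + action
                if not 0 <= next_level <= capacity:
                    continue  # action would leave the feasible battery range
                # Expected grid purchase cost; surplus energy is discarded.
                stage_cost = sum(
                    prob * prices[t] * max(demand + action, 0)
                    for demand, prob in net_demand_dist.items()
                )
                total = stage_cost + value[t + 1][next_level]
                if total < best_cost:
                    best_cost, best_action = total, action
            value[t][level] = best_cost
            policy[t][level] = best_action
    return value, policy
```

For instance, with prices `[1.0, 5.0]`, a deterministic net demand of one unit, and a one-unit battery that starts empty, the sketch charges during the cheap step and discharges during the expensive one, which lowers the total cost relative to buying one unit from the grid at each step.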

Optimization and Control Systems and Control

Spatially Controlled Relay Beamforming: $2$-Stage Optimal Policies

Dionysios S. Kalogerias, Athina P. Petropulu 2017

The problem of enhancing Quality-of-Service (QoS) in power constrained, mobile relay beamforming networks, by optimally and dynamically controlling the motion of the relaying nodes, is considered, in a dynamic channel environment. We assume a time slotted system, where the relays update their positions before the beginning of each time slot. Modeling the wireless channel as a Gaussian spatiotemporal stochastic field, we propose a novel $2$-stage stochastic programming problem formulation for optimally specifying the positions of the relays at each time slot, such that the expected QoS of the network is maximized, based on causal Channel State Information (CSI) and under a total relay transmit power budget. This results in a schema where, at each time slot, the relays, apart from optimally beamforming to the destination, also optimally, predictively decide their positions at the next time slot, based on causally accumulated experience. Exploiting either the Method of Statistical Differentials, or the multidimensional Gauss-Hermite Quadrature Rule, the stochastic program considered is shown to be approximately equivalent to a set of simple subproblems, which are solved in a distributed fashion, one at each relay. Optimality and performance of the proposed spatially controlled system are also effectively assessed, under a rigorous technical framework; strict optimality is rigorously demonstrated via the development of a version of the Fundamental Lemma of Stochastic Control, and, performance-wise, it is shown that, quite interestingly, the optimal average network QoS exhibits an increasing trend across time slots, despite our myopic problem formulation. Numerical simulations are presented, experimentally corroborating the success of the proposed approach and the validity of our theoretical predictions.

Optimization and Control Information Theory

Modeling and Control of COVID-19 Epidemic through Testing Policies

Muhammad Umar B. Niazi, Alain Kibangou, Carlos Canudas-de-Wit 2020

Testing for infected cases is one of the most important mechanisms to control an epidemic. It enables the isolation of detected infected individuals, thereby limiting disease transmission to the susceptible population. However, despite the significance of testing policies, the recent literature on the subject lacks a control-theoretic perspective. In this work, an epidemic model that incorporates the testing rate as a control input is presented. The proposed model differentiates the undetected infected from the detected infected cases, who are assumed to be removed from the disease spreading process in the population. First, the model is estimated and validated for COVID-19 data in France. Then, two testing policies are proposed, the so-called best-effort strategy for testing (BEST) and constant optimal strategy for testing (COST). The BEST policy is a suppression strategy that provides a lower bound on the testing rate such that the epidemic switches from a spreading to a non-spreading state. The COST policy is a mitigation strategy that provides an optimal value of testing rate that minimizes the peak value of the infected population when the total stockpile of tests is limited. Both testing policies are evaluated by predicting the number of active intensive care unit (ICU) cases and the cumulative number of deaths due to COVID-19.
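
The idea of a lower bound on the testing rate that switches an epidemic from spreading to non-spreading can be sketched on a toy model. The sketch below is an assumption made for illustration and is not the model from the paper: it uses a plain SIR-style equation in which testing removes undetected infected individuals at a per-capita rate `test_rate`, so the undetected infected fraction evolves as dI/dt = (beta * s - gamma - test_rate) * I, where `beta` is the transmission rate, `gamma` the recovery rate, and `s` the susceptible fraction. The function names are hypothetical.

```python
def infected_growth_rate(
    beta: float, gamma: float, susceptible_fraction: float, test_rate: float
) -> float:
    """Per-capita growth rate of the undetected infected in the toy model.

    A positive value means the epidemic is spreading; a value that is zero
    or negative means it is not spreading.
    """
    return beta * susceptible_fraction - gamma - test_rate


def min_suppressing_test_rate(
    beta: float, gamma: float, susceptible_fraction: float
) -> float:
    """Smallest testing-removal rate that makes the growth rate non-positive.

    Returns 0.0 when the epidemic is already non-spreading without testing.
    """
    return max(0.0, beta * susceptible_fraction - gamma)
```

In this toy setting the threshold depends on the current susceptible fraction, so it would need to be recomputed as the epidemic evolves; any testing rate at or above the threshold keeps the growth rate at or below zero.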

Optimization and Control Systems and Control