
Smoothing-Averse Control: Covertness and Privacy from Smoothers

 Added by Timothy Molloy
 Publication date 2021
 Research language: English





In this paper we investigate the problem of controlling a partially observed stochastic dynamical system such that its state is difficult to infer using a (fixed-interval) Bayesian smoother. This problem arises naturally in applications in which it is desirable to keep the entire state trajectory of a system concealed. We pose our smoothing-averse control problem as the problem of maximising the (joint) entropy of smoother state estimates (i.e., the joint conditional entropy of the state trajectory given the history of measurements and controls). We show that the entropy of Bayesian smoother estimates for general nonlinear state-space models can be expressed as the sum of entropies of marginal state estimates given by Bayesian filters. This novel additive form allows us to reformulate the smoothing-averse control problem as a fully observed stochastic optimal control problem in terms of the usual concept of the information (or belief) state, and solve the resulting problem via dynamic programming. We illustrate the applicability of smoothing-averse control to privacy in cloud-based control and covert robotic navigation.
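As a rough illustration of the information (or belief) state mentioned in the abstract, the sketch below runs one step of a Bayesian filter for a small finite-state hidden Markov model and computes the Shannon entropy of the resulting belief. It is not the paper's algorithm: the two-state model, the control names `stay` and `mix`, and the function names `filter_step` and `entropy` are all assumptions made for this example. It also evaluates the entropy for one particular measurement, whereas the conditional entropies in the abstract average over measurements.

```python
import math


def filter_step(belief, transition, likelihood):
    """One Bayesian filter update for a finite-state hidden Markov model.

    belief[i]        : P(x_{k-1} = i | measurements so far)
    transition[i][j] : P(x_k = j | x_{k-1} = i) under the applied control
    likelihood[j]    : P(y_k | x_k = j) for the measurement just received
    """
    n = len(belief)
    # Prediction: push the belief through the state transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correction: weight by the measurement likelihood, then normalise.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("measurement has zero probability under the model")
    return [p / total for p in unnormalised]


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief state."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


# Hypothetical two-state example: the observer is already fairly sure
# that the system is in state 0.
prior = [0.9, 0.1]
likelihood = [0.9, 0.1]  # the received measurement favours state 0

# Two hypothetical controls, each inducing a different transition model.
stay = [[1.0, 0.0], [0.0, 1.0]]  # state does not change
mix = [[0.5, 0.5], [0.5, 0.5]]   # state is re-drawn uniformly

entropy_stay = entropy(filter_step(prior, stay, likelihood))
entropy_mix = entropy(filter_step(prior, mix, likelihood))

print("posterior entropy under 'stay':", entropy_stay)
print("posterior entropy under 'mix': ", entropy_mix)
```

In this toy setting the `mix` control leaves the observer with a more uncertain belief than `stay`, which is the kind of effect an entropy-maximising controller would favour. A dynamic programming solution would instead treat the belief as the state and optimise over the whole horizon.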




Read More

In Path Integral control problems, a representation of an optimally controlled dynamical system can be formally computed and serve as a guidepost to learn a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers from disappear for finite levels of smoothing. For zero smoothing, this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost optimization.
We consider the stochastic shortest path planning problem in MDPs, i.e., the problem of designing policies that ensure reaching a goal state from a given initial state with minimum accrued cost. In order to account for rare but important realizations of the system, we consider a nested dynamic coherent risk total cost functional rather than the conventional risk-neutral total expected cost. Under some assumptions, we show that optimal, stationary, Markovian policies exist and can be found via a special Bellman equation. We propose a computational technique based on difference convex programs (DCPs) to find the associated value functions and therefore the risk-averse policies. A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
We study a risk-averse optimal control problem with a finite-horizon Borel model, where the cost is assessed via exponential utility. The setting permits non-linear dynamics, non-quadratic costs, and continuous spaces but is less general than the problem of optimizing an expected utility. Our contribution is to show the existence of an optimal risk-averse controller through the use of measure-theoretic first principles.
We propose a learning-based, distributionally robust model predictive control approach to the design of adaptive cruise control (ACC) systems. We model the preceding vehicle as an autonomous stochastic system, using a hybrid model with continuous dynamics and discrete, Markovian inputs. We estimate the (unknown) transition probabilities of this model empirically using observed mode transitions and simultaneously determine sets of probability vectors (ambiguity sets) around these estimates that contain the true transition probabilities with high confidence. We then solve a risk-averse optimal control problem that assumes the worst-case distributions in these sets. We furthermore derive a robust terminal constraint set and use it to establish recursive feasibility of the resulting MPC scheme. We validate the theoretical results and demonstrate desirable properties of the scheme through closed-loop simulations.
Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data. In particular, we assume a setting in which the system dynamics are known and in which data exhibiting safe system behavior is available. We propose hybrid control barrier functions for hybrid systems as a means to synthesize safe control inputs. Based on this notion, we present an optimization-based framework to learn such hybrid control barrier functions from data. Importantly, we identify sufficient conditions on the data such that feasibility of the optimization problem ensures correctness of the learned hybrid control barrier functions, and hence the safety of the system. We illustrate our findings in two simulation studies, including a compass gait walker.