ترغب بنشر مسار تعليمي؟ اضغط هنا

The deployment of autonomous systems in uncertain and dynamic environments has raised fundamental questions. Addressing these is pivotal to build fully autonomous systems and requires a systematic integration of planning and control. We first propose reactive risk signal interval temporal logic (ReRiSITL) as an extension of signal temporal logic (STL) to formulate complex spatiotemporal specifications. Unlike STL, ReRiSITL allows to consider uncontrollable propositions that may model humans as well as random environmental events such as sensor failures. Additionally, ReRiSITL allows to incorporate risk measures, such as (but not limited to) the Conditional Value-at-Risk, to measure the risk of violating certain spatial specifications. Second, we propose an algorithm to check if an ReRiSITL specification is satisfiable. For this purpose, we abstract the ReRiSITL specification into a timed signal transducer and devise a game-based approach. Third, we propose a reactive planning and control framework for dynamical control systems under ReRiSITL specifications.
This paper focuses on the controller synthesis for unknown, nonlinear systems while ensuring safety constraints. Our approach consists of two steps, a learning step that uses Gaussian processes and a controller synthesis step that is based on control barrier functions. In the learning step, we use a data-driven approach utilizing Gaussian processes to learn the unknown control affine nonlinear dynamics together with a statistical bound on the accuracy of the learned model. In the second controller synthesis steps, we develop a systematic approach to compute control barrier functions that explicitly take into consideration the uncertainty of the learned model. The control barrier function not only results in a safe controller by construction but also provides a rigorous lower bound on the probability of satisfaction of the safety specification. Finally, we illustrate the effectiveness of the proposed results by synthesizing a safety controller for a jet engine example.
Deterministic Policy Gradient (DPG) removes a level of randomness from standard randomized-action Policy Gradient (PG), and demonstrates substantial empirical success for tackling complex dynamic problems involving Markov decision processes. At the s ame time, though, DPG loses its ability to learn in a model-free (i.e., actor-only) fashion, frequently necessitating the use of critics in order to obtain consistent estimates of the associated policy-reward gradient. In this work, we introduce Zeroth-order Deterministic Policy Gradient (ZDPG), which approximates policy-reward gradients via two-point stochastic evaluations of the $Q$-function, constructed by properly designed low-dimensional action-space perturbations. Exploiting the idea of random horizon rollouts for obtaining unbiased estimates of the $Q$-function, ZDPG lifts the dependence on critics and restores true model-free policy learning, while enjoying built-in and provable algorithmic stability. Additionally, we present new finite sample complexity bounds for ZDPG, which improve upon existing results by up to two orders of magnitude. Our findings are supported by several numerical experiments, which showcase the effectiveness of ZDPG in a practical setting, and its advantages over both PG and Baseline PG.
Temporal logics provide a formalism for expressing complex system specifications. A large body of literature has addressed the verification and the control synthesis problem for deterministic systems under such specifications. For stochastic systems or systems operating in unknown environments, however, only the probability of satisfying a specification has been considered so far, neglecting the risk of not satisfying the specification. Towards addressing this shortcoming, we consider, for the first time, risk metrics, such as (but not limited to) the Conditional Value-at-Risk, and propose risk signal temporal logic. Specifically, we compose risk metrics with stochastic predicates to consider the risk of violating certain spatial specifications. As a particular instance of such stochasticity, we consider control systems in unknown environments and present a determinization of the risk signal temporal logic specification to transform the stochastic control problem into a deterministic one. For unicycle-like dynamics, we then extend our previous work on deterministic time-varying control barrier functions.
Despite the simplicity and intuitive interpretation of Minimum Mean Squared Error (MMSE) estimators, their effectiveness in certain scenarios is questionable. Indeed, minimizing squared errors on average does not provide any form of stability, as the volatility of the estimation error is left unconstrained. When this volatility is statistically significant, the difference between the average and realized performance of the MMSE estimator can be drastically different. To address this issue, we introduce a new risk-aware MMSE formulation which trades between mean performance and risk by explicitly constraining the expected predictive variance of the involved squared error. We show that, under mild moment boundedness conditions, the corresponding risk-aware optimal solution can be evaluated explicitly, and has the form of an appropriately biased nonlinear MMSE estimator. We further illustrate the effectiveness of our approach via several numerical examples, which also showcase the advantages of risk-aware MMSE estimation against risk-neutral MMSE estimation, especially in models involving skewed, heavy-tailed distributions.
Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This paper constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
We consider a distributed optimization problem over a network of agents aiming to minimize a global objective function that is the sum of local convex and composite cost functions. To this end, we propose a distributed Chebyshev-accelerated primal-du al algorithm to achieve faster ergodic convergence rates. In standard distributed primal-dual algorithms, the speed of convergence towards a global optimum (i.e., a saddle point in the corresponding Lagrangian function) is directly influenced by the eigenvalues of the Laplacian matrix representing the communication graph. In this paper, we use Chebyshev matrix polynomials to generate gossip matrices whose spectral properties result in faster convergence speeds, while allowing for a fully distributed implementation. As a result, the proposed algorithm requires fewer gradient updates at the cost of additional rounds of communications between agents. We illustrate the performance of the proposed algorithm in a distributed signal recovery problem. Our simulations show how the use of Chebyshev matrix polynomials can be used to improve the convergence speed of a primal-dual algorithm over communication networks, especially in networks with poor spectral properties, by trading local computation by communication rounds.
This paper considers the analysis and design of resilient/robust decentralized control systems. Specifically, we aim to assess how the pairing of sensors and actuators lead to architectures that are resilient to attacks/hacks for industrial control s ystems and other complex cyber-physical systems. We consider inherent structural properties such as internal fixed modes of a dynamical system depending on actuation, sensing, and interconnection/communication structure for linear discrete time-invariant dynamical systems. We introduce the notion of resilient fixed-modes free system that ensures the non-existence of fixed modes when the actuation-sensing-communication structure is compromised due to attacks by a malicious agent on actuators, sensors, or communication components and natural failures. Also, we provide a graph-theoretical characterization for the resilient structurally fixed modes that enables to capture the non-existence of resilient fixed modes for almost all possible systems realizations. Additionally, we address the minimum actuation-sensing-communication co-design ensuring the non-existence of resiliently structurally fixed modes, which we show to be NP-hard. Notwithstanding, we identify conditions that are often satisfied in engineering settings and under which the co-design problem is solvable in polynomial-time complexity. Furthermore, we leverage the structural insights and properties to provide a convex optimization method to design the gain for a parametrized system and satisfying the sparsity of a given information pattern. Thus, exploring the interplay between structural and non-structural systems to ensure their resilience. Finally, the efficacy of the proposed approach is demonstrated on a power grid example.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا