No Arabic abstract
The DC optimal power flow (DCOPF) problem is a fundamental problem in power systems operations and planning. With high penetration of uncertain renewable resources in power systems, DCOPF needs to be solved repeatedly for a large amount of scenarios, which can be computationally challenging. As an alternative to iterative solvers, neural networks are often trained and used to solve DCOPF. These approaches can offer orders of magnitude reduction in computational time, but they cannot guarantee generalization, and small training error does not imply small testing errors. In this work, we propose a novel algorithm for solving DCOPF that guarantees the generalization performance. First, by utilizing the convexity of DCOPF problem, we train an input convex neural network. Second, we construct the training loss based on KKT optimality conditions. By combining these two techniques, the trained model has provable generalization properties, where small training error implies small testing errors. In experiments, our algorithm improves the optimality ratio of the solutions by a factor of five in comparison to end-to-end models.
Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, while stabilization of partially observed systems, in many cases, requires controllers to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNN) as dynamic controllers for nonlinear uncertain partially-observed systems, and derive convex stability conditions based on integral quadratic constraints, S-lemma and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space taking advantage of mild additional information on system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
Control schemes for autonomous systems are often designed in a way that anticipates the worst case in any situation. At runtime, however, there could exist opportunities to leverage the characteristics of specific environment and operation context for more efficient control. In this work, we develop an online intermittent-control framework that combines formal verification with model-based optimization and deep reinforcement learning to opportunistically skip certain control computation and actuation to save actuation energy and computational resources without compromising system safety. Experiments on an adaptive cruise control system demonstrate that our approach can achieve significant energy and computation savings.
Real-time vehicle dispatching operations in traditional car-sharing systems is an already computationally challenging scheduling problem. Electrification only exacerbates the computational difficulties as charge level constraints come into play. To overcome this complexity, we employ an online minimum drift plus penalty (MDPP) approach for SAEV systems that (i) does not require a priori knowledge of customer arrival rates to the different parts of the system (i.e. it is practical from a real-world deployment perspective), (ii) ensures the stability of customer waiting times, (iii) ensures that the deviation of dispatch costs from a desirable dispatch cost can be controlled, and (iv) has a computational time-complexity that allows for real-time implementation. Using an agent-based simulator developed for SAEV systems, we test the MDPP approach under two scenarios with real-world calibrated demand and charger distributions: 1) a low-demand scenario with long trips, and 2) a high-demand scenario with short trips. The comparisons with other algorithms under both scenarios show that the proposed online MDPP outperforms all other algorithms in terms of both reduced customer waiting times and vehicle dispatching costs.
A method is presented to learn neural network (NN) controllers with stability and safety guarantees through imitation learning (IL). Convex stability and safety conditions are derived for linear time-invariant plant dynamics with NN controllers by merging Lyapunov theory with local quadratic constraints to bound the nonlinear activation functions in the NN. These conditions are incorporated in the IL process, which minimizes the IL loss, and maximizes the volume of the region of attraction associated with the NN controller simultaneously. An alternating direction method of multipliers based algorithm is proposed to solve the IL problem. The method is illustrated on an inverted pendulum system, aircraft longitudinal dynamics, and vehicle lateral dynamics.
The vulnerability of artificial intelligence (AI) and machine learning (ML) against adversarial disturbances and attacks significantly restricts their applicability in safety-critical systems including cyber-physical systems (CPS) equipped with neural network components at various stages of sensing and control. This paper addresses the reachable set estimation and safety verification problems for dynamical systems embedded with neural network components serving as feedback controllers. The closed-loop system can be abstracted in the form of a continuous-time sampled-data system under the control of a neural network controller. First, a novel reachable set computation method in adaptation to simulations generated out of neural networks is developed. The reachability analysis of a class of feedforward neural networks called multilayer perceptrons (MLP) with general activation functions is performed in the framework of interval arithmetic. Then, in combination with reachability methods developed for various dynamical system classes modeled by ordinary differential equations, a recursive algorithm is developed for over-approximating the reachable set of the closed-loop system. The safety verification for neural network control systems can be performed by examining the emptiness of the intersection between the over-approximation of reachable sets and unsafe sets. The effectiveness of the proposed approach has been validated with evaluations on a robotic arm model and an adaptive cruise control system.