Under-voltage load shedding is a standard and effective measure for recovering the voltage stability of the electric power grid under emergency and severe conditions. However, this scheme usually trips a massive amount of load, which can be unnecessary and harmful to customers. Recently, deep reinforcement learning (RL) has been adopted as a promising approach that can significantly reduce the amount of load shedding. However, like most existing machine learning (ML)-based control techniques, RL control usually cannot guarantee the safety of the systems under control. In this paper, we introduce a novel safe RL method for emergency load shedding of power systems that can enhance the safe voltage recovery of the electric power grid after faults. Unlike the standard RL method, the safe RL method has a reward function that includes a barrier function, which goes to minus infinity as the system state approaches the safety bounds. Consequently, the optimal control policy steers the power system away from the safety bounds. This method is general and can be applied to other safety-critical control problems. Numerical simulations on the IEEE 39-bus benchmark are performed to demonstrate the effectiveness of the proposed safe RL emergency control, as well as its adaptive capability to faults not seen in training.
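The barrier-shaped reward described above can be illustrated with a minimal sketch. The logarithmic barrier form, the voltage bounds `v_lo` and `v_hi`, and the weights `shed_weight` and `barrier_weight` are all assumptions chosen for illustration; the abstract does not specify the exact barrier function or reward weights used in the paper.

```python
import math


def log_barrier(v, v_lo, v_hi):
    """Log barrier (assumed form): finite inside (v_lo, v_hi), -inf at or beyond a bound."""
    if v <= v_lo or v >= v_hi:
        return -math.inf
    return math.log(v - v_lo) + math.log(v_hi - v)


def barrier_reward(voltages, shed_amount, v_lo=0.8, v_hi=1.2,
                   shed_weight=1.0, barrier_weight=0.1):
    """Hypothetical reward: penalize shed load, add a barrier term per bus voltage.

    voltages    -- per-unit bus voltages (illustrative state)
    shed_amount -- load shed at this step (illustrative action cost)
    """
    barrier = sum(log_barrier(v, v_lo, v_hi) for v in voltages)
    return -shed_weight * shed_amount + barrier_weight * barrier
```

With this shape, a state near the middle of the safe band earns a higher reward than one close to a bound, and any state on or outside a bound yields a reward of minus infinity, so a reward-maximizing policy is pushed away from the bounds.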
Load shedding has been one of the most widely used and effective emergency control approaches against voltage instability. With increased uncertainties and rapidly changing operational conditions in power systems, existing methods fall short in speed, adaptiveness, or scalability. Deep reinforcement learning (DRL) has been adopted in recent years as a promising approach for fast and adaptive grid stability control. However, existing DRL algorithms show two outstanding issues when applied to power system control problems: 1) computational inefficiency that requires extensive training and tuning time; and 2) poor scalability, which makes them difficult to apply to high-dimensional control problems. To overcome these issues, an accelerated DRL algorithm named PARS was developed and tailored for power system voltage stability control via load shedding. PARS features high scalability and is easy to tune, with only five main hyperparameters. The method was tested on both the IEEE 39-bus and IEEE 300-bus systems, and the latter is by far the largest scale for such a study. Test results show that, compared to other methods including model-predictive control (MPC) and proximal policy optimization (PPO), PARS shows better computational efficiency (faster convergence), more robust learning, excellent scalability, and better generalization capability.
Emergency control, typically in the form of under-voltage load shedding (UVLS), is broadly used to handle low voltage and voltage instability issues in practical power systems under contingencies. However, existing emergency control schemes are rule-based and cannot adapt to uncertain and changing operating conditions. This paper proposes an adaptive UVLS algorithm for emergency control via deep reinforcement learning (DRL) and expert systems. We first construct dynamic components that model the power system operation as the environment. The transient voltage recovery criteria, which pose time-varying requirements on UVLS, are integrated into the states and reward function to guide the learning of deep neural networks. The proposed approach does not require tuning the coefficients of the reward function, a known deficiency of existing DRL-based algorithms. Extensive case studies illustrate that the proposed method outperforms the traditional UVLS relay in both the timeliness and efficacy of emergency control.
This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.
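The convex-combination step described above can be sketched as follows. This is a minimal illustration, not the paper's architecture: it assumes the policy network's final layer emits one logit per vertex, that a softmax turns those logits into convex weights, and that the vertices of the safe action set are already known. The names `softmax` and `vertex_action` are hypothetical.

```python
import math


def softmax(logits):
    """Map raw scores to nonnegative weights that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vertex_action(logits, vertices):
    """Return the convex combination of pre-computed vertices.

    logits   -- one score per vertex (assumed output of a policy network)
    vertices -- vertices of the convex safe action set, each a tuple of floats
    """
    weights = softmax(logits)
    dim = len(vertices[0])
    return [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]


# Example: the safe set is the unit square, described by its four corners.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
action = vertex_action([5.0, -3.0, 0.2, 10.0], square)
```

Because the weights are nonnegative and sum to one, the resulting action lies inside the convex hull of the vertices for any logits, which is why no projection step is needed at execution time.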
In this paper, we propose a new control barrier function based quadratic program for general nonlinear control-affine systems, which, without any assumptions other than those taken in the original program, simultaneously guarantees forward invariance of the safety set, complete elimination of undesired equilibrium points inside it, and local asymptotic stability of the origin. To better appreciate this result, we first characterize the equilibrium points of the closed-loop system with the original quadratic program formulation. We then provide analytical results on how a certain parameter in the original quadratic program should be chosen to remove the undesired equilibrium points or to confine them in a small neighborhood of the origin. The new formulation then follows from these analytical results. Numerical examples are given alongside the theoretical discussions.
Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
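The idea of a CBF-based controller correcting an RL action can be sketched with a generic safety filter. This is a simplified illustration under strong assumptions, not the RL-CBF algorithm itself: it assumes fully known control-affine dynamics (no GP model of uncertainty), a single barrier constraint written as `a + b . u >= 0`, and a minimum-norm correction, for which the quadratic program has a closed-form solution. The name `cbf_filter` and its arguments are hypothetical.

```python
def cbf_filter(u_rl, a, b):
    """Return the control closest to u_rl that satisfies a + b . u >= 0.

    For a barrier h(x) with dynamics x' = f(x) + g(x) u, one common choice is
    a = L_f h(x) + alpha * h(x) and b = L_g h(x), with alpha > 0 (assumed here).
    """
    slack = a + sum(bi * ui for bi, ui in zip(b, u_rl))
    if slack >= 0:
        return list(u_rl)  # the RL action already satisfies the constraint
    norm_sq = sum(bi * bi for bi in b)
    if norm_sq == 0:
        raise ValueError("constraint cannot be influenced by the control")
    # Project u_rl onto the boundary of the half-space {u : a + b . u >= 0}.
    return [ui - (slack / norm_sq) * bi for ui, bi in zip(u_rl, b)]


# Example: single integrator x' = u with safe set h(x) = x_max - x >= 0.
x, x_max, alpha = 0.9, 1.0, 1.0
a = alpha * (x_max - x)  # L_f h = 0 for this system
b = [-1.0]               # L_g h = -1
safe_u = cbf_filter([1.0], a, b)
```

In this example the RL action pushes the state toward the bound faster than the barrier condition allows, so the filter scales it back to the largest admissible value; an action that already satisfies the condition is returned unchanged.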