No Arabic abstract
Traditional methods for solvability region analysis can only have inner approximations with inconclusive conservatism. Machine learning methods have been proposed to approach the real region. In this letter, we propose a deep active learning framework for power system solvability prediction. Compared with the passive learning methods where the training is performed after all instances are labeled, the active learning selects most informative instances to be label and therefore significantly reduce the size of labeled dataset for training. In the active learning framework, the acquisition functions, which correspond to different sampling strategies, are defined in terms of the on-the-fly posterior probability from the classifier. The IEEE 39-bus system is employed to validate the proposed framework, where a two-dimensional case is illustrated to visualize the effectiveness of the sampling method followed by the full-dimensional numerical experiments.
Active network management (ANM) of electricity distribution networks include many complex stochastic sequential optimization problems. These problems need to be solved for integrating renewable energies and distributed storage into future electrical grids. In this work, we introduce Gym-ANM, a framework for designing reinforcement learning (RL) environments that model ANM tasks in electricity distribution networks. These environments provide new playgrounds for RL research in the management of electricity networks that do not require an extensive knowledge of the underlying dynamics of such systems. Along with this work, we are releasing an implementation of an introductory toy-environment, ANM6-Easy, designed to emphasize common challenges in ANM. We also show that state-of-the-art RL algorithms can already achieve good performance on ANM6-Easy when compared against a model predictive control (MPC) approach. Finally, we provide guidelines to create new Gym-ANM environments differing in terms of (a) the distribution network topology and parameters, (b) the observation space, (c) the modelling of the stochastic processes present in the system, and (d) a set of hyperparameters influencing the reward signal. Gym-ANM can be downloaded at https://github.com/robinhenry/gym-anm.
Attempts from different disciplines to provide a fundamental understanding of deep learning have advanced rapidly in recent years, yet a unified framework remains relatively limited. In this article, we provide one possible way to align existing branches of deep learning theory through the lens of dynamical system and optimal control. By viewing deep neural networks as discrete-time nonlinear dynamical systems, we can analyze how information propagates through layers using mean field theory. When optimization algorithms are further recast as controllers, the ultimate goal of training processes can be formulated as an optimal control problem. In addition, we can reveal convergence and generalization properties by studying the stochastic dynamics of optimization algorithms. This viewpoint features a wide range of theoretical study from information bottleneck to statistical physics. It also provides a principled way for hyper-parameter tuning when optimal control theory is introduced. Our framework fits nicely with supervised learning and can be extended to other learning problems, such as Bayesian learning, adversarial training, and specific forms of meta learning, without efforts. The review aims to shed lights on the importance of dynamics and optimal control when developing deep learning theory.
The rapid growth of distributed energy resources potentially increases power grid instability. One promising strategy is to employ data in power grids to efficiently respond to abnormal events (e.g., faults) by detection and location. Unfortunately, most existing works lack physical interpretation and are vulnerable to the practical challenges: sparse observation, insufficient labeled datasets, and stochastic environment. We propose a physics-informed graph learning framework of two stages to handle these challenges when locating faults. Stage- I focuses on informing a graph neural network (GNN) with the geometrical structure of power grids; stage-II employs the physical similarity of labeled and unlabeled data samples to improve the location accuracy. We provide a random walk-based the underpinning of designing our GNNs to address the challenge of sparse observation and augment the correct prediction probability. We compare our approach with three baselines in the IEEE 123-node benchmark system, showing that the proposed method outperforms the others by significant margins, especially when label rates are low. Also, we validate the robustness of our algorithms to out-of-distribution-data (ODD) due to topology changes and load variations. Additionally, we adapt our graph learning framework to the IEEE 37-node test feeder and show high location performance with the proposed training strategy.
We consider the problem of reward learning for temporally extended tasks. For reward learning, inverse reinforcement learning (IRL) is a widely used paradigm. Given a Markov decision process (MDP) and a set of demonstrations for a task, IRL learns a reward function that assigns a real-valued reward to each state of the MDP. However, for temporally extended tasks, the underlying reward function may not be expressible as a function of individual states of the MDP. Instead, the history of visited states may need to be considered to determine the reward at the current state. To address this issue, we propose an iterative algorithm to learn a reward function for temporally extended tasks. At each iteration, the algorithm alternates between two modules, a task inference module that infers the underlying task structure and a reward learning module that uses the inferred task structure to learn a reward function. The task inference module produces a series of queries, where each query is a sequence of subgoals. The demonstrator provides a binary response to each query by attempting to execute it in the environment and observing the environments feedback. After the queries are answered, the task inference module returns an automaton encoding its current hypothesis of the task structure. The reward learning module augments the state space of the MDP with the states of the automaton. The module then proceeds to learn a reward function over the augmented state space using a novel deep maximum entropy IRL algorithm. This iterative process continues until it learns a reward function with satisfactory performance. The experiments show that the proposed algorithm significantly outperforms several IRL baselines on temporally extended tasks.
As power systems are undergoing a significant transformation with more uncertainties, less inertia and closer to operation limits, there is increasing risk of large outages. Thus, there is an imperative need to enhance grid emergency control to maintain system reliability and security. Towards this end, great progress has been made in developing deep reinforcement learning (DRL) based grid control solutions in recent years. However, existing DRL-based solutions have two main limitations: 1) they cannot handle well with a wide range of grid operation conditions, system parameters, and contingencies; 2) they generally lack the ability to fast adapt to new grid operation conditions, system parameters, and contingencies, limiting their applicability for real-world applications. In this paper, we mitigate these limitations by developing a novel deep meta reinforcement learning (DMRL) algorithm. The DMRL combines the meta strategy optimization together with DRL, and trains policies modulated by a latent space that can quickly adapt to new scenarios. We test the developed DMRL algorithm on the IEEE 300-bus system. We demonstrate fast adaptation of the meta-trained DRL polices with latent variables to new operating conditions and scenarios using the proposed method and achieve superior performance compared to the state-of-the-art DRL and model predictive control (MPC) methods.