This paper develops a model-free volt-VAR optimization (VVO) algorithm via multi-agent deep reinforcement learning (MADRL) in unbalanced distribution systems. The method is novel in that we cast the VVO problem in unbalanced distribution networks into a deep Q-network (DQN) framework, which avoids directly solving a specific optimization model under the time-varying operating conditions of the system. We take the statuses/ratios of switchable capacitors, voltage regulators, and smart inverters installed at distributed generators as the action variables of the DQN agents. A carefully designed reward function guides these agents as they interact with the distribution system, reinforcing voltage regulation and power loss reduction simultaneously. The forward-backward sweep method for radial three-phase distribution systems supplies the DQN environment with accurate power flow results within a few iterations. Finally, the proposed multi-objective MADRL method realizes the dual goals for VVO. We test this algorithm on the unbalanced IEEE 13-bus and 123-bus systems. Numerical simulations validate the excellent performance of this method in voltage regulation and power loss reduction.
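To illustrate the forward-backward sweep mentioned in this abstract, the following is a minimal sketch for a single-phase radial feeder in per-unit quantities. It is a simplification, not the three-phase unbalanced solver used in the paper: the function name, the `parent` array encoding of the feeder topology, and the constant-power load model are all assumptions made for illustration.

```python
def forward_backward_sweep(parent, z, s_load, v_source=1.0 + 0j,
                           tol=1e-8, max_iter=50):
    """Single-phase forward-backward sweep on a radial feeder (illustrative).

    parent[i] : index of the upstream bus of bus i (parent[0] = -1, the source).
    z[i]      : per-unit impedance of the branch parent[i] -> i (z[0] unused).
    s_load[i] : per-unit complex constant-power load at bus i.
    Buses are assumed to be numbered so that parent[i] < i.
    """
    n = len(parent)
    v = [v_source] * n
    for iteration in range(1, max_iter + 1):
        # Backward sweep: load currents I = conj(S / V), then accumulate
        # branch currents from the leaves toward the source.
        i_branch = [(s_load[k] / v[k]).conjugate() for k in range(n)]
        for k in range(n - 1, 0, -1):
            i_branch[parent[k]] += i_branch[k]
        # Forward sweep: update voltages from the source toward the leaves.
        v_new = [v_source] * n
        for k in range(1, n):
            v_new[k] = v_new[parent[k]] - z[k] * i_branch[k]
        max_change = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if max_change < tol:
            return v, iteration
    raise RuntimeError("forward-backward sweep did not converge")
```

For a lightly loaded feeder the voltage update is a strong contraction, which is consistent with the abstract's remark that the sweep converges within a few iterations. In a reinforcement learning setting, a routine of this kind would be called once per environment step, after the agents' capacitor, regulator, and inverter actions have been applied to the network data.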
In an active power distribution system, Volt-VAR optimization (VVO) methods are employed to achieve network-level objectives such as minimization of network power losses. The commonly used model-based centralized and distributed VVO algorithms perform poorly in the absence of a communication system and in the presence of model and measurement uncertainties. In this paper, we propose a model-free local Volt-VAR control approach for network-level optimization that does not require communication with other decision-making agents. The proposed algorithm is based on an extremum-seeking approach that uses only local measurements to minimize the network power losses. To prove that the proposed extremum-seeking controller converges to the optimum solution, we also derive mathematical conditions under which the loss minimization problem is convex with respect to the control variables. Local controllers pose stability concerns during highly variable scenarios. Thus, the proposed extremum-seeking controller is integrated with an adaptive-droop control module to provide a stable local control response. The proposed approach is validated using the IEEE 4-bus and IEEE 123-bus systems and achieves the loss minimization objective while maintaining the voltage within the pre-specified limits even during highly variable DER generation scenarios.
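The extremum-seeking idea in this abstract can be sketched on a scalar problem. The example below is a generic discrete-time perturbation-based extremum seeker, not the controller from the paper: the function name, the sinusoidal dither, the first-order filter, and all parameter values are assumptions, and the quadratic `cost` stands in for a locally measured loss that is convex in the control variable.

```python
import math


def extremum_seek(cost, u0, amplitude=0.1, omega=1.0, gain=0.2,
                  filter_alpha=0.1, steps=2000):
    """Minimize a measured scalar cost using only cost evaluations.

    A sinusoidal dither is added to the current estimate, the measured cost
    is high-pass filtered, and the product with the dither (demodulation)
    gives a signal proportional to the local gradient on average.
    """
    u_hat = u0
    cost_avg = cost(u0)  # running low-pass estimate of the cost
    for k in range(steps):
        dither = math.sin(omega * k)
        measured = cost(u_hat + amplitude * dither)
        # Low-pass update; (measured - cost_avg) is then the high-passed cost.
        cost_avg += filter_alpha * (measured - cost_avg)
        gradient_signal = (measured - cost_avg) * dither
        # Integrate against the estimated gradient to descend the cost.
        u_hat -= gain * gradient_signal
    return u_hat
```

The controller never evaluates a model or a derivative of `cost`; it only observes measured values, which is what makes the approach model-free. Convexity of the cost in the control variable, as derived in the paper for the loss minimization problem, is what ensures that the point the estimate settles near is the global optimum.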
We introduce PowerGym, an open-source reinforcement learning environment for Volt-Var control in power distribution systems. Following OpenAI Gym APIs, PowerGym targets minimizing power loss and voltage violations under physical networked constraints. PowerGym provides four distribution systems (13Bus, 34Bus, 123Bus, and 8500Node) based on IEEE benchmark systems and design variants for various control difficulties. To foster generalization, PowerGym offers a detailed customization guide for users working with their distribution systems. As a demonstration, we examine state-of-the-art reinforcement learning algorithms in PowerGym and validate the environment by studying controller behaviors.
Load shedding has been one of the most widely used and effective emergency control approaches against voltage instability. With increased uncertainties and rapidly changing operational conditions in power systems, existing methods have outstanding issues in terms of speed, adaptiveness, or scalability. Deep reinforcement learning (DRL) has been regarded and adopted as a promising approach for fast and adaptive grid stability control in recent years. However, existing DRL algorithms show two outstanding issues when applied to power system control problems: 1) computational inefficiency, which requires extensive training and tuning time; and 2) poor scalability, which makes it difficult to scale to high-dimensional control problems. To overcome these issues, an accelerated DRL algorithm named PARS was developed and tailored for power system voltage stability control via load shedding. PARS features high scalability and is easy to tune, with only five main hyperparameters. The method was tested on both the IEEE 39-bus and IEEE 300-bus systems, and the latter is by far the largest scale for such a study. Test results show that, compared to other methods including model-predictive control (MPC) and proximal policy optimization (PPO), PARS offers better computational efficiency (faster convergence), more robust learning, and excellent scalability and generalization capability.
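The abstract does not spell out how PARS works. Assuming it builds on augmented random search (ARS), a derivative-free policy search method, one update step can be sketched as below. The function name, the parameter names, and the toy reward are hypothetical; a real implementation would evaluate the perturbed policies through parallel power system simulations rather than a closed-form function.

```python
import random
import statistics


def ars_step(theta, reward_fn, rng, n_directions=8, top_b=4,
             step_size=0.05, noise=0.1):
    """One ARS-style update of the policy parameters theta (illustrative)."""
    candidates = []
    for _ in range(n_directions):
        # Sample a random direction and evaluate both perturbed policies.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = reward_fn([t + noise * d for t, d in zip(theta, delta)])
        r_minus = reward_fn([t - noise * d for t, d in zip(theta, delta)])
        candidates.append((r_plus, r_minus, delta))
    # Keep the top_b directions ranked by their better rollout.
    candidates.sort(key=lambda c: max(c[0], c[1]), reverse=True)
    best = candidates[:top_b]
    # Normalize the step by the spread of the rewards that are used.
    sigma = statistics.pstdev([r for c in best for r in c[:2]]) or 1.0
    new_theta = list(theta)
    for r_plus, r_minus, delta in best:
        scale = step_size / (top_b * sigma) * (r_plus - r_minus)
        for i, d in enumerate(delta):
            new_theta[i] += scale * d
    return new_theta
```

In this sketch the tunable quantities are `n_directions`, `top_b`, `step_size`, and `noise`, which illustrates why a method of this family has few hyperparameters. Each pair of rollouts is independent of the others, so the evaluation loop is the natural place to parallelize.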
In Volt/Var control (VVC) of active distribution networks (ADNs), both slow timescale discrete devices (STDDs) and fast timescale continuous devices (FTCDs) are involved. The STDDs, such as on-load tap changers (OLTCs), and the FTCDs, such as distributed generators, should be coordinated in time sequence. Such VVC is formulated as a two-timescale optimization problem to jointly optimize FTCDs and STDDs in ADNs. Traditional optimization methods rely heavily on accurate models of the system, but are sometimes impractical because of the unaffordable modelling effort they require. In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined to describe the two-timescale VVC problem, and separate agents are set up for the slow and fast timescale sub-problems. For the fast timescale sub-problem, we adopt soft actor-critic, an off-policy RL method with high sample efficiency. For the slow one, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality that arises with various STDDs. To mitigate the non-stationarity in the two agents' learning processes, we propose a multi-timescale off-policy correction (MTOPC) method based on importance sampling. Comprehensive numerical studies not only demonstrate that the proposed method can achieve stable and satisfactory optimization of both STDDs and FTCDs without any model information, but also show that it outperforms existing two-timescale VVC methods.
At high latitudes, many cities adopt a centralized heating system to improve energy generation efficiency and to reduce pollution. In multi-tier systems, so-called district heating, there are few efficient approaches to flow rate control during the heating process. In this paper, we describe theoretical methods for solving this problem with deep reinforcement learning and propose a cloud-based heating control system for implementation. A real-world case study shows the effectiveness and practicability of the proposed system controlled by humans, and the simulated experiments for deep reinforcement learning show that about 1985.01 gigajoules of heat quantity and 42276.45 tons of water are saved per hour compared with manual control.