No Arabic abstract
We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimise the overall cost under a real-time pricing scheme. While previous works have introduced centralised approaches, we formulate the smart grid environment as a Markov game, where each household is a decentralised agent, and the grid operator produces a price signal that adapts to the energy demand. The main challenge addressed in our approach is partial observability and perceived non-stationarity of the environment from the viewpoint of each agent. We propose a multi-agent extension of a deep actor-critic algorithm that shows success in learning in this environment. This algorithm learns a centralised critic that coordinates training of all agents. Our approach thus uses centralised learning but decentralised execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households based purely on instantaneous observations and a price signal.
Electricity is one of the mandatory commodities for mankind today. To address challenges and issues in the transmission of electricity through the traditional grid, the concepts of smart grids and demand response have been developed. In such systems, a large amount of data is generated daily from various sources such as power generation (e.g., wind turbines), transmission and distribution (microgrids and fault detectors), load management (smart meters and smart electric appliances). Thanks to recent advancements in big data and computing technologies, Deep Learning (DL) can be leveraged to learn the patterns from the generated data and predict the demand for electricity and peak hours. Motivated by the advantages of deep learning in smart grids, this paper sets to provide a comprehensive survey on the application of DL for intelligent smart grids and demand response. Firstly, we present the fundamental of DL, smart grids, demand response, and the motivation behind the use of DL. Secondly, we review the state-of-the-art applications of DL in smart grids and demand response, including electric load forecasting, state estimation, energy theft detection, energy sharing and trading. Furthermore, we illustrate the practicality of DL via various use cases and projects. Finally, we highlight the challenges presented in existing research works and highlight important issues and potential directions in the use of DL for smart grids and demand response.
Both single-agent and multi-agent actor-critic algorithms are an important class of Reinforcement Learning algorithms. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The agents objective is to collectively learn a joint policy that maximizes the sum of averaged long-term returns of these agents. In the absence of a central controller, agents communicate the information to their neighbors via a time-varying communication network while preserving privacy. We prove the convergence of all the 3 MAN algorithms to a globally asymptotically stable point of the ODE corresponding to the actor update; these use linear function approximations. We use the Fisher information matrix to obtain the natural gradients. The Fisher information matrix captures the curvature of the Kullback-Leibler (KL) divergence between polices at successive iterates. We also show that the gradient of this KL divergence between policies of successive iterates is proportional to the objective functions gradient. Our MAN algorithms indeed use this emph{representation} of the objective functions gradient. Under certain conditions on the Fisher information matrix, we prove that at each iterate, the optimal value via MAN algorithms can be better than that of the multi-agent actor-critic (MAAC) algorithm using the standard gradients. To validate the usefulness of our proposed algorithms, we implement all the 3 MAN algorithms on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in the average congestion in 2 MAN algorithms; the average congestion in another MAN algorithm is on par with the MAAC algorithm. We also consider a generic 15 agent MARL; the performance of the MAN algorithms is again as good as the MAAC algorithm. We attribute the better performance of the MAN algorithms to their use of the above representation.
With the advances in the Internet of Things technology, electric vehicles (EVs) have become easier to schedule in daily life, which is reshaping the electric load curve. It is important to design efficient charging algorithms to mitigate the negative impact of EV charging on the power grid. This paper investigates an EV charging scheduling problem to reduce the charging cost while shaving the peak charging load, under unknown future information about EVs, such as arrival time, departure time, and charging demand. First, we formulate an EV charging problem to minimize the electricity bill of the EV fleet and study the EV charging problem in an online setting without knowing future information. We develop an actor-critic learning-based smart charging algorithm (SCA) to schedule the EV charging against the uncertainties in EV charging behaviors. The SCA learns an optimal EV charging strategy with continuous charging actions instead of discrete approximation of charging. We further develop a more computationally efficient customized actor-critic learning charging algorithm (CALC) by reducing the state dimension and thus improving the computational efficiency. Finally, simulation results show that our proposed SCA can reduce EVs expected cost by 24.03%, 21.49%, 13.80%, compared with the Eagerly Charging Algorithm, Online Charging Algorithm, RL-based Adaptive Energy Management Algorithm, respectively. CALC is more computationally efficient, and its performance is close to that of SCA with only a gap of 5.56% in the cost.
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
Demand side management (DSM) is a key solution for reducing the peak-time power consumption in smart grids. To provide incentives for consumers to shift their consumption to off-peak times, the utility company charges consumers differential pricing for using power at different times of the day. Consumers take into account these differential prices when deciding when and how much power to consume daily. Importantly, while consumers enjoy lower billing costs when shifting their power usage to off-peak times, they also incur discomfort costs due to the altering of their power consumption patterns. Existing works propose stationary strategies for the myopic consumers to minimize their short-term billing and discomfort costs. In contrast, we model the interaction emerging among self-interested, foresighted consumers as a repeated energy scheduling game and prove that the stationary strategies are suboptimal in terms of long-term total billing and discomfort costs. Subsequently, we propose a novel framework for determining optimal nonstationary DSM strategies, in which consumers can choose different daily power consumption patterns depending on their preferences, routines, and needs. As a direct consequence of the nonstationary DSM policy, different subsets of consumers are allowed to use power in peak times at a low price. The subset of consumers that are selected daily to have their joint discomfort and billing costs minimized is determined based on the consumers power consumption preferences as well as on the past history of which consumers have shifted their usage previously. Importantly, we show that the proposed strategies are incentive-compatible. Simulations confirm that, given the same peak-to-average ratio, the proposed strategy can reduce the total cost (billing and discomfort costs) by up to 50% compared to existing DSM strategies.