
A Reinforcement Learning Method For Power Suppliers Strategic Bidding with Insufficient Information

Published by Qiangang Jia
Publication date: 2020
Language: English





Power suppliers can exercise market power to gain higher profit. However, this becomes difficult when external information is extremely scarce. To perform well in a market environment with extremely incomplete information, this paper proposes a novel model-free reinforcement learning algorithm based on Learning Automata (LA). The paper also analyses the rationality and convergence of the algorithm in case studies based on the Cournot market model.
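As a rough illustration of the idea, the sketch below shows two learning automata choosing output quantities in a Cournot duopoly, where each supplier observes only its own profit. It is not the algorithm from the paper: the linear reward-inaction update, the min-max reward normalization, the demand and cost parameters, and all names (`LearningAutomaton`, `cournot_profit`, `simulate`) are assumptions chosen for the example.

```python
import random


class LearningAutomaton:
    """Linear reward-inaction automaton (an assumed, textbook update rule)."""

    def __init__(self, n_actions, step, rng):
        self.p = [1.0 / n_actions] * n_actions  # action probabilities
        self.step = step
        self.rng = rng

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        # reward must lie in [0, 1]; a reward of 0 leaves p unchanged.
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.step * reward * (1.0 - self.p[j])
            else:
                self.p[j] -= self.step * reward * self.p[j]


def cournot_profit(q_own, q_other, a=10.0, b=1.0, c=1.0):
    """Profit under linear inverse demand P = a - b*Q and unit cost c (assumed values)."""
    price = max(a - b * (q_own + q_other), 0.0)
    return (price - c) * q_own


def simulate(rounds=5000, quantities=(1, 2, 3, 4, 5), step=0.02, seed=0):
    rng = random.Random(seed)
    agents = [LearningAutomaton(len(quantities), step, rng) for _ in range(2)]
    lo = [float("inf")] * 2   # lowest profit each agent has seen so far
    hi = [float("-inf")] * 2  # highest profit each agent has seen so far
    for _ in range(rounds):
        acts = [agent.choose() for agent in agents]
        for i, agent in enumerate(agents):
            profit = cournot_profit(quantities[acts[i]], quantities[acts[1 - i]])
            lo[i], hi[i] = min(lo[i], profit), max(hi[i], profit)
            # Normalize using only the agent's own profit history (model-free).
            reward = 0.5 if hi[i] == lo[i] else (profit - lo[i]) / (hi[i] - lo[i])
            agent.update(acts[i], reward)
    return [agent.p for agent in agents]


if __name__ == "__main__":
    for i, probs in enumerate(simulate()):
        print(f"supplier {i}:", [round(x, 3) for x in probs])
```

Each supplier's update uses nothing but its own chosen quantity and realized profit, which is the sense in which the scheme is model-free. Whether the probabilities settle near the Cournot equilibrium quantity depends on the step size and the normalization, so this sketch makes no convergence claim.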




Read also

Under-voltage load shedding has been considered a standard and effective measure to recover the voltage stability of the electric power grid under emergency and severe conditions. However, this scheme usually trips a massive amount of load, which can be unnecessary and harmful to customers. Recently, deep reinforcement learning (RL) has been adopted as a promising approach that can significantly reduce the amount of load shedding. However, like most existing machine learning (ML)-based control techniques, RL control usually cannot guarantee the safety of the systems under control. In this paper, we introduce a novel safe RL method for emergency load shedding of power systems that can enhance the safe voltage recovery of the electric power grid after faults. Unlike the standard RL method, the safe RL method has a reward function that includes a barrier function, which goes to minus infinity as the system state approaches the safety bounds. Consequently, the optimal control policy steers the power system away from the safety bounds. This method is general and can be applied to other safety-critical control problems. Numerical simulations on the 39-bus IEEE benchmark are performed to demonstrate the effectiveness of the proposed safe RL emergency control, as well as its adaptive capability to faults not seen in training.
The cost of power distribution infrastructure is driven by the peak power encountered in the system. Therefore, distribution network operators consider billing consumers behind a common transformer as a function of their peak demand and leave it to the consumers to manage their collective costs. This management problem is, however, not trivial. In this paper, we consider a multi-agent residential smart grid system, where each agent has local renewable energy production and energy storage, and all agents are connected to a local transformer. The objective is to develop an optimal policy that minimizes the economic cost consisting of both the spot-market cost for each consumer and their collective peak-power cost. We propose to use a parametric Model Predictive Control (MPC) scheme to approximate the optimal policy. The optimality of this policy is limited by its finite horizon and inaccurate forecasts of the local power production and consumption. A Deterministic Policy Gradient (DPG) method is deployed to adjust the MPC parameters and improve the policy. Our simulations show that the proposed MPC-based Reinforcement Learning (RL) method can effectively decrease the long-term economic cost for this smart grid problem.
Ren Hu, Qifeng Li, 2021
The multi-period dynamics of energy storage (ES), intermittent renewable generation, and uncontrollable power loads make the optimization of power system operation (PSO) challenging. A multi-period optimal PSO under uncertainty is formulated using the chance-constrained optimization (CCO) modeling paradigm, where the constraints include the nonlinear energy storage and AC power flow models. Based on the emerging scenario optimization method, which does not rely on pre-known probability distribution functions, this paper develops a novel solution method for this challenging CCO problem. The proposed method is computationally effective for two main reasons. First, the original AC power flow constraints are approximated by a set of learning-assisted quadratic convex inequalities based on a generalized least absolute shrinkage and selection operator. Second, considering the physical patterns of data and motivated by learning-based sampling, a strategic sampling method is developed to significantly reduce the required number of scenarios through different sampling strategies. The simulation results on IEEE standard systems indicate that 1) the proposed strategic sampling significantly improves the computational efficiency of the scenario-based approach for solving the chance-constrained optimal PSO problem, and 2) the data-driven convex approximation of power flow can be a promising alternative to the nonlinear and nonconvex AC power flow.
This paper proposes a novel approach to estimate the steady-state angle stability limit (SSASL) by using the nonlinear power system dynamic model in the modal space. Through two linear changes of coordinates and a simplification introduced by the steady-state condition, the nonlinear power system dynamic model is transformed into a number of single-machine-like power systems whose power-angle curves can be derived and used for estimating the SSASL. The proposed approach estimates the SSASL of angles at all machines and all buses without the need for manually specifying the scenario, i.e. setting sink and source areas, and also without the need for solving multiple nonlinear power flows. Case studies on 9-bus and 39-bus power systems demonstrate that the proposed approach is always able to capture the aperiodic instability in an online environment, showing promising performance in the online monitoring of the steady-state angle stability over the traditional power flow-based analysis.
Ali Baheri, 2020
This paper presents a safe reinforcement learning system for automated driving that benefits from multimodal future trajectory predictions. We propose a safety system that consists of two safety components: a heuristic safety module and a learning-based safety module. The heuristic safety module is based on common driving rules. The learning-based safety module, on the other hand, is a data-driven safety rule that learns safety patterns from driving data. Specifically, it utilizes mixture density recurrent neural networks (MD-RNN) for multimodal future trajectory predictions to accelerate the learning progress. Our simulation results demonstrate that the proposed safety system outperforms previously reported results in terms of average reward and number of collisions.