
Personalized Demand Response via Shape-Constrained Online Learning

Published by Ana Ospina
Publication date: 2020
Language: English





This paper formalizes a demand response task as an optimization problem featuring a known time-varying engineering cost and an unknown (dis)comfort function. Based on this model, the paper develops a feedback-based projected gradient method to solve the demand response problem in an online fashion, where: i) feedback from the user is leveraged to learn the (dis)comfort function concurrently with the execution of the algorithm; and ii) measurements of electrical quantities are used to estimate the gradient of the known engineering cost. To learn the unknown function, a shape-constrained Gaussian Process is leveraged; this approach yields an estimated function that is strongly convex and smooth. The performance of the online algorithm is analyzed using metrics such as the tracking error and the dynamic regret. A numerical example is presented to corroborate the technical findings.
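The loop described in the abstract can be illustrated with a small scalar sketch. This is not the authors' algorithm: the shape-constrained Gaussian Process is replaced here by a least-squares quadratic fit whose curvature is clipped to a positive interval (a much simpler way to keep the estimate strongly convex and smooth), and the cost functions, feasible interval, step size, and all numeric values are hypothetical choices made only for illustration.

```python
import numpy as np


def fit_strongly_convex_quadratic(xs, ys, mu=0.1, L=10.0):
    """Fit y ~ 0.5*a*x**2 + b*x + c with curvature a clipped to [mu, L].

    Stand-in (assumption) for the shape-constrained Gaussian Process.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    design = np.column_stack([0.5 * xs**2, xs, np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    a = float(np.clip(coef[0], mu, L))
    # Refit the linear part with the (possibly clipped) curvature held fixed.
    lin, *_ = np.linalg.lstsq(design[:, 1:], ys - 0.5 * a * xs**2, rcond=None)
    return a, float(lin[0])


def run_online_demand_response(T=200, step=0.2, noise_std=0.01, seed=0):
    """Online projected gradient with a discomfort model learned from feedback."""
    rng = np.random.default_rng(seed)
    lo, hi = 0.0, 10.0            # hypothetical feasible set for the setpoint
    true_a, true_pref = 2.0, 6.0  # hidden discomfort 0.5*a*(x - pref)**2
    x = 5.0
    xs, ys, errors, iterates = [], [], [], []
    for t in range(T):
        r_t = 4.0 + np.sin(2.0 * np.pi * t / 100.0)  # time-varying reference
        # User feedback: noisy evaluation of the discomfort at the current point.
        xs.append(x)
        ys.append(0.5 * true_a * (x - true_pref) ** 2
                  + noise_std * rng.standard_normal())
        if len(xs) >= 3:
            a_hat, b_hat = fit_strongly_convex_quadratic(xs, ys)
            grad_u = a_hat * x + b_hat
        else:
            grad_u = 0.0          # not enough feedback to fit a quadratic yet
        measured = x              # stand-in for a measured electrical quantity
        grad_f = measured - r_t   # gradient of the known cost 0.5*(x - r_t)**2
        # Projected gradient step onto [lo, hi].
        x = float(np.clip(x - step * (grad_f + grad_u), lo, hi))
        # Minimizer of the true time-t problem, used only to score tracking.
        x_star = float(np.clip((r_t + true_a * true_pref) / (1.0 + true_a), lo, hi))
        iterates.append(x)
        errors.append(abs(x - x_star))
    return np.array(errors), np.array(iterates)


if __name__ == "__main__":
    errors, _ = run_online_demand_response()
    print("mean tracking error over the last 50 steps:", errors[-50:].mean())
```

The returned `errors` array is the per-step distance to the instantaneous minimizer, which is the tracking-error notion mentioned in the abstract; summing the corresponding cost gaps over time would give a dynamic-regret estimate.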




Read also

This paper presents a distributed optimization algorithm tailored for solving optimal control problems arising in multi-building coordination. The buildings, coordinated by a grid operator, join a demand response program to balance the voltage surge by using an energy-cost-defined criterion. In order to model the hierarchical structure of the building network, we formulate a distributed convex optimization problem with separable objectives and coupled affine equality constraints. A variant of the Augmented Lagrangian based Alternating Direction Inexact Newton (ALADIN) method for solving the considered class of problems is then presented along with a convergence guarantee. To illustrate the effectiveness of the proposed method, we compare it to the Alternating Direction Method of Multipliers (ADMM) by running both an ALADIN-based and an ADMM-based model predictive controller on a benchmark case study.
Lixing Chen, Jie Xu (2019)
Shared edge computing platforms, which enable Application Service Providers (ASPs) to deploy applications in close proximity to mobile users, are providing ultra-low latency and location-awareness to a rich portfolio of services. Though ubiquitous edge service provisioning, i.e., deploying the application at all possible edge sites, is always preferable, it is impractical due to the often limited operational budget of ASPs. In this case, an ASP has to cautiously decide where to deploy the edge service and how much budget it is willing to use. A central issue here is that the service demand received by each edge site, which is the key factor in the deployment benefit, is unknown to ASPs a priori. What's more complicated is that this demand pattern varies temporally and spatially across geographically distributed edge sites. In this paper, we investigate an edge resource rental problem where the ASP learns service demand patterns for individual edge sites while renting computation resources at these sites to host its applications for edge service provisioning. An online algorithm, called Context-aware Online Edge Resource Rental (COERR), is proposed based on the framework of Contextual Combinatorial Multi-armed Bandit (CC-MAB). COERR observes side-information (context) to learn the demand patterns of edge sites and makes rental decisions (including where to rent the computation resource and how much to rent) to maximize the ASP's utility given a limited budget. COERR provides a provable performance guarantee, achieving sublinear regret compared to an Oracle algorithm that knows exactly the expected service demand of edge sites. Experiments are carried out on a real-world dataset and the results show that COERR significantly outperforms other benchmarks.
We develop an optimization model and corresponding algorithm for the management of a demand-side platform (DSP), whereby the DSP aims to maximize its own profit while acquiring valuable impressions for its advertiser clients. We formulate the problem of profit maximization for a DSP interacting with ad exchanges in a real-time bidding environment in a cost-per-click/cost-per-action pricing model. Our proposed formulation leads to a nonconvex optimization problem due to the joint optimization over both impression allocation and bid price decisions. We use Lagrangian relaxation to develop a tractable convex dual problem, which, due to the properties of second-price auctions, may be solved efficiently with subgradient methods. We propose a two-phase solution procedure, whereby in the first phase we solve the convex dual problem using a subgradient algorithm, and in the second phase we use the previously computed dual solution to set bid prices and then solve a linear optimization problem to obtain the allocation probability variables. On several synthetic examples, we demonstrate that our proposed solution approach leads to superior performance over a baseline method that is used in practice.
The prevalence of e-commerce has made detailed customers' personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When involving personalized information, how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over $T$ time periods with an unknown demand function of posted price and personalized information. At each time $t$, the retailer observes an arriving customer's personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third-party agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer's revenue while avoiding leakage of individual customers' information and purchasing decisions. To this end, we first introduce a notion of anticipating $(\varepsilon, \delta)$-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performance guarantee in terms of regret. Roughly speaking, for $d$-dimensional personalized information, our algorithm achieves an expected regret of order $\tilde{O}(\varepsilon^{-1} \sqrt{d^3 T})$ when the customers' information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to $\tilde{O}(\sqrt{d^2 T} + \varepsilon^{-2} d^2)$.
Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches to doing so: pretraining our model to warm start the experiment with simulated tasks, and using a planning model trained to simulate the real world's rewards to the agent. We present results that demonstrate the utility of offline reinforcement learning for efficient price-setting in the energy demand response problem.