
RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems

Published by: Bai Liu
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space, while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
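The abstract fully specifies the hybrid structure of RL-QN, even if not its learning details: act with a learned policy on a finite subset of the state space and fall back to a known stabilizing policy everywhere else. Below is a minimal Python sketch of that structure on a toy single queue; the threshold U, the Bernoulli arrival/service rates, and the placeholder "learned" policy are illustrative assumptions, not the paper's construction.

```python
import random

U = 20  # assumed size of the finite subset on which the policy is learned

def stabilizing_policy(q):
    # Known stabilizing fallback used outside the learned region
    # (here: always activate the server; illustrative choice).
    return 1

def rl_qn_action(q, learned_policy):
    # RL-QN structure: learned policy inside {0, ..., U},
    # stabilizing policy for all remaining states.
    return learned_policy[q] if q <= U else stabilizing_policy(q)

def simulate(learned_policy, steps=10_000, seed=0):
    # Toy single-server queue: Bernoulli(0.4) arrivals; an activated
    # server completes a job with probability 0.6 per slot.
    rng = random.Random(seed)
    q = backlog_sum = 0
    for _ in range(steps):
        if rl_qn_action(q, learned_policy) and q > 0 and rng.random() < 0.6:
            q -= 1
        if rng.random() < 0.4:
            q += 1
        backlog_sum += q
    return backlog_sum / steps  # time-averaged queue backlog

# Stand-in for the learned policy: serve whenever the queue is nonempty.
print(simulate({q: int(q > 0) for q in range(U + 1)}))
```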




Read also

We consider a multicast scheme recently proposed for a wireless downlink in [1]. It was shown earlier that power control can significantly improve its performance. However, for this system, obtaining the optimal power control is intractable because of a very large state space. Therefore, in this paper we use deep reinforcement learning, approximating the Q-function via a deep neural network. We show that optimal power control can be learnt for reasonably large systems via this approach. The average power constraint is enforced via a Lagrange multiplier, which is also learnt. Finally, we demonstrate that a slight modification of the learning algorithm allows the optimal control to track time-varying system statistics.
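The constraint-handling trick mentioned here, an average power constraint enforced through a learnt Lagrange multiplier, can be sketched separately from the deep Q-network. A minimal Python sketch follows; the budget, step size, and the exponential moving average used to track average power are assumptions, and the Q-network updates the multiplier would interleave with are omitted.

```python
P_BUDGET = 1.0  # assumed average transmit-power budget
ETA = 0.01      # assumed dual-ascent step size

def penalized_reward(throughput, power, lam):
    # The constraint is folded into the reward the agent maximizes:
    # r_lambda = throughput - lambda * power.
    return throughput - lam * power

def dual_update(lam, power_ema, power):
    # Projected dual ascent: lambda grows while the running average
    # power exceeds the budget and decays (never below zero) otherwise.
    power_ema = 0.99 * power_ema + 0.01 * power
    lam = max(0.0, lam + ETA * (power_ema - P_BUDGET))
    return lam, power_ema

# One step of the outer loop (the inner Q-network update is omitted):
lam, power_ema = 0.0, 0.0
for throughput, power in [(1.2, 1.5), (0.9, 0.8), (1.1, 1.1)]:
    r = penalized_reward(throughput, power, lam)
    lam, power_ema = dual_update(lam, power_ema, power)
```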
Cloud computing today is dominated by multi-server jobs: jobs that request multiple servers simultaneously and hold all of them for the duration of the job. Multi-server jobs add considerable complexity to the traditional one-job-per-server model: an arrival might not fit into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even characterizing the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes in which both the number of servers a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular, we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which may be applicable to a broader class of problems in queueing systems.
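For readers unfamiliar with the technique, Lyapunov-drift arguments behind such stability results have a standard shape. The abstract does not state the authors' Lyapunov function, so the following is the generic Foster-Lyapunov criterion rather than their exact condition.

```latex
% Generic Foster--Lyapunov drift condition: for some V >= 0,
% \epsilon > 0, b < \infty, and a finite set \mathcal{B} of states,
\mathbb{E}\bigl[\, V(Q(t+1)) - V(Q(t)) \;\big|\; Q(t) = q \,\bigr]
    \;\le\; -\epsilon \,\lVert q \rVert \;+\; b \,\mathbf{1}\{ q \in \mathcal{B} \}.
```

Negative drift outside a finite set yields stability; quantitative control of the same drift is what produces queueing-probability and transient bounds of the kind claimed above.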
This work considers the problem of control and resource scheduling in networked systems. We present DIRA, a Deep reinforcement learning based Iterative Resource Allocation algorithm that is scalable and control-aware. Our algorithm is tailored towards large-scale problems where control and scheduling must act jointly to optimize performance. DIRA can be used to schedule general time-domain optimization-based controllers. In the present work, we focus on control designs based on suitably adapted linear quadratic regulators. We apply our algorithm to networked systems with correlated fading communication channels. Our simulations show that DIRA scales well to large scheduling problems.
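As background on the controllers DIRA schedules, a standard (unadapted) discrete-time LQR gain can be computed by Riccati fixed-point iteration. A minimal Python sketch; the double-integrator plant and unit cost weights are illustrative, not taken from the paper.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    # Discrete-time LQR: iterate the Riccati recursion
    # K = (R + B^T P B)^{-1} B^T P A,  P <- Q + A^T P (A - B K),
    # until P (approximately) reaches its fixed point.
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Illustrative double-integrator plant with sampling time 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = dlqr(A, B, Q=np.eye(2), R=np.eye(1))
print(K)  # state-feedback gain: u = -K x
```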
Traditional Traffic Engineering (TE) solutions can achieve optimal or near-optimal performance by rerouting as many flows as possible. However, they do not usually consider the negative impact, such as out-of-order packet delivery, of frequently rerouting flows in the network. To mitigate the impact of network disturbance, one promising TE solution is to forward the majority of traffic flows using Equal-Cost Multi-Path (ECMP) and selectively reroute a few critical flows using Software-Defined Networking (SDN) to balance link utilization across the network. However, critical flow rerouting is not trivial because the solution space for critical flow selection is enormous. Moreover, it is impossible to design a heuristic algorithm for this problem based on fixed and simple rules, since rule-based heuristics cannot adapt to changes in the traffic matrix and network dynamics. In this paper, we propose CFR-RL (Critical Flow Rerouting-Reinforcement Learning), a reinforcement learning based scheme that learns a policy to automatically select critical flows for each given traffic matrix. CFR-RL then reroutes these selected critical flows to balance link utilization of the network by formulating and solving a simple Linear Programming (LP) problem. Extensive evaluations show that CFR-RL achieves near-optimal performance by rerouting only 10%-21.3% of total traffic.
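The LP at the heart of the scheme is a standard minimize-the-maximum-link-utilization program over the selected critical flows. A toy Python instance follows; the topology (two links, one critical flow with two single-link candidate paths), the background ECMP loads, and the use of scipy's linprog are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# One critical flow of demand 5 with two single-link candidate paths.
# Both links have capacity 10 and already carry background ECMP load.
demand = 5.0
caps = np.array([10.0, 10.0])
base = np.array([6.0, 2.0])

# Variables: [x_path1, x_path2, U]; minimize max link utilization U.
c = [0.0, 0.0, 1.0]
# Per link: base + x_on_link <= U * cap  <=>  x - cap * U <= -base.
A_ub = [[1.0, 0.0, -caps[0]],
        [0.0, 1.0, -caps[1]]]
b_ub = [-base[0], -base[1]]
A_eq, b_eq = [[1.0, 1.0, 0.0]], [demand]  # route the full demand

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3)
print(res.x)  # ~[0.5, 4.5, 0.65]: splits traffic to equalize utilization
```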
Manipulating and controlling complex quantum systems with high precision is essential for achieving universal fault-tolerant quantum computing. For a physical system with restricted control resources, it is a challenge to control the dynamics of the target system efficiently and precisely in the presence of disturbances. Here we propose a multi-level dissipative quantum control framework and show that deep reinforcement learning provides an efficient way to identify optimal strategies under the restricted control parameters of a complex quantum system. This framework can be generalized to other quantum control models. Compared with traditional optimal control methods, the deep reinforcement learning algorithm can realize efficient and precise control of multi-level quantum systems under different types of disturbances.
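For context, "multi-level dissipative" means the controlled dynamics are open-system dynamics governed by a Lindblad master equation rather than closed unitary evolution. The abstract does not specify the collapse operators, so the generic form is shown below (with hbar = 1).

```latex
% Lindblad master equation for the controlled dissipative system:
% H(t) carries the restricted control parameters, and the L_k are
% collapse operators modelling dissipation and disturbances.
\frac{d\rho}{dt} \;=\; -\,i\,[\,H(t),\,\rho\,]
    \;+\; \sum_k \Bigl( L_k \rho L_k^\dagger
    \;-\; \tfrac{1}{2}\,\bigl\{ L_k^\dagger L_k,\; \rho \bigr\} \Bigr)
```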
