Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing

324 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yuxiu Hua

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Chen Qi - Yuxiu Hua - Rongpeng Li

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Network slicing promises to provision diversified services with distinct requirements in one infrastructure. Deep reinforcement learning (e.g., deep $mathcal{Q}$-learning, DQL) is assumed to be an appropriate algorithm to solve the demand-aware inter-slice resource management issue in network slicing by regarding the varying demands and the allocated bandwidth as the environment state and the action, respectively. However, allocating bandwidth in a finer resolution usually implies larger action space, and unfortunately DQL fails to quickly converge in this case. In this paper, we introduce discrete normalized advantage functions (DNAF) into DQL, by separating the $mathcal{Q}$-value function as a state-value function term and an advantage term and exploiting a deterministic policy gradient descent (DPGD) algorithm to avoid the unnecessary calculation of $mathcal{Q}$-value for every state-action pair. Furthermore, as DPGD only works in continuous action space, we embed a k-nearest neighbor algorithm into DQL to quickly find a valid action in the discrete space nearest to the DPGD output. Finally, we verify the faster convergence of the DNAF-based DQL through extensive simulations.

قيم البحث

363 - Rongpeng Li , Zhifeng Zhao , Qi Sun 2018

Network slicing is born as an emerging business to operators, by allowing them to sell the customized slices to various tenants at different prices. In order to provide better-performing and cost-efficient services, network slicing involves challengi ng technical issues and urgently looks forward to intelligent innovations to make the resource management consistent with users activities per slice. In that regard, deep reinforcement learning (DRL), which focuses on how to interact with the environment by trying alternative actions and reinforcing the tendency actions producing more rewarding consequences, is assumed to be a promising solution. In this paper, after briefly reviewing the fundamental concepts of DRL, we investigate the application of DRL in solving some typical resource management for network slicing scenarios, which include radio resource slicing and priority-based core network slicing, and demonstrate the advantage of DRL over several competing schemes through extensive simulations. Finally, we also discuss the possible challenges to apply DRL in network slicing from a general perspective.

بنية الشبكات والإنترنت التعلم الآلي

GAN-powered Deep Distributional Reinforcement Learning for Resource Management in Network Slicing

251 - Yuxiu Hua , Rongpeng Li , Zhifeng Zhao 2019

Network slicing is a key technology in 5G communications system. Its purpose is to dynamically and efficiently allocate resources for diversified services with distinct requirements over a common underlying physical infrastructure. Therein, demand-aw are resource allocation is of significant importance to network slicing. In this paper, we consider a scenario that contains several slices in a radio access network with base stations that share the same physical resources (e.g., bandwidth or slots). We leverage deep reinforcement learning (DRL) to solve this problem by considering the varying service demands as the environment state and the allocated resources as the environment action. In order to reduce the effects of the annoying randomness and noise embedded in the received service level agreement (SLA) satisfaction ratio (SSR) and spectrum efficiency (SE), we primarily propose generative adversarial network-powered deep distributional Q network (GAN-DDQN) to learn the action-value distribution driven by minimizing the discrepancy between the estimated action-value distribution and the target action-value distribution. We put forward a reward-clipping mechanism to stabilize GAN-DDQN training against the effects of widely-spanning utility values. Moreover, we further develop Dueling GAN-DDQN, which uses a specially designed dueling generator, to learn the action-value distribution by estimating the state-value distribution and the action advantage function. Finally, we verify the performance of the proposed GAN-DDQN and Dueling GAN-DDQN algorithms through extensive simulations.

التعلم الآلي الذكاء الاصطناعي بنية الشبكات والإنترنت

Resource Management for Blockchain-enabled Federated Learning: A Deep Reinforcement Learning Approach

140 - Nguyen Quang Hieu , Tran The Anh , Nguyen Cong Luong 2020

Blockchain-enabled Federated Learning (BFL) enables mobile devices to collaboratively train neural network models required by a Machine Learning Model Owner (MLMO) while keeping data on the mobile devices. Then, the model updates are stored in the bl ockchain in a decentralized and reliable manner. However, the issue of BFL is that the mobile devices have energy and CPU constraints that may reduce the system lifetime and training efficiency. The other issue is that the training latency may increase due to the blockchain mining process. To address these issues, the MLMO needs to (i) decide how much data and energy that the mobile devices use for the training and (ii) determine the block generation rate to minimize the system latency, energy consumption, and incentive cost while achieving the target accuracy for the model. Under the uncertainty of the BFL environment, it is challenging for the MLMO to determine the optimal decisions. We propose to use the Deep Reinforcement Learning (DRL) to derive the optimal decisions for the MLMO.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية بنية الشبكات والإنترنت

Reinforcement Learning for Dynamic Resource Optimization in 5G Radio Access Network Slicing

160 - Yi Shi , Yalin E. Sagduyu , Tugba Erpek 2020

The paper presents a reinforcement learning solution to dynamic resource allocation for 5G radio access network slicing. Available communication resources (frequency-time blocks and transmit powers) and computational resources (processor usage) are a llocated to stochastic arrivals of network slice requests. Each request arrives with priority (weight), throughput, computational resource, and latency (deadline) requirements, and if feasible, it is served with available communication and computational resources allocated over its requested duration. As each decision of resource allocation makes some of the resources temporarily unavailable for future, the myopic solution that can optimize only the current resource allocation becomes ineffective for network slicing. Therefore, a Q-learning solution is presented to maximize the network utility in terms of the total weight of granted network slicing requests over a time horizon subject to communication and computational constraints. Results show that reinforcement learning provides major improvements in the 5G network utility relative to myopic, random, and first come first served solutions. While reinforcement learning sustains scalable performance as the number of served users increases, it can also be effectively used to assign resources to network slices when 5G needs to share the spectrum with incumbent users that may dynamically occupy some of the frequency-time blocks.

بنية الشبكات والإنترنت التعلم الآلي

Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture

183 - Pawel Ladosz , Eseoghene Ben-Iwhiwhu , Jeffery Dick 2019

This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult pa rtially observable Markov decision process (POMDP) problems which impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQNs results and even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.

التعلم الآلي التعلم الالي