The paper presents a reinforcement learning solution to dynamic resource allocation for 5G radio access network slicing. Available communication resources (frequency-time blocks and transmit powers) and computational resources (processor usage) are allocated to stochastic arrivals of network slice requests. Each request arrives with priority (weight), throughput, computational resource, and latency (deadline) requirements, and, if feasible, it is served with available communication and computational resources allocated over its requested duration. Because each resource allocation decision makes some resources temporarily unavailable for future requests, a myopic solution that optimizes only the current allocation becomes ineffective for network slicing. Therefore, a Q-learning solution is presented to maximize the network utility, measured as the total weight of granted network slicing requests over a time horizon, subject to communication and computational constraints. Results show that reinforcement learning provides major improvements in the 5G network utility relative to myopic, random, and first-come-first-served solutions. While reinforcement learning sustains scalable performance as the number of served users increases, it can also be used effectively to assign resources to network slices when 5G needs to share the spectrum with incumbent users that may dynamically occupy some of the frequency-time blocks.
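As a rough illustration of the kind of Q-learning admission policy described above, the following Python sketch treats each arriving slice request as an accept/reject decision over a state of free resource blocks and processor capacity; the state encoding, the reward (the request weight when an accepted request is feasible), and all parameter values are simplifying assumptions rather than the paper's exact formulation.

    # Minimal tabular Q-learning sketch for network-slice admission control.
    # State, action, and reward definitions are simplified assumptions.
    import random
    from collections import defaultdict

    class SliceAdmissionQLearner:
        def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.Q = defaultdict(float)          # Q[(state, action)]
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            # epsilon-greedy over {0: reject, 1: accept}
            if random.random() < self.epsilon:
                return random.choice([0, 1])
            return max((0, 1), key=lambda a: self.Q[(state, a)])

        def update(self, state, action, reward, next_state):
            best_next = max(self.Q[(next_state, a)] for a in (0, 1))
            td_target = reward + self.gamma * best_next
            self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])

    # Toy episode: requests arrive as (weight, rb_demand, cpu_demand).
    agent = SliceAdmissionQLearner()
    free_rbs, free_cpu = 20, 10
    for t in range(1000):
        req = (random.randint(1, 5), random.randint(1, 6), random.randint(1, 3))
        state = (free_rbs, free_cpu, req)
        action = agent.act(state)
        feasible = req[1] <= free_rbs and req[2] <= free_cpu
        reward = req[0] if (action == 1 and feasible) else 0
        if action == 1 and feasible:
            free_rbs -= req[1]
            free_cpu -= req[2]
        # resources are released at a fixed rate to keep the toy example simple
        free_rbs = min(20, free_rbs + 1)
        free_cpu = min(10, free_cpu + 1)
        agent.update(state, action, reward, (free_rbs, free_cpu, req))

The accept/reject framing captures why a myopic rule fails: accepting a low-weight request now can block a heavier request later, and the learned Q-values account for that opportunity cost.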
Reinforcement learning (RL) for network slicing is considered in the 5G radio access network, where the base station (gNodeB) allocates resource blocks (RBs) to the requests of user equipment and maximizes the total reward of accepted requests over time. Based on adversarial machine learning, a novel over-the-air attack is introduced to manipulate the RL algorithm and disrupt 5G network slicing. Subject to an energy budget, the adversary observes the spectrum and builds its own RL-based surrogate model that selects which RBs to jam with the objective of maximizing the number of network slicing requests that fail due to jammed RBs. By jamming the RBs, the adversary reduces the RL algorithm's reward. As this reward is used as the input to update the RL algorithm, the performance does not recover even after the adversary stops jamming. This attack is evaluated in terms of the recovery time and the (maximum and total) reward loss, and it is shown to be much more effective than benchmark (random and myopic) jamming attacks. Different reactive and proactive defense mechanisms (protecting the RL algorithm's updates or misleading the adversary's learning process) are introduced to show that it is viable to defend 5G network slicing against this attack.
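A minimal sketch of the surrogate jammer idea, under strong simplifying assumptions: the adversary learns per-RB action values with a bandit-style update, spends one unit of energy per jammed slot, and receives a placeholder reward whenever it happens to hit the RB carrying a slicing request. The feedback signal, unit energy cost, and all parameters are illustrative and are not the paper's attack model.

    # Hedged sketch of an RL-based jammer that picks one resource block (RB)
    # per slot to jam under an energy budget.
    import random

    NUM_RBS = 10
    ENERGY_BUDGET = 200          # total jamming actions allowed (assumed unit cost)
    alpha, epsilon = 0.1, 0.1
    q = [0.0] * (NUM_RBS + 1)    # action NUM_RBS means "stay idle" (no jamming)

    energy_left = ENERGY_BUDGET
    for slot in range(1000):
        # epsilon-greedy action selection; idle is forced once the budget is spent
        if energy_left <= 0:
            action = NUM_RBS
        elif random.random() < epsilon:
            action = random.randrange(NUM_RBS + 1)
        else:
            action = max(range(NUM_RBS + 1), key=lambda a: q[a])

        if action < NUM_RBS:
            energy_left -= 1

        # Placeholder feedback: in the paper the adversary infers failed slicing
        # requests from its spectrum observations; here it is simulated randomly.
        victim_rb = random.randrange(NUM_RBS)
        reward = 1.0 if action == victim_rb else 0.0

        q[action] += alpha * (reward - q[action])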
Network slicing has emerged as a new business opportunity for operators, allowing them to sell customized slices to various tenants at different prices. In order to provide better-performing and cost-efficient services, network slicing involves challenging technical issues and urgently calls for intelligent innovations that make resource management consistent with users' activities per slice. In that regard, deep reinforcement learning (DRL), which focuses on interacting with the environment by trying alternative actions and reinforcing the tendency toward actions that produce more rewarding consequences, is considered a promising solution. In this paper, after briefly reviewing the fundamental concepts of DRL, we investigate its application to typical resource management scenarios for network slicing, including radio resource slicing and priority-based core network slicing, and demonstrate the advantage of DRL over several competing schemes through extensive simulations. Finally, we discuss the challenges of applying DRL to network slicing from a general perspective.
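For concreteness, a toy DQN-style sketch of radio resource slicing is given below, assuming PyTorch is available: a small network chooses among fixed bandwidth-share profiles for three slices and is rewarded for satisfied demand. The action set, network size, and reward are placeholder assumptions and are not taken from the paper.

    # Toy DQN-style sketch for allocating bandwidth shares to slices.
    import random
    import torch
    import torch.nn as nn

    NUM_SLICES = 3
    # candidate bandwidth-share profiles (assumed, each sums to 1)
    ACTIONS = [(0.2, 0.3, 0.5), (0.3, 0.3, 0.4), (0.4, 0.3, 0.3), (0.5, 0.3, 0.2)]

    qnet = nn.Sequential(nn.Linear(NUM_SLICES, 32), nn.ReLU(), nn.Linear(32, len(ACTIONS)))
    opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    gamma, epsilon = 0.9, 0.1

    def step(share):
        # Toy environment: reward is the satisfied demand summed over slices.
        demand = torch.rand(NUM_SLICES)
        reward = torch.minimum(demand, torch.tensor(share)).sum().item()
        return demand, reward

    state = torch.rand(NUM_SLICES)       # observed per-slice demand
    for t in range(500):
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = qnet(state).argmax().item()
        next_state, r = step(ACTIONS[a])
        with torch.no_grad():
            target = r + gamma * qnet(next_state).max()
        loss = (qnet(state)[a] - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        state = next_state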
Radio access network (RAN) slicing is an important part of network slicing in 5G. The evolving network architecture requires the orchestration of multiple network resources such as radio and cache resources. In recent years, machine learning (ML) techniques have been widely applied for network slicing. However, most existing works do not take advantage of the knowledge transfer capability in ML. In this paper, we propose a transfer reinforcement learning (TRL) scheme for joint radio and cache resource allocation to serve 5G RAN slicing. We first define a hierarchical architecture for the joint resource allocation. Then we propose two TRL algorithms: Q-value transfer reinforcement learning (QTRL) and action selection transfer reinforcement learning (ASTRL). In the proposed schemes, learner agents utilize the expert agents' knowledge to improve their performance on target tasks. The proposed algorithms are compared with both the model-free Q-learning and the model-based priority proportional fairness and time-to-live (PPF-TTL) algorithms. Compared with Q-learning, QTRL and ASTRL achieve 23.9% lower delay for the Ultra-Reliable Low-Latency Communications (URLLC) slice and 41.6% higher throughput for the enhanced Mobile Broadband (eMBB) slice, while converging significantly faster than Q-learning. Moreover, 40.3% lower URLLC delay and almost twice the eMBB throughput are observed with respect to PPF-TTL.
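The Q-value transfer idea can be sketched as follows: an expert Q-table learned on a source task seeds the learner's Q-table on the target task, which is then refined by ordinary Q-learning. The toy environments, state and action spaces, and parameters below are placeholders rather than the QTRL/ASTRL designs in the paper.

    # Hedged sketch of Q-value transfer (QTRL-style).
    import random
    from collections import defaultdict

    ACTIONS = (0, 1, 2)

    def q_learning(env_step, q, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1):
        for _ in range(episodes):
            state = 0
            for _ in range(50):
                if random.random() < epsilon:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda x: q[(state, x)])
                next_state, reward = env_step(state, a)
                best_next = max(q[(next_state, x)] for x in ACTIONS)
                q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
                state = next_state
        return q

    def source_env(state, action):   # expert (source) task, e.g. radio-only allocation
        return (state + action) % 5, 1.0 if action == state % 3 else 0.0

    def target_env(state, action):   # target task, e.g. joint radio and cache allocation
        return (state + action) % 5, 1.0 if action == (state + 1) % 3 else 0.0

    expert_q = q_learning(source_env, defaultdict(float))

    # Q-value transfer: the learner starts from the expert's Q-values.
    learner_q = q_learning(target_env, defaultdict(float, expert_q))

Starting from the expert's Q-values biases early exploration toward actions that were good on the related source task, which is the mechanism behind the faster convergence reported for QTRL.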
5G is regarded as a revolutionary mobile network that is expected to support a vast number of novel services, ranging from remote health care to smart cities. However, the heterogeneous Quality of Service (QoS) requirements of different services and the limited spectrum make radio resource allocation a challenging problem in 5G. In this paper, we propose a multi-agent reinforcement learning (MARL) method for radio resource slicing in 5G. We model each slice as an intelligent agent that competes for limited radio resources, and correlated Q-learning is applied for inter-slice resource block (RB) allocation. The proposed correlated Q-learning based inter-slice RB allocation (COQRA) scheme is compared with the Nash Q-learning (NQL) and Latency-Reliability-Throughput Q-learning (LRTQ) methods and the priority proportional fairness (PPF) algorithm. Our simulation results show that COQRA achieves 32.4% lower latency and 6.3% higher throughput than LRTQ, and 5.8% lower latency and 5.9% higher throughput than NQL. Significantly higher throughput and a lower packet drop rate (PDR) are observed in comparison to PPF.
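A hedged multi-agent sketch in the spirit of correlated Q-learning is shown below: two slice agents keep Q-tables over joint RB requests, and the joint action maximizing the sum of their Q-values is used as a simplified stand-in for the correlated-equilibrium computation that COQRA actually performs. The toy reward and state dynamics are assumptions for illustration only.

    # Simplified two-agent sketch of correlated-Q-style inter-slice RB allocation.
    import random
    from itertools import product
    from collections import defaultdict

    ACTIONS = [0, 1, 2]                              # RBs requested by each slice agent
    Q = [defaultdict(float), defaultdict(float)]     # one Q-table per slice, over joint actions
    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    TOTAL_RBS = 3

    def joint_greedy(state):
        # joint action maximizing the sum of both agents' Q-values
        # (a simplified surrogate for selecting a correlated equilibrium)
        return max(product(ACTIONS, ACTIONS),
                   key=lambda ja: Q[0][(state, ja)] + Q[1][(state, ja)])

    state = 0
    for t in range(2000):
        if random.random() < epsilon:
            ja = (random.choice(ACTIONS), random.choice(ACTIONS))
        else:
            ja = joint_greedy(state)
        # toy reward: each slice gets its request only if the total fits the budget
        fits = sum(ja) <= TOTAL_RBS
        rewards = (ja[0] if fits else 0, ja[1] if fits else 0)
        next_state = (state + 1) % 4
        best_next = joint_greedy(next_state)
        for i in (0, 1):
            target = rewards[i] + gamma * Q[i][(next_state, best_next)]
            Q[i][(state, ja)] += alpha * (target - Q[i][(state, ja)])
        state = next_state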
To optimally cover users in millimeter-wave (mmWave) networks, clustering is needed to identify the number and direction of beams. The mobility of users motivates the need for an online clustering scheme to maintain up-to-date beams towards those clusters. Furthermore, user mobility leads to varying cluster patterns (i.e., users move from the coverage of one beam to another), causing dynamic traffic load per beam. As such, efficient radio resource allocation and beam management are needed to address the dynamicity that arises from the mobility of users and their traffic. In this paper, we consider the coexistence of Ultra-Reliable Low-Latency Communication (URLLC) and enhanced Mobile Broadband (eMBB) users in 5G mmWave networks and propose a Quality-of-Service (QoS) aware clustering and resource allocation scheme. Specifically, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used for online clustering of users and the selection of the number of beams, and a Long Short-Term Memory (LSTM)-based Deep Reinforcement Learning (DRL) scheme is used for resource block allocation. The performance of the proposed scheme is compared to a baseline that uses K-means and priority-based proportional fairness for clustering and resource allocation, respectively. Our simulation results show that the proposed scheme outperforms the baseline algorithm in terms of latency, reliability, and rate of URLLC users as well as the rate of eMBB users.
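The clustering step can be illustrated with a short scikit-learn sketch over synthetic user coordinates: DBSCAN groups the users, the number of non-noise clusters sets the number of beams, and cluster centroids give the beam directions. The eps and min_samples values are illustrative assumptions, and the LSTM-based DRL allocation per beam is not sketched here.

    # Minimal sketch of the clustering step: DBSCAN over user coordinates to
    # decide how many beams to form and their pointing directions.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    # synthetic user positions around three hotspots (assumed, in meters)
    users = np.vstack([rng.normal(loc, 5.0, size=(20, 2))
                       for loc in ((0, 0), (60, 10), (30, 70))])

    labels = DBSCAN(eps=10.0, min_samples=3).fit_predict(users)
    beam_ids = [l for l in set(labels) if l != -1]          # -1 marks noise points
    centroids = {b: users[labels == b].mean(axis=0) for b in beam_ids}

    num_beams = len(beam_ids)
    directions = {b: np.degrees(np.arctan2(c[1], c[0])) for b, c in centroids.items()}
    # num_beams and directions would drive beam management; per-beam resource
    # block allocation is then handled by the LSTM-based DRL agent.

Unlike K-means, DBSCAN does not need the number of clusters in advance and flags outlying users as noise, which is why it suits the online setting where the number of user clusters changes with mobility.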