Distributed Deep Reinforcement Learning for Collaborative Spectrum Sharing

Published by Dr. Pranav M Pawar

Publication date: 2021

Research field: Electronic Engineering, Informatics Engineering

Paper language: English

Authors: Pranav M. Pawar, Amir Leshem

Signal processing, information theory, machine learning

Abstract

Spectrum sharing among users is a fundamental problem in the management of any wireless network. In this paper, we discuss the problem of distributed spectrum collaboration without central management under general unknown channels. Since the cost of communication, coordination, and control increases rapidly with the number of devices and the expanding bandwidth used, there is an obvious need to develop distributed techniques for spectrum collaboration in which no explicit signaling is used. In this paper, we combine game-theoretic insights with deep Q-learning to provide a novel asymptotically optimal solution to the spectrum collaboration problem. We propose a deterministic distributed deep reinforcement learning (D3RL) mechanism using a deep Q-network (DQN). It chooses channels using the Q-values and the channel loads: it limits the options available to the user to a few channels with the highest Q-values and, among those, selects the least loaded channel. Using insights from both game theory and combinatorial optimization, we show that this technique is asymptotically optimal for large overloaded networks. The selected channel and the outcome of the transmission are fed back into the deep Q-network so that they are incorporated into the learning of the Q-values. We also analyzed performance to understand the behavior of D3RL in differ
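The selection rule described in the abstract (shortlist a few channels with the highest Q-values, then take the least loaded one) can be sketched as follows. This is only an illustration of that rule, not the authors' implementation: the function name `select_channel`, the shortlist size `k`, and the tie-breaking order are assumptions, and the Q-values are passed in as plain numbers rather than produced by a DQN.

```python
from typing import Sequence


def select_channel(q_values: Sequence[float], loads: Sequence[int], k: int = 3) -> int:
    """Return the least-loaded channel among the k channels with the highest Q-values.

    q_values[c] is the (hypothetical) Q-value of channel c for this user,
    loads[c] is the observed number of users on channel c.
    """
    if len(q_values) != len(loads):
        raise ValueError("q_values and loads must have the same length")
    if not q_values:
        raise ValueError("at least one channel is required")
    k = max(1, min(k, len(q_values)))

    # Rank channel indices by Q-value, highest first (ties: lower index first).
    ranked = sorted(range(len(q_values)), key=lambda c: (-q_values[c], c))
    shortlist = ranked[:k]

    # Among the shortlist pick the smallest load (ties: higher Q-value, then lower index).
    return min(shortlist, key=lambda c: (loads[c], -q_values[c], c))


if __name__ == "__main__":
    q = [0.9, 0.2, 0.8, 0.7, 0.1]
    load = [5, 0, 2, 2, 1]
    print(select_channel(q, load, k=3))  # prints 2
```

With `k=3` the shortlist is channels 0, 2, and 3; channel 1 is empty but is never considered because its Q-value is low, which is the point of restricting the choice before looking at the load.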

Wei Cui, Wei Yu, 2020

This paper proposes a novel scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous works on reinforcement learning for network optimization, the network topology is assumed to be fixed, and a different agent is trained for each transmission node; this limits scalability and generalizability. Further, routing and spectrum access are typically treated as separate tasks. Moreover, the optimization objective is usually a cumulative metric along the route, e.g., number of hops or delay. In this paper, we account for the physical-layer signal-to-interference-plus-noise ratio (SINR) in a wireless network and further show that a bottleneck objective, such as the minimum SINR along the route, can also be optimized effectively using reinforcement learning. Specifically, we propose a scalable approach in which a single agent is associated with each flow and makes routing and spectrum access decisions as it moves along the frontier nodes. The agent is trained according to the physical-layer characteristics of the environment using a novel rewarding scheme based on the Monte Carlo estimation of the future bottleneck SINR. It learns to avoid interference by intelligently making joint routing and spectrum allocation decisions based on the geographical location information of the neighbouring nodes.
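To make the bottleneck objective concrete, the sketch below computes a per-hop SINR and, for one sampled route, the minimum SINR from each hop to the end of the route. Treating that "minimum-to-go" as the Monte Carlo reward target is an assumption made for illustration; the abstract does not specify the exact reward definition, and the function names are hypothetical.

```python
from typing import List, Sequence


def sinr(signal_power: float, interference_powers: Sequence[float], noise_power: float) -> float:
    """Signal-to-interference-plus-noise ratio in linear scale (not dB)."""
    return signal_power / (sum(interference_powers) + noise_power)


def bottleneck_to_go(hop_sinrs: Sequence[float]) -> List[float]:
    """For each hop t, return the minimum SINR over hops t..end of one sampled route.

    This mirrors a Monte Carlo return, with min() replacing the usual sum of rewards.
    """
    targets: List[float] = []
    running_min = float("inf")
    for value in reversed(hop_sinrs):
        running_min = min(running_min, value)
        targets.append(running_min)
    targets.reverse()
    return targets


if __name__ == "__main__":
    print(sinr(4.0, [0.5, 0.25], 0.25))        # prints 4.0
    print(bottleneck_to_go([8.0, 3.0, 5.0]))   # prints [3.0, 3.0, 5.0]
```

A cumulative metric such as hop count would sum along the route; here a single weak hop caps the value seen by every earlier decision, which is what distinguishes a bottleneck objective.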

Signal processing, artificial intelligence, machine learning

Reinforcement Learning for Efficient and Tuning-Free Link Adaptation

Vidit Saxena, Hugo Tullberg, 2020

Wireless links adapt the data transmission parameters to the dynamic channel state; this is called link adaptation. Classical link adaptation relies on tuning parameters that are challenging to configure for optimal link performance. Recently, reinforcement learning has been proposed to automate link adaptation, where the transmission parameters are modeled as discrete arms of a multi-armed bandit. In this context, we propose a latent learning model for link adaptation that exploits the correlation between data transmission parameters. Further, motivated by the recent success of Thompson sampling for multi-armed bandit problems, we propose a latent Thompson sampling (LTS) algorithm that quickly learns the optimal parameters for a given channel state. We extend LTS to fading wireless channels through a tuning-free mechanism that automatically tracks the channel dynamics. In numerical evaluations with fading wireless channels, LTS improves the link throughput by up to 100% compared to the state-of-the-art link adaptation algorithms.
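For readers unfamiliar with the bandit formulation, here is a minimal sketch of plain Beta-Bernoulli Thompson sampling over discrete transmission rates. It is not the latent Thompson sampling (LTS) algorithm of the paper: it keeps an independent posterior per arm and does not exploit correlation between parameters or track fading. The class name, the rate values, and the success probabilities in the demo are all made up for illustration.

```python
import random
from typing import List, Optional, Sequence


class ThompsonLinkAdapter:
    """Plain Thompson sampling: one Beta posterior per transmission-rate arm."""

    def __init__(self, rates: Sequence[float], seed: Optional[int] = None) -> None:
        self.rates: List[float] = list(rates)
        # Beta(1, 1) prior on each arm's success (ACK) probability.
        self.successes: List[float] = [1.0] * len(self.rates)
        self.failures: List[float] = [1.0] * len(self.rates)
        self.rng = random.Random(seed)

    def choose(self) -> int:
        """Sample a success probability per arm and pick the best sampled throughput."""
        sampled = [self.rng.betavariate(a, b) for a, b in zip(self.successes, self.failures)]
        throughput = [p * r for p, r in zip(sampled, self.rates)]
        return max(range(len(self.rates)), key=throughput.__getitem__)

    def update(self, arm: int, ack: bool) -> None:
        """Update the chosen arm's posterior with the ACK/NACK outcome."""
        if ack:
            self.successes[arm] += 1.0
        else:
            self.failures[arm] += 1.0


if __name__ == "__main__":
    true_success = [0.95, 0.6, 0.1]  # hypothetical channel, unknown to the adapter
    adapter = ThompsonLinkAdapter(rates=[1.0, 2.0, 4.0], seed=0)
    env = random.Random(1)
    pulls = [0, 0, 0]
    for _ in range(2000):
        arm = adapter.choose()
        adapter.update(arm, env.random() < true_success[arm])
        pulls[arm] += 1
    print(pulls)
```

In the demo channel the middle rate has the highest expected throughput (0.6 × 2.0), so its pull count would be expected to dominate over time; no tuning parameter such as an exploration rate is needed, which is the property the abstract emphasizes.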

Signal processing, information theory, machine learning

A Deep Reinforcement Learning Approach for Traffic Signal Control Optimization

Zhenning Li, Chengzhong Xu, Guohui Zhang, 2021

Inefficient traffic signal control methods may cause numerous problems, such as traffic congestion and waste of energy. Reinforcement learning (RL) is a trending data-driven approach for adaptive traffic signal control in complex urban traffic networks. Although the development of deep neural networks (DNN) further enhances its learning capability, there are still some challenges in applying deep RL to transportation networks with multiple signalized intersections, including non-stationary environments, the exploration-exploitation dilemma, multi-agent training schemes, continuous action spaces, etc. In order to address these issues, this paper first proposes a multi-agent deep deterministic policy gradient (MADDPG) method by extending the actor-critic policy gradient algorithms. MADDPG has a centralized learning and decentralized execution paradigm in which critics use additional information to streamline the training process, while actors act on their own local observations. The model is evaluated via simulation on the Simulation of Urban MObility (SUMO) platform. Model comparison results show the efficiency of the proposed algorithm in controlling traffic lights.

Signal processing, artificial intelligence, machine learning

A Markovian Model-Driven Deep Learning Framework for Massive MIMO CSI Feedback

Zhenyu Liu, Mason del Rosario, 2020

Forward channel state information (CSI) often plays a vital role in scheduling and capacity-approaching transmission optimization for massive multiple-input multiple-output (MIMO) communication systems. In frequency division duplex (FDD) massive MIMO systems, forward-link CSI reconstruction at the transmitter relies critically on CSI feedback from receiving nodes and must carefully weigh the tradeoff between reconstruction accuracy and feedback bandwidth. Recent studies on the use of recurrent neural networks (RNNs) have demonstrated strong promise, though the cost of computation and memory remains high for massive MIMO deployment. In this work, we exploit channel coherence in time to substantially improve the feedback efficiency. Using a Markovian model, we develop a deep convolutional neural network (CNN)-based framework, MarkovNet, to differentially encode forward CSI in time to effectively improve reconstruction accuracy. Furthermore, we explore important physical insights, including spherical normalization of input data and convolutional layers for feedback compression. We demonstrate substantial performance improvement and complexity reduction over the RNN-based work by our proposed MarkovNet to recover forward CSI estimates accurately. We explore additional practical considerations in feedback quantization, and show that MarkovNet outperforms RNN-based CSI estimation networks at a fraction of the computational cost.
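The idea of differential encoding in time can be illustrated without any neural network: when consecutive CSI snapshots are correlated, feeding back the quantized difference from the previous reconstruction is cheaper than feeding back each snapshot in full. The sketch below is a generic differential (DPCM-style) coder with a uniform quantizer, offered as an assumption-laden stand-in; MarkovNet itself uses CNNs for this step, and the function names, the quantization step, and the real-valued toy frames are hypothetical.

```python
from typing import List, Sequence


def quantize(value: float, step: float) -> float:
    """Uniform quantizer: round value to the nearest multiple of step."""
    return step * round(value / step)


def differential_encode(frames: Sequence[Sequence[float]], step: float) -> List[List[float]]:
    """Encode each frame as a quantized residual against the decoder's reconstruction."""
    reconstruction = [0.0] * len(frames[0])
    residuals: List[List[float]] = []
    for frame in frames:
        residual = [quantize(x - r, step) for x, r in zip(frame, reconstruction)]
        # Track what the decoder will reconstruct so quantization errors do not accumulate.
        reconstruction = [r + d for r, d in zip(reconstruction, residual)]
        residuals.append(residual)
    return residuals


def differential_decode(residuals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rebuild frames by accumulating the received residuals."""
    reconstruction = [0.0] * len(residuals[0])
    frames: List[List[float]] = []
    for residual in residuals:
        reconstruction = [r + d for r, d in zip(reconstruction, residual)]
        frames.append(list(reconstruction))
    return frames


if __name__ == "__main__":
    csi = [[1.0, 2.0], [1.25, 2.0], [1.25, 1.75]]  # slowly varying toy "channel"
    encoded = differential_encode(csi, step=0.25)
    print(encoded)
    print(differential_decode(encoded))
```

After the first frame, the residuals of a slowly varying channel are small and mostly zero, so they need fewer feedback bits than the frames themselves.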

Signal processing, information theory, machine learning

Polar Decoding on Sparse Graphs with Deep Learning

Weihong Xu, 2018

In this paper, we present a sparse neural network decoder (SNND) of polar codes based on belief propagation (BP) and deep learning. First, the conventional factor graph of polar BP decoding is converted to a bipartite Tanner graph similar to that of low-density parity-check (LDPC) codes. Then the Tanner graph is unfolded and translated into the graphical representation of a deep neural network (DNN). The complex sum-product algorithm (SPA) is replaced by the low-complexity min-sum (MS) approximation. We dramatically reduce the number of weights by using a single weight to parameterize the networks. Optimized by the training techniques of deep learning, the proposed SNND achieves decoding performance comparable to SPA and obtains about $0.5$ dB gain over MS decoding on ($128,64$) and ($256,128$) codes. Moreover, a $60\%$ complexity reduction is achieved, and the decoding latency is significantly lower than that of conventional polar BP.
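The difference between the sum-product check-node update and its min-sum approximation can be seen in a few lines. The sketch below implements the standard textbook forms of both updates on log-likelihood ratios (LLRs); the optional `weight` argument is a hypothetical stand-in for a learned scaling factor and is not taken from the paper.

```python
import math
from typing import Sequence


def check_node_spa(llrs: Sequence[float]) -> float:
    """Sum-product check-node update: 2 * atanh(prod(tanh(l / 2)))."""
    product = 1.0
    for llr in llrs:
        product *= math.tanh(llr / 2.0)
    # Clip so atanh stays finite for very confident inputs.
    product = max(min(product, 1.0 - 1e-12), -1.0 + 1e-12)
    return 2.0 * math.atanh(product)


def check_node_min_sum(llrs: Sequence[float], weight: float = 1.0) -> float:
    """Min-sum approximation: product of signs times the smallest magnitude."""
    sign = 1.0
    for llr in llrs:
        if llr < 0:
            sign = -sign
    return weight * sign * min(abs(llr) for llr in llrs)


if __name__ == "__main__":
    incoming = [2.0, -3.0, 4.0]
    print(check_node_spa(incoming))
    print(check_node_min_sum(incoming))  # prints -2.0
```

Min-sum needs only comparisons and sign flips, while the sum-product form needs `tanh` and `atanh`. Min-sum never underestimates the magnitude of the outgoing message, which is why a scaling weight below one is a common correction.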

Signal processing, information theory, machine learning