Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Published by Xiang Tan

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Xiang Tan - Li Zhou - Haijun Wang

Networking and Internet Architecture, Artificial Intelligence, Computer Science and Game Theory

With the development of 5G and the Internet of Things, large numbers of wireless devices need to share limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the problem of inefficient spectrum utilization brought upon by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multiple users in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy which maximizes the sum throughput of the cognitive radio network in a distributed fashion, without the exchange of coordination information between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. The simulation results show that the proposed algorithm converges quickly and achieves near-optimal performance.
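To make the objective in the abstract above concrete, the following is a minimal sketch of a per-slot sum-throughput reward that all cognitive users could share during centralized training. The abstract does not give the paper's channel or collision model, so everything here is an assumption: the function name `sum_throughput`, the rule that a transmission succeeds only when exactly one cognitive user picks a channel that the primary user has left free, and the unit throughput per success.

```python
from collections import Counter
from typing import List, Optional


def sum_throughput(actions: List[Optional[int]], pu_busy: List[bool]) -> int:
    """Count successful transmissions in one time slot.

    actions[i] is the channel index chosen by cognitive user i,
    or None if that user stays idle in this slot.
    pu_busy[k] is True when channel k is occupied by a primary user.

    Assumed collision model (not taken from the paper): a transmission
    succeeds only if the channel is free of the primary user and is
    chosen by exactly one cognitive user.
    """
    # Number of cognitive users that selected each channel.
    load = Counter(a for a in actions if a is not None)
    return sum(1 for ch, n in load.items() if n == 1 and not pu_busy[ch])


if __name__ == "__main__":
    # Three users, three channels; channel 2 is held by a primary user.
    print(sum_throughput([0, 1, None], [False, False, True]))
    # Users 0 and 1 collide on channel 0; only user 2 succeeds.
    print(sum_throughput([0, 0, 1], [False, False, False]))
```

Under this assumed model, each user would only observe the outcome on its own channel, which is the kind of partial observability a recurrent Q-network is meant to handle.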

84 - Weiheng Jiang , Wanxin Yu 2021

Designing clustered unmanned aerial vehicle (UAV) communication networks based on cognitive radio (CR) and reinforcement learning can significantly improve the intelligence level of clustered UAV communication networks and the robustness of the system in a time-varying environment. Among them, designing smarter systems for spectrum sensing and access is a key research issue in CR. Therefore, we focus on dynamic cooperative spectrum sensing and channel access in clustered cognitive UAV (CUAV) communication networks. Due to the lack of prior statistical information on the primary user (PU) channel occupancy state, we propose to use multi-agent reinforcement learning (MARL) to model the CUAV spectrum competition and cooperative decision-making problem in this dynamic scenario, and a return function based on the weighted compound of sensing-transmission cost and utility is introduced to characterize the real-time rewards of the multi-agent game. On this basis, a time slot multi-round revisit exhaustive search algorithm based on virtual controller (VC-EXH), a Q-learning algorithm based on independent learner (IL-Q) and a deep Q-learning algorithm based on independent learner (IL-DQN) are respectively proposed. Further, the information exchange overhead, execution complexity and convergence of the three algorithms are briefly analyzed. Numerical simulations show that all three algorithms converge quickly, significantly improve system performance and increase the utilization of idle spectrum resources.

Networking and Internet Architecture
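As an illustration of the independent-learner idea behind IL-Q in the entry above, here is a minimal sketch in which each agent keeps its own Q-table and updates it from its own reward only. It is a simplification under stated assumptions, not the algorithm from that paper: the learners are stateless (bandit-style), the reward is 1 for a collision-free channel choice and 0 otherwise rather than the weighted sensing-transmission cost and utility the abstract describes, and the names `q_update`, `epsilon_greedy` and `train` are hypothetical.

```python
import random
from typing import List


def q_update(q: List[float], action: int, reward: float, alpha: float = 0.1) -> None:
    """Stateless Q update: move q[action] toward the observed reward."""
    q[action] += alpha * (reward - q[action])


def epsilon_greedy(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def train(num_agents: int = 2, num_channels: int = 2, steps: int = 2000,
          epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Run independent learners that never exchange information."""
    rng = random.Random(seed)
    tables = [[0.0] * num_channels for _ in range(num_agents)]
    for _ in range(steps):
        actions = [epsilon_greedy(q, epsilon, rng) for q in tables]
        for i, a in enumerate(actions):
            # Assumed reward: 1 if agent i is alone on its channel, else 0.
            reward = 1.0 if actions.count(a) == 1 else 0.0
            q_update(tables[i], a, reward)
    return tables


if __name__ == "__main__":
    for agent_id, table in enumerate(train()):
        print(agent_id, [round(v, 2) for v in table])
```

Each agent treats the others as part of the environment, which is what makes the learner "independent" and keeps the information exchange overhead at zero in this sketch.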

Performance Comparison of Cooperative and Distributed Spectrum Sensing in Cognitive Radio

376 - Zheng Sun , Wenjun Xu , Zhiqiang He 2008

In this paper, we compare the performances of cooperative and distributed spectrum sensing in wireless sensor networks. After introducing the basic problem, we describe two strategies: 1) a cooperative sensing strategy, which takes advantage of cooperation diversity gain to increase probability of detection and 2) a distributed sensing strategy, which by passing the results in an inter-node manner increases energy efficiency and fairness among nodes. Then, we compare the performances of the strategies in terms of three criteria: agility, energy efficiency, and robustness against SNR changes, and summarize the comparison. It shows that: 1) the non-cooperative strategy has the best fairness of energy consumption, 2) the cooperative strategy leads to the best agility, and 3) the distributed strategy leads to the lowest energy consumption and the best robustness against SNR changes.

Networking and Internet Architecture

Matching-based Spectrum Allocation in Cognitive Radio Networks

98 - Raghed El-Bardan , Walid Saad , Swastik Brahma 2015

In this paper, a novel spectrum association approach for cognitive radio networks (CRNs) is proposed. Based on a measure of both inference and confidence as well as on a measure of quality-of-service, the association between secondary users (SUs) in the network and frequency bands licensed to primary users (PUs) is investigated. The problem is formulated as a matching game between SUs and PUs. In this game, SUs employ a soft-decision Bayesian framework to detect PUs' signals and, eventually, rank them based on the logarithm of the a posteriori ratio. A performance measure that captures both the ranking metric and rate is further computed by the SUs. Using this performance measure, a PU evaluates its own utility function that it uses to build its own association preferences. A distributed algorithm that allows both SUs and PUs to interact and self-organize into a stable match is proposed. Simulation results show that the proposed algorithm can improve the sum of SUs' rates by up to 20% and 60% relative to the deferred acceptance algorithm and random channel allocation approach, respectively. The results also show an improved convergence time.

Networking and Internet Architecture, Computer Science and Game Theory, Information Theory
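The entry above uses the deferred acceptance algorithm as a baseline. For orientation, this is a minimal sketch of textbook one-to-one deferred acceptance with SUs proposing to PUs. It is not the distributed algorithm proposed in that paper, and the preference lists are plain hand-written rankings rather than the Bayesian and utility-based preferences the abstract describes. The function name `deferred_acceptance` and the identifiers in the example are hypothetical.

```python
from typing import Dict, List


def deferred_acceptance(su_prefs: Dict[str, List[str]],
                        pu_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a stable one-to-one matching as a mapping SU -> PU.

    su_prefs[su] lists PUs from most to least preferred, and vice versa.
    Every PU named in su_prefs is assumed to have an entry in pu_prefs.
    """
    # rank[pu][su] is the position of su in pu's list (lower is better).
    rank = {pu: {su: r for r, su in enumerate(prefs)}
            for pu, prefs in pu_prefs.items()}
    next_choice = {su: 0 for su in su_prefs}  # next PU index to propose to
    held: Dict[str, str] = {}                 # PU -> SU currently held
    free = list(su_prefs)
    while free:
        su = free.pop()
        if next_choice[su] >= len(su_prefs[su]):
            continue  # list exhausted: this SU stays unmatched
        pu = su_prefs[su][next_choice[su]]
        next_choice[su] += 1
        if su not in rank[pu]:
            free.append(su)  # PU does not accept this SU at all
            continue
        current = held.get(pu)
        if current is None:
            held[pu] = su
        elif rank[pu][su] < rank[pu][current]:
            held[pu] = su          # PU trades up
            free.append(current)   # the displaced SU proposes again
        else:
            free.append(su)        # rejected: try the next PU later
    return {su: pu for pu, su in held.items()}


if __name__ == "__main__":
    su_prefs = {"su1": ["pu1", "pu2"], "su2": ["pu1", "pu2"]}
    pu_prefs = {"pu1": ["su2", "su1"], "pu2": ["su1", "su2"]}
    print(deferred_acceptance(su_prefs, pu_prefs))
```

In the example, both SUs prefer `pu1`, which prefers `su2`, so `su1` is rejected once and settles on `pu2`.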

Multi-Agent Common Knowledge Reinforcement Learning

193 - Christian A. Schroeder de Witt , Jakob N. Foerster , Gregory Farquhar 2018

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each other's observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.

Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

90 - Joel Z. Leibo , Vinicius Zambaldi , Marc Lanctot 2017

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1) a fruit Gathering game and 2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.

Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory