Due to the growing volume of data traffic produced by the surge of Internet of Things (IoT) devices, the demand for radio spectrum resources is approaching the limits of the allocations defined by the Federal Communications Commission (FCC). To this end, Dynamic Spectrum Access (DSA) is considered a promising technology for handling this spectrum scarcity. However, standard DSA techniques often rely on analytical modeling of wireless networks, making their application intractable in under-measured network environments. Therefore, using neural networks to approximate the network dynamics is an alternative approach. In this article, we introduce a Federated Learning (FL) based framework for the task of DSA, where FL is a distributed machine learning framework that can preserve the privacy of network terminals under heterogeneous data distributions. We discuss the opportunities, challenges, and open problems of this framework. To evaluate its feasibility, we implement a Multi-Agent Reinforcement Learning (MARL)-based FL realization and present its initial evaluation results.
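As a rough illustration of how the FL aggregation step could plug into multi-agent channel selection, the sketch below averages per-user Q-tables with FedAvg. The environment, reward model, and update rule are simplifying assumptions for illustration, not the MARL-based realization evaluated in the article.

    # Minimal FedAvg sketch over per-user Q-tables for channel selection.
    import numpy as np

    N_USERS, N_CHANNELS, ROUNDS, LOCAL_STEPS = 4, 8, 50, 20
    rng = np.random.default_rng(0)
    channel_quality = rng.uniform(0.2, 1.0, N_CHANNELS)      # hypothetical link qualities

    def local_update(q, eps=0.1, lr=0.1):
        """One user's epsilon-greedy Q-learning steps on its own observations."""
        for _ in range(LOCAL_STEPS):
            a = rng.integers(N_CHANNELS) if rng.random() < eps else int(np.argmax(q))
            reward = rng.binomial(1, channel_quality[a])      # stochastic channel feedback
            q[a] += lr * (reward - q[a])
        return q

    global_q = np.zeros(N_CHANNELS)
    for _ in range(ROUNDS):
        # Each user starts from the shared model and trains locally; only the
        # model (never the raw observations) is sent back, which is the privacy
        # argument behind FL.
        local_qs = [local_update(global_q.copy()) for _ in range(N_USERS)]
        global_q = np.mean(local_qs, axis=0)                  # FedAvg aggregation

    print("learned channel preference:", int(np.argmax(global_q)))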
In this paper, we study partially overlapping coexistence scenarios in a cognitive radio environment. We consider an Orthogonal Frequency Division Multiplexing (OFDM) cognitive system coexisting with a narrow-band (NB) and an OFDM primary system, respectively. We focus on finding the minimum frequency separation between the coexisting systems needed to meet a certain target bit error rate (BER). Windowing and subcarrier nulling are used as simple techniques to reduce the OFDM out-of-band radiation and hence decrease the required separation. The effect of these techniques on the OFDM spectral efficiency and peak-to-average power ratio (PAPR) is also studied.
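To make the two mitigation techniques concrete, the sketch below nulls the edge subcarriers and applies a raised-cosine time window to a single OFDM symbol, then compares the resulting relative out-of-band (OOB) power. All parameter values are illustrative assumptions, not the paper's system model.

    import numpy as np

    N_FFT, N_USED, CP, ROLLOFF, N_NULL = 256, 64, 32, 16, 4
    rng = np.random.default_rng(1)

    def ofdm_symbol(nulling=False, windowing=False):
        data = (rng.integers(0, 2, N_USED) * 2 - 1).astype(complex)    # BPSK subcarriers
        if nulling:                                    # nulling: switch off the edge tones
            data[:N_NULL] = 0
            data[-N_NULL:] = 0
        X = np.zeros(N_FFT, complex)
        X[N_FFT // 2 - N_USED // 2: N_FFT // 2 + N_USED // 2] = data   # occupied band in the middle
        x = np.fft.ifft(np.fft.ifftshift(X))
        x = np.concatenate([x[-(CP + ROLLOFF):], x, x[:ROLLOFF]])      # cyclic prefix + suffix
        if windowing:                                  # raised-cosine tapering of the symbol edges
            ramp = 0.5 * (1 - np.cos(np.pi * np.arange(ROLLOFF) / ROLLOFF))
            x[:ROLLOFF] *= ramp
            x[-ROLLOFF:] *= ramp[::-1]
        return x

    def oob_level_db(x, nfft=4096):
        spec = np.abs(np.fft.fftshift(np.fft.fft(x, nfft))) ** 2
        half_band = (N_USED // 2) * nfft // N_FFT
        centre = nfft // 2
        in_band = spec[centre - half_band: centre + half_band]
        oob = np.concatenate([spec[:centre - 2 * half_band], spec[centre + 2 * half_band:]])
        return 10 * np.log10(oob.mean() / in_band.mean())

    for nulling in (False, True):
        for windowing in (False, True):
            level = oob_level_db(ofdm_symbol(nulling, windowing))
            print(f"nulling={nulling!s:5} windowing={windowing!s:5} relative OOB level {level:6.1f} dB")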
A stochastic multi-user multi-armed bandit framework is used to develop algorithms for uncoordinated spectrum access. In contrast to prior work, it is assumed that rewards can be non-zero even under collisions, thus allowing the number of users to exceed the number of channels. The proposed algorithm consists of an estimation phase and an allocation phase. It is shown that if every user adopts the algorithm, the system-wide regret is order-optimal, of order $O(\log T)$ over a time horizon of duration $T$. The regret guarantees hold both when the number of users is greater than and when it is less than the number of channels. The algorithm is extended to the dynamic case, where the number of users in the system evolves over time, and is shown to achieve sub-linear regret.
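A stripped-down sketch of the estimate-then-allocate structure is given below. The rank-based allocation rule and the pre-assigned user ranks are simplifying assumptions made purely for brevity (the paper's scheme needs no such coordination and also covers more users than channels), as are all constants.

    import numpy as np

    N_USERS, N_CHANNELS, T_EST, T_ALLOC = 3, 5, 400, 2000
    rng = np.random.default_rng(2)
    true_means = np.sort(rng.uniform(0.2, 0.9, N_CHANNELS))[::-1]   # hypothetical channel means

    def run_user(rank):
        """One user: round-robin exploration, then commit to the channel matching its rank."""
        counts, sums = np.zeros(N_CHANNELS), np.zeros(N_CHANNELS)
        for t in range(T_EST):                        # estimation phase
            a = t % N_CHANNELS
            sums[a] += rng.binomial(1, true_means[a])
            counts[a] += 1
        order = np.argsort(-(sums / counts))          # empirical ranking of channels
        target = order[rank % N_CHANNELS]             # allocation phase: rank-th best channel
        reward = sum(rng.binomial(1, true_means[target]) for _ in range(T_ALLOC))
        return int(target), reward

    for user in range(N_USERS):
        channel, reward = run_user(user)
        print(f"user {user}: settled on channel {channel}, cumulative reward {reward}")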
Opportunistic spectrum access (OSA) for infrastructure-less (or cognitive ad-hoc) networks has received significant attention thanks to emerging paradigms such as the Internet of Things (IoT) and smart grids. Research in this area has evolved from the $\rho^{rand}$ algorithm, which requires prior knowledge of the number of active secondary users (SUs), to the musical chairs (MC) algorithm, where the number of SUs is unknown and estimated independently at each SU. These works ignore the number of collisions in the network, leading to wasted power and reducing the effective lifetime of battery-operated SUs. In this paper, we develop algorithms for OSA that learn faster and incur fewer collisions, i.e., are energy efficient. We consider two types of infrastructure-less decentralized networks: 1) static networks, where the number of SUs is fixed but unknown, and 2) dynamic networks, where SUs can independently enter or leave the network. We formulate the problem as a multi-player multi-armed bandit and develop two distributed algorithms. The analysis shows that when all SUs independently implement the proposed algorithms, the loss in throughput compared to the optimal throughput, i.e., the regret, is a constant with high probability, and the algorithms significantly outperform existing ones in terms of both regret and number of collisions. Fewer collisions make them ideally suited to battery-operated SU terminals. We validate our claims through exhaustive simulations as well as realistic USRP-based experiments in a real radio environment.
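The collision-driven settling step that musical-chair style schemes build on can be sketched as follows. Channel ranking and estimation of the number of SUs are assumed to have been done already, and the constants are illustrative; this is not the proposed algorithms, only the orthogonalization idea they improve upon.

    import numpy as np

    N_SU, TOP_CHANNELS, MAX_SLOTS = 4, [0, 1, 2, 3, 5], 200
    rng = np.random.default_rng(3)

    locked = [None] * N_SU                 # channel each SU has settled on, if any
    collisions = 0
    for slot in range(MAX_SLOTS):
        picks = [locked[u] if locked[u] is not None
                 else int(rng.choice(TOP_CHANNELS)) for u in range(N_SU)]
        for u, ch in enumerate(picks):
            if picks.count(ch) > 1:        # collision: unsettled SUs hop again next slot
                collisions += locked[u] is None
            elif locked[u] is None:        # collision-free slot: lock onto this channel
                locked[u] = ch
        if all(l is not None for l in locked):
            print(f"all SUs settled after {slot + 1} slots with {collisions} collisions")
            break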
In this paper, the problem of opportunistic spectrum sharing for the next generation of wireless systems empowered by the cloud radio access network (C-RAN) is studied. More precisely, low-priority users employ cooperative spectrum sensing to detect vacant portions of the spectrum that are not currently used by high-priority users. The scheme is designed to maximize the overall throughput of the low-priority users while guaranteeing the quality of service of the high-priority users. This objective is attained by optimally adjusting the spectrum sensing time with respect to imposed target probabilities of detection and false alarm, as well as by dynamically allocating and assigning C-RAN resources, i.e., transmit powers, sub-carriers, remote radio heads (RRHs), and base-band units. The resulting optimization problem is non-convex and NP-hard, making it extremely difficult to tackle directly. To solve it, a low-complexity iterative approach is proposed in which the sensing time, user association parameters, and transmit powers of the RRHs are alternately optimized at every step. Numerical results demonstrate the necessity of adjusting the sensing time in such systems as well as balancing the sensing-throughput tradeoff.
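The alternating structure can be illustrated with a toy loop that couples a one-dimensional search over the sensing time (via the classical sensing-throughput trade-off for an energy detector) with water-filling power allocation under a per-frame energy budget. The system model and every constant here are assumptions for illustration, not the paper's formulation.

    import numpy as np
    from scipy.stats import norm

    T_FRAME, FS, PD_TARGET, SNR_PU = 0.1, 6e6, 0.9, 0.05   # frame (s), sampling rate, target Pd, PU SNR
    gains, E_BUDGET = np.array([1.0, 0.6, 0.3]), 1.0       # sub-carrier gains (sorted), energy per frame

    def false_alarm(tau):
        """Energy-detector false-alarm probability when the target Pd is met at sensing time tau."""
        return norm.sf(norm.isf(PD_TARGET) * np.sqrt(2 * SNR_PU + 1) + np.sqrt(tau * FS) * SNR_PU)

    def waterfill(budget):
        """Water-filling split of a power budget across sub-carriers (gains sorted descending)."""
        for k in range(len(gains), 0, -1):
            mu = (budget + np.sum(1 / gains[:k])) / k
            p = mu - 1 / gains[:k]
            if p[-1] > 0:
                return np.concatenate([p, np.zeros(len(gains) - k)])

    tau = 0.01
    for _ in range(10):                                     # alternate the two sub-problems
        power = waterfill(E_BUDGET / (T_FRAME - tau))       # best powers for the current sensing time
        rate = np.sum(np.log2(1 + gains * power))
        taus = np.linspace(1e-4, T_FRAME / 2, 500)
        obj = (T_FRAME - taus) / T_FRAME * (1 - false_alarm(taus)) * rate
        tau = taus[np.argmax(obj)]                          # best sensing time for the current powers
    print(f"sensing time {tau * 1e3:.2f} ms, power split {np.round(power, 2)}")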
This paper proposes a novel scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous work on reinforcement learning for network optimization, the network topology is assumed to be fixed, and a different agent is trained for each transmission node, which limits scalability and generalizability. Further, routing and spectrum access are typically treated as separate tasks. Moreover, the optimization objective is usually a cumulative metric along the route, e.g., the number of hops or the delay. In this paper, we account for the physical-layer signal-to-interference-plus-noise ratio (SINR) in a wireless network and further show that a bottleneck objective, such as the minimum SINR along the route, can also be optimized effectively using reinforcement learning. Specifically, we propose a scalable approach in which a single agent is associated with each flow and makes routing and spectrum access decisions as it moves along the frontier nodes. The agent is trained according to the physical-layer characteristics of the environment using a novel rewarding scheme based on the Monte Carlo estimation of the future bottleneck SINR. It learns to avoid interference by intelligently making joint routing and spectrum allocation decisions based on the geographical location information of the neighbouring nodes.
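A bare-bones sketch of the Monte Carlo bottleneck-SINR reward idea is shown below: the value of a candidate frontier node is estimated by sampling random route continuations to the destination and averaging the minimum link SINR along them. The topology, the pathloss-only SINR model (no interference term), and the random rollout policy are illustrative assumptions, not the trained policy or reward of the paper.

    import numpy as np

    rng = np.random.default_rng(4)
    nodes = rng.uniform(0, 100, (12, 2))                    # random node positions (metres)
    DEST, NOISE, TX_POWER = 11, 1e-9, 1.0

    def link_sinr(a, b):
        d = np.linalg.norm(nodes[a] - nodes[b]) + 1e-3
        return TX_POWER * d ** -3 / NOISE                   # pathloss-only; interference ignored

    def mc_bottleneck_reward(frontier, n_rollouts=50, max_hops=6):
        """Average over random rollouts of the minimum SINR from `frontier` to DEST."""
        estimates = []
        for _ in range(n_rollouts):
            node, worst = frontier, np.inf
            for _ in range(max_hops):
                nxt = DEST if rng.random() < 0.3 else int(rng.integers(len(nodes)))
                worst = min(worst, link_sinr(node, nxt))
                node = nxt
                if node == DEST:
                    break
            estimates.append(worst)
        return float(np.mean(estimates))

    # Score two hypothetical next hops for a flow currently at node 0.
    for candidate in (3, 7):
        reward_db = 10 * np.log10(mc_bottleneck_reward(candidate))
        print(f"candidate next hop {candidate}: Monte Carlo bottleneck reward {reward_db:.1f} dB")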