An order optimal policy for exploiting idle spectrum in cognitive radio networks


Abstract in English

In this paper a spectrum sensing policy employing recency-based exploration is proposed for cognitive radio networks. We formulate the problem of finding a spectrum sensing policy for multi-band dynamic spectrum access as a stochastic restless multi-armed bandit problem with stationary unknown reward distributions. In cognitive radio networks the multi-armed bandit problem arises when deciding where in the radio spectrum to look for idle frequencies that could be efficiently exploited for data transmission. We consider two models for the dynamics of the frequency bands: 1) the independent model where the state of the band evolves randomly independently from the past and 2) the Gilbert-Elliot model, where the states evolve according to a 2-state Markov chain. It is shown that in these conditions the proposed sensing policy attains asymptotically logarithmic weak regret. The policy proposed in this paper is an index policy, in which the index of a frequency band is comprised of a sample mean term and a recency-based exploration bonus term. The sample mean promotes spectrum exploitation whereas the exploration bonus encourages for further exploration for idle bands providing high data rates. The proposed recency based approach readily allows constructing the exploration bonus such that it will grow the time interval between consecutive sensing time instants of a suboptimal band exponentially, which then leads to logarithmically increasing weak regret. Simulation results confirming logarithmic weak regret are presented and it is found that the proposed policy provides often improved performance at low complexity over other state-of-the-art policies in the literature.

Download