ﻻ يوجد ملخص باللغة العربية
A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains asymptotically logarithmic weak regret rate when the rewards are bounded independent and identically distributed or finite state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy, in which the index of a frequency band is comprised of a sample mean term and a confidence term. The sample mean term promotes spectrum exploitation whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing instances of any suboptimal band grows exponentially. This exponential growth between suboptimal sensing time instances leads to logarithmically growing weak regret. Simulation results demonstrate that the proposed policy performs better than other similar methods in the literature.
This paper proposes using the uncertainty of information (UoI), measured by Shannons entropy, as a metric for information freshness. We consider a system in which a central monitor observes multiple binary Markov processes through a communication cha
By exploiting the computing power and local data of distributed clients, federated learning (FL) features ubiquitous properties such as reduction of communication overhead and preserving data privacy. In each communication round of FL, the clients up
The multi-armed bandit (MAB) is a classical online optimization model for the trade-off between exploration and exploitation. The traditional MAB is concerned with finding the arm that minimizes the mean cost. However, minimizing the mean does not ta
Restless Multi-Armed Bandits (RMABs) have been popularly used to model limited resource allocation problems. Recently, these have been employed for health monitoring and intervention planning problems. However, the existing approaches fail to account
We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive