
Multi-Armed Bandit Based Client Scheduling for Federated Learning

Published by Wenchao Xia
Publication date: 2020
Research field: Informatics Engineering
Research language: English





By exploiting the computing power and local data of distributed clients, federated learning (FL) offers benefits such as reduced communication overhead and preserved data privacy. In each communication round of FL, the clients update local models based on their own data and upload their local updates via wireless channels. However, the latency caused by hundreds to thousands of communication rounds remains a bottleneck in FL. To minimize the training latency, this work provides a multi-armed bandit-based framework for online client scheduling (CS) in FL without knowing wireless channel state information or the statistical characteristics of clients. First, we propose a CS algorithm based on the upper confidence bound policy (CS-UCB) for ideal scenarios where the local datasets of clients are independent and identically distributed (i.i.d.) and balanced. An upper bound on the expected performance regret of the proposed CS-UCB algorithm is provided, which indicates that the regret grows logarithmically over communication rounds. Then, to address non-ideal scenarios with non-i.i.d. and unbalanced local datasets and varying availability of clients, we further propose a CS algorithm based on the UCB policy and the virtual queue technique (CS-UCB-Q). An upper bound is also derived, which shows that the expected performance regret of the proposed CS-UCB-Q algorithm can grow sub-linearly over communication rounds under certain conditions. In addition, the convergence performance of FL training is analyzed. Finally, simulation results validate the efficiency of the proposed algorithms.
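To make the bandit view of client scheduling concrete, the following is a minimal sketch of UCB1-style client selection in which each client is an arm and its observed round latency is a cost. It is not the paper's CS-UCB algorithm: the names `ucb_select` and `simulate`, the exploration constant `c`, the Gaussian latency noise, the normalisation of latencies to [0, 1], and the choice of scheduling `k` clients per round are all illustrative assumptions.

```python
import math
import random

def ucb_select(counts, mean_latency, t, k, c=2.0):
    """Pick the k clients with the lowest optimistic latency estimate.

    counts[i]       -- times client i has been scheduled so far
    mean_latency[i] -- running mean of client i's observed latency in [0, 1]
    t               -- current round index (1-based)
    """
    scores = []
    for i, n in enumerate(counts):
        if n == 0:
            score = -math.inf  # clients never scheduled are tried first
        else:
            # Lower confidence bound: optimism in the face of uncertainty
            # when the quantity being learned is a cost, not a reward.
            score = mean_latency[i] - math.sqrt(c * math.log(t) / n)
        scores.append((score, i))
    scores.sort()
    return [i for _, i in scores[:k]]

def simulate(true_latency, rounds, k, seed=0):
    """Run the selection rule against hypothetical noisy latencies."""
    rng = random.Random(seed)
    counts = [0] * len(true_latency)
    means = [0.0] * len(true_latency)
    for t in range(1, rounds + 1):
        for i in ucb_select(counts, means, t, k):
            # Assumed observation model: Gaussian noise, clipped to [0, 1].
            obs = min(1.0, max(0.0, rng.gauss(true_latency[i], 0.05)))
            counts[i] += 1
            means[i] += (obs - means[i]) / counts[i]  # incremental mean
    return counts

# Four hypothetical clients; two are scheduled in each of 500 rounds.
counts = simulate([0.2, 0.5, 0.8, 0.3], rounds=500, k=2)
```

In this toy setting the scheduler needs no channel state information: it only observes the latency of the clients it actually schedules, and slow clients are revisited less and less often as their confidence intervals shrink.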




Read also

We consider federated edge learning (FEEL) over wireless fading channels taking into account the downlink and uplink channel latencies, and the random computation delays at the clients. We speed up the training process by overlapping the communication with computation. With fountain coded transmission of the global model update, clients receive the global model asynchronously, and start performing local computations right away. Then, we propose a dynamic client scheduling policy, called MRTP, for uploading local model updates to the parameter server (PS), which, at any time, schedules the client with the minimum remaining upload time. However, MRTP can lead to biased participation of clients in the update process, resulting in performance degradation in non-iid data scenarios. To overcome this, we propose two alternative schemes with fairness considerations, termed as age-aware MRTP (A-MRTP), and opportunistically fair MRTP (OF-MRTP). In A-MRTP, the remaining clients are scheduled according to the ratio between their remaining transmission time and the update age, while in OF-MRTP, the selection mechanism utilizes the long term average channel rate of the clients to further reduce the latency while ensuring fair participation of the clients. It is shown through numerical simulations that OF-MRTP provides significant reduction in latency without sacrificing test accuracy.
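The two selection rules described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `mrtp` and `a_mrtp` are hypothetical, and the abstract does not say in which direction the A-MRTP ratio is ranked, so the sketch assumes the client with the smallest ratio of remaining time to update age is scheduled (so that staler clients are favoured).

```python
def mrtp(remaining_time):
    """Return the index of the client with the minimum remaining upload time."""
    return min(range(len(remaining_time)), key=lambda i: remaining_time[i])

def a_mrtp(remaining_time, age):
    """Return the client minimising remaining_time / age.

    Assumed rule: a large update age (age[i] >= 1) lowers the ratio and so
    raises the priority of clients that have not contributed recently.
    """
    return min(
        range(len(remaining_time)),
        key=lambda i: remaining_time[i] / age[i],
    )

# Hypothetical snapshot: client 2 is slow but has not been scheduled for a while.
remaining = [4.0, 3.0, 6.0]
age = [1, 1, 4]

fast = mrtp(remaining)         # ignores age entirely
fair = a_mrtp(remaining, age)  # ratios are 4.0, 3.0, 1.5
```

With these numbers the plain rule picks client 1, while the age-aware rule picks client 2, which illustrates how weighting by age counteracts biased participation.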
A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains asymptotically logarithmic weak regret rate when the rewards are bounded independent and identically distributed or finite state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy, in which the index of a frequency band is comprised of a sample mean term and a confidence term. The sample mean term promotes spectrum exploitation whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing instances of any suboptimal band grows exponentially. This exponential growth between suboptimal sensing time instances leads to logarithmically growing weak regret. Simulation results demonstrate that the proposed policy performs better than other similar methods in the literature.
This paper proposes using the uncertainty of information (UoI), measured by Shannon's entropy, as a metric for information freshness. We consider a system in which a central monitor observes multiple binary Markov processes through a communication channel. The UoI of a Markov process corresponds to the monitor's uncertainty about its state. At each time step, only one Markov process can be selected to update its state to the monitor; hence there is a tradeoff among the UoIs of the processes that depends on the scheduling policy used to select the process to be updated. The age of information (AoI) of a process corresponds to the time since its last update. In general, the associated UoI can be a non-increasing function, or even an oscillating function, of its AoI, making the scheduling problem particularly challenging. This paper investigates scheduling policies that aim to minimize the average sum-UoI of the processes over the infinite time horizon. We formulate the problem as a restless multi-armed bandit (RMAB) problem, and develop a Whittle index policy that is near-optimal for the RMAB after proving its indexability. We further provide an iterative algorithm to compute the Whittle index for the practical deployment of the policy. Although this paper focuses on UoI scheduling, our results apply to a general class of RMABs for which the UoI scheduling problem is a special case. Specifically, this paper's Whittle index policy is valid for any RMAB in which the bandits are binary Markov processes and the penalty is a concave function of the belief state of the Markov process. Numerical results demonstrate the excellent performance of the Whittle index policy for this class of RMABs.
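The relationship between UoI and AoI can be illustrated with a small sketch for a single binary Markov process. The parameterisation is an assumption made here for illustration: `p` is the probability of moving from state 0 to state 1, `q` the probability of moving from state 1 to state 0, and the monitor's belief is propagated one step at a time from the last observed state. The function names are hypothetical.

```python
import math

def binary_entropy(x):
    """Shannon entropy (in bits) of a Bernoulli(x) variable."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def belief_after(last_state, age, p, q):
    """P(state = 1) `age` steps after the monitor observed `last_state`.

    Assumed transition probabilities: p = P(0 -> 1), q = P(1 -> 0).
    """
    b = float(last_state)
    for _ in range(age):
        b = b * (1 - q) + (1 - b) * p  # one step of the Markov chain
    return b

def uoi(last_state, age, p, q):
    """Uncertainty of information: entropy of the monitor's belief."""
    return binary_entropy(belief_after(last_state, age, p, q))

# A chain that tends to flip state (p + q > 1) gives an oscillating UoI curve.
curve = [uoi(0, a, p=0.9, q=0.6) for a in range(6)]
```

For this choice of `p` and `q` the belief alternates around its stationary value, so the entropy rises, dips, and rises again as the age grows. This is the kind of non-monotone behaviour that makes the UoI objective harder to schedule for than AoI alone.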
This paper investigates learning-based caching in small-cell networks (SCNs) when user preference is unknown. The goal is to optimize the cache placement in each small base station (SBS) for minimizing the system long-term transmission delay. We model this sequential multi-agent decision making problem from a multi-agent multi-armed bandit (MAMAB) perspective. Rather than estimating user preference first and then optimizing the cache strategy, we propose several MAMAB-based algorithms to directly learn the cache strategy online in both stationary and non-stationary environments. In the stationary environment, we first propose two high-complexity agent-based collaborative MAMAB algorithms with performance guarantees. Then we propose a low-complexity distributed MAMAB which ignores the SBS coordination. To achieve a better balance between SBS coordination gain and computational complexity, we develop an edge-based collaborative MAMAB with the coordination graph edge-based reward assignment method. In the non-stationary environment, we modify the MAMAB-based algorithms proposed in the stationary environment by proposing a practical initialization method and designing new perturbed terms to adapt to the dynamic environment. Simulation results are provided to validate the effectiveness of our proposed algorithms. The effects of different parameters on caching performance are also discussed.
Motivated by the increasing computational capacity of wireless user equipments (UEs), e.g., smart phones, tablets, or vehicles, as well as the increasing concerns about sharing private data, a new machine learning model has emerged, namely federated learning (FL), that allows a decoupling of data acquisition and computation at the central unit. Unlike centralized learning taking place in a data center, FL usually operates in a wireless edge network where the communication medium is resource-constrained and unreliable. Due to limited bandwidth, only a portion of UEs can be scheduled for updates at each iteration. Due to the shared nature of the wireless medium, transmissions are subjected to interference and are not guaranteed. The performance of an FL system in such a setting is not well understood. In this paper, an analytical model is developed to characterize the performance of FL in wireless networks. Particularly, tractable expressions are derived for the convergence rate of FL in a wireless setting, accounting for effects from both scheduling schemes and inter-cell interference. Using the developed analysis, the effectiveness of three different scheduling policies, i.e., random scheduling (RS), round robin (RR), and proportional fair (PF), is compared in terms of FL convergence rate. It is shown that running FL with PF outperforms RS and RR if the network is operating under a high signal-to-interference-plus-noise ratio (SINR) threshold, while RR is preferable when the SINR threshold is low. Moreover, the FL convergence rate decreases rapidly as the SINR threshold increases, thus confirming the importance of compression and quantization of the update parameters. The analysis also reveals a trade-off between the number of scheduled UEs and subchannel bandwidth under a fixed amount of available spectrum.
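The three scheduling policies compared in this abstract can be sketched generically as follows. These are textbook-style versions written for illustration, not the exact definitions analysed in the paper: the function names, the use of `k` scheduled UEs per round, and the PF metric (instantaneous rate divided by time-averaged rate) are assumptions.

```python
import random

def random_scheduling(n_ues, k, rng):
    """RS: choose k distinct UEs uniformly at random."""
    return sorted(rng.sample(range(n_ues), k))

def round_robin(n_ues, k, round_idx):
    """RR: serve UEs in consecutive groups of k, wrapping around."""
    start = (round_idx * k) % n_ues
    return [(start + j) % n_ues for j in range(k)]

def proportional_fair(inst_rate, avg_rate, k):
    """PF: choose the k UEs with the largest instantaneous-to-average rate ratio."""
    ranked = sorted(
        range(len(inst_rate)),
        key=lambda i: inst_rate[i] / avg_rate[i],
        reverse=True,
    )
    return sorted(ranked[:k])

rng = random.Random(0)
rs_pick = random_scheduling(6, 2, rng)
rr_pick = round_robin(6, 2, round_idx=2)  # third group of two UEs
# Ratios are 0.5, 1.0 and 2.0, so UEs 1 and 2 are favoured over UE 0.
pf_pick = proportional_fair([10.0, 4.0, 6.0], [20.0, 4.0, 3.0], k=2)
```

RS and RR ignore channel quality, whereas PF favours UEs whose channel is currently good relative to their own average, which is consistent with the abstract's observation that the best policy depends on the SINR threshold.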
