Federated learning (FL) is a promising technique that enables many edge devices to train a machine learning model collaboratively in wireless networks. By exploiting the superposition nature of wireless waveforms, over-the-air computation (AirComp) can accelerate model aggregation and hence facilitate communication-efficient FL. Due to channel fading, power control is crucial in AirComp. Prior works assume that the signals to be aggregated from each device, i.e., the local gradients, have identical statistics. In FL, however, gradient statistics vary over both training iterations and feature dimensions, and are unknown in advance. This paper studies the power control problem for over-the-air FL by taking gradient statistics into account. The goal is to minimize the aggregation error by optimizing the transmit power at each device subject to peak power constraints. We obtain the optimal policy in closed form when gradient statistics are given. Notably, we show that the optimal transmit power is continuous and monotonically decreases with the squared multivariate coefficient of variation (SMCV) of gradient vectors. We then propose a method to estimate gradient statistics with negligible communication cost. Experimental results demonstrate that the proposed gradient-statistics-aware power control achieves higher test accuracy than existing schemes across a wide range of scenarios.
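The main result above is stated in terms of the SMCV of the gradient vectors. The following minimal sketch estimates such a statistic from a batch of local gradients, assuming the definition SMCV = sum_i var_i / sum_i mean_i^2, where mean_i and var_i are the mean and variance of gradient entry i across devices. This definition and the function name `smcv` are assumptions for illustration and may differ from the paper's exact formulation.

```python
from typing import Sequence


def smcv(gradients: Sequence[Sequence[float]]) -> float:
    """Assumed SMCV: total per-entry variance across devices divided by the
    squared norm of the mean gradient (hypothetical definition)."""
    k = len(gradients)
    d = len(gradients[0])
    # Mean of each gradient entry across the k devices.
    means = [sum(g[i] for g in gradients) / k for i in range(d)]
    # Population variance of each gradient entry across the k devices.
    variances = [sum((g[i] - means[i]) ** 2 for g in gradients) / k for i in range(d)]
    mean_sq_norm = sum(m * m for m in means)
    if mean_sq_norm == 0.0:
        raise ValueError("SMCV is undefined when the mean gradient is zero")
    return sum(variances) / mean_sq_norm
```

For two devices with gradients [1, 2] and [3, 4], the entry means are [2, 3] and both variances are 1, so the sketch returns 2/13. Larger values indicate gradients that are noisier relative to their mean, which, according to the abstract, is the regime in which the optimal transmit power is lower.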
Over-the-air federated edge learning (Air-FEEL) is a communication-efficient solution for privacy-preserving distributed learning over wireless networks. Air-FEEL allows one-shot over-the-air aggregation of gradients or model updates by exploiting the waveform superposition property of wireless channels, and thus promises an extremely low aggregation latency that is independent of the network size. However, this communication efficiency may come at the cost of degraded learning performance, owing to the aggregation error caused by non-uniform channel fading across devices and by noise perturbation. Prior work adopted channel inversion power control (or its variants) to reduce the aggregation error by aligning the channel gains; this approach, however, can be highly suboptimal in deep fading scenarios because it amplifies the noise. To overcome this issue, we investigate power control optimization for enhancing the learning performance of Air-FEEL. Towards this end, we first analyze the convergence behavior of Air-FEEL by deriving the optimality gap of the loss function under any given power control policy. Then we optimize the power control to minimize the optimality gap, and thereby accelerate convergence, subject to a set of average and maximum power constraints at the edge devices. The problem is generally non-convex and challenging to solve because the power control variables are coupled across devices and iterations. To tackle this challenge, we develop an efficient algorithm that jointly exploits the successive convex approximation (SCA) and trust region methods. Numerical results show that the optimized power control policy achieves significantly faster convergence than benchmark policies such as channel inversion and uniform power transmission.
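The noise-amplification problem of channel inversion under deep fading can be seen in a small calculation. The sketch below evaluates the aggregation MSE of the two benchmark policies named above, channel inversion and uniform (full) power transmission. It assumes a simplified real-valued model with phase-compensated channel gains g_k, i.i.d. zero-mean unit-variance signals x_k, a single receive scaling factor c, and a peak power P_max. The function names are hypothetical, and this is not the paper's optimized policy.

```python
import math
from typing import List, Tuple


def aircomp_mse(gains: List[float], powers: List[float], c: float, noise_var: float) -> float:
    """MSE of estimating s = sum_k x_k by c * y, where
    y = sum_k g_k * sqrt(p_k) * x_k + n and the x_k are i.i.d. with unit variance."""
    misalignment = sum((c * g * math.sqrt(p) - 1.0) ** 2 for g, p in zip(gains, powers))
    return misalignment + noise_var * c ** 2


def channel_inversion(gains: List[float], p_max: float) -> Tuple[List[float], float]:
    """Align every received amplitude to sqrt(eta); the weakest channel caps eta."""
    eta = p_max * min(g ** 2 for g in gains)
    powers = [eta / g ** 2 for g in gains]
    return powers, 1.0 / math.sqrt(eta)


def full_power(gains: List[float], p_max: float, noise_var: float) -> Tuple[List[float], float]:
    """Every device transmits at peak power; the receiver uses the MSE-optimal c."""
    amplitudes = [g * math.sqrt(p_max) for g in gains]
    c = sum(amplitudes) / (sum(a * a for a in amplitudes) + noise_var)
    return [p_max] * len(gains), c
```

With gains [1.0, 0.05], P_max = 1 and noise variance 0.1, channel inversion removes all misalignment. However, the deeply faded device forces eta = 0.0025, so the noise term alone gives an MSE of 40. Full-power transmission accepts some misalignment and reaches an MSE of 1.0.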
This paper investigates transmission power control in an over-the-air federated edge learning (Air-FEEL) system. Unlike conventional power control designs (e.g., minimizing the per-round mean squared error (MSE) of the over-the-air aggregation), we consider a new power control design that aims to directly maximize the convergence speed. Towards this end, we first analyze the convergence behavior of Air-FEEL (in terms of the optimality gap) subject to aggregation errors at different communication rounds. We show that if the aggregation estimates are unbiased, the training algorithm converges exactly to the optimal point under mild conditions, whereas if they are biased, the algorithm converges with an error floor determined by the estimate bias accumulated over communication rounds. Next, building on these convergence results, we optimize the power control to directly minimize the derived optimality gaps under both biased and unbiased aggregation, subject to a set of average and maximum power constraints at individual edge devices. We transform both problems into convex forms and, using the Lagrangian duality method, obtain their structured optimal solutions, both of which take the form of regularized channel inversion. Finally, numerical results show that the proposed power control policies achieve significantly faster convergence for Air-FEEL than benchmark policies with fixed power transmission or conventional MSE minimization.
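The unbiased-versus-biased distinction can be illustrated with a toy problem instead of the paper's Air-FEEL analysis. The sketch runs gradient descent on f(w) = w^2 / 2, whose optimum is w* = 0, using noisy gradient estimates and the diminishing step size 1/(t+1). A constant `bias` term stands in for the accumulated estimate bias, and this substitution is an assumption for illustration. The function name is hypothetical.

```python
import random


def sgd_on_quadratic(bias: float, noise_std: float, steps: int, seed: int = 0) -> float:
    """Minimize f(w) = w^2 / 2 using gradient estimates g_t = w_t + bias + noise_t
    and step size 1/(t+1). Returns the final iterate (toy model, not Air-FEEL)."""
    rng = random.Random(seed)
    w = 1.0
    for t in range(steps):
        # Noisy estimate of the true gradient w; 'bias' models a systematic aggregation error.
        g = w + bias + rng.gauss(0.0, noise_std)
        w -= g / (t + 1)
    return w
```

With this step size, the iterate after T steps equals -bias minus the average of the injected noise. With zero bias, the error therefore shrinks roughly as 1/sqrt(T). With a nonzero bias, the iterate settles near w = -bias, leaving an optimality-gap floor of about bias^2 / 2.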
In the Internet of Things, learning is one of the most prominent tasks. In this paper, we consider an Internet of Things scenario in which federated learning is combined with the simultaneous transmission of model data and wireless power. We investigate the trade-off between the number of communication rounds and the communication round time, while devices harvest energy to compensate for their energy expenditure. We formulate and solve an optimization problem that accounts for the number of local iterations on devices, the time to transmit and receive the model updates, and the time to harvest sufficient energy. Numerical results indicate that optimizing the number of local iterations on devices, under either maximum ratio transmission or zero-forcing beamforming, substantially boosts the test accuracy of the learning task. Moreover, maximum ratio transmission, rather than zero-forcing, provides the best trade-off between test accuracy and communication round time for various energy harvesting percentages. Thus, it is possible to learn a model quickly, with few communication rounds, without depleting the battery.
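The two beamforming schemes compared above have standard closed forms. The sketch below computes maximum ratio transmission (MRT) and zero-forcing (ZF) beamformers with NumPy for a base station with N antennas serving K single-antenna devices. It assumes that row k of the K x N matrix H is device k's downlink channel and that each beamformer is normalized to unit norm. The function names are hypothetical, and the sketch does not model how the paper allocates power or splits time between energy harvesting and model transmission.

```python
import numpy as np


def mrt_beamformers(H: np.ndarray) -> np.ndarray:
    """MRT: column k is h_k^H / ||h_k||, matched to device k's own channel."""
    W = H.conj().T  # N x K
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def zf_beamformers(H: np.ndarray) -> np.ndarray:
    """ZF: columns of H^H (H H^H)^{-1}, normalized to unit norm, so that
    H @ W is diagonal (no inter-device interference). Requires K <= N."""
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    return W / np.linalg.norm(W, axis=0, keepdims=True)
```

MRT maximizes each device's own received gain but allows the beams to interfere. ZF removes inter-device interference at the cost of a lower per-device gain when the channels are correlated.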
Machine learning and wireless communication technologies are jointly facilitating an intelligent edge, where federated edge learning (FEEL) is a promising training framework. Because wireless devices involved in FEEL are resource-limited in terms of communication bandwidth, computing power and battery capacity, it is important to schedule them carefully to optimize the training performance. In this work, we consider an over-the-air FEEL system with analog gradient aggregation, and propose an energy-aware dynamic device scheduling algorithm that optimizes the training performance under the energy constraints of devices, accounting for both the communication energy for gradient aggregation and the computation energy for local training. Including computation energy makes dynamic scheduling challenging: devices are scheduled before local training, but the communication energy for over-the-air aggregation depends on the l2-norm of the local gradient, which is known only after local training. We therefore incorporate estimation methods into scheduling to predict the gradient norm. Taking the estimation error into account, we characterize the performance gap between the proposed algorithm and its offline counterpart. Experimental results show that, under a highly unbalanced local data distribution, the proposed algorithm increases the accuracy by 4.9% on the CIFAR-10 dataset compared with the myopic benchmark, while satisfying the energy constraints.
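Because the communication energy depends on a gradient norm that is known only after local training, the scheduler must rely on a prediction. The sketch below uses an exponential moving average of past norms as the predictor and applies a simple per-round feasibility check: a device is scheduled only if its remaining energy covers its computation energy plus the predicted transmission energy. Several elements are hypothetical: the energy model (transmit energy proportional to the squared gradient norm divided by the channel power gain, as under channel-inversion analog transmission), the EMA predictor, the `Device` fields, and the function names. This greedy rule is only an illustration and is not the proposed algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    battery: float        # remaining energy budget (hypothetical units)
    comp_energy: float    # energy for one round of local training
    channel_gain: float   # channel power gain |h_k|^2 for the upcoming round
    norm_estimate: float  # predicted l2-norm of the next local gradient


def update_norm_estimate(prev: float, observed: float, beta: float = 0.8) -> float:
    """Exponential moving average of observed gradient norms (assumed predictor)."""
    return beta * prev + (1.0 - beta) * observed


def comm_energy(norm: float, channel_gain: float, eta: float = 1.0) -> float:
    """Assumed analog-transmission energy: eta * ||g||^2 / |h|^2."""
    return eta * norm ** 2 / channel_gain


def schedule(devices: List[Device], eta: float = 1.0) -> List[int]:
    """Return the indices of devices whose budget covers computation plus predicted communication."""
    chosen = []
    for k, d in enumerate(devices):
        needed = d.comp_energy + comm_energy(d.norm_estimate, d.channel_gain, eta)
        if needed <= d.battery:
            chosen.append(k)
    return chosen
```

After a scheduled device finishes local training, its observed gradient norm would be passed to `update_norm_estimate` before the next round's decision.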
Analog over-the-air computation (OAC) is an efficient solution to a class of uplink data aggregation tasks over a multiple-access channel (MAC), wherein the receiver, dubbed the fusion center, aims to reconstruct a function of the data distributed at edge devices rather than the individual data themselves. Existing OAC relies exclusively on maximum likelihood (ML) estimation at the fusion center to recover the arithmetic sum of the signals transmitted by different devices. ML estimation, however, is highly susceptible to noise. In particular, in misaligned OAC, where the transmitted signals suffer channel misalignments, ML estimation suffers from severe error propagation and noise enhancement. To address these challenges, this paper puts forth a Bayesian approach to OAC in which each edge device transmits two pieces of prior information to the fusion center. Three OAC systems are studied: aligned OAC, with perfectly aligned signals; synchronous OAC, with misaligned channel gains among the received signals; and asynchronous OAC, with both channel-gain and time misalignments. Using the prior information, we devise linear minimum mean squared error (LMMSE) estimators and a sum-product maximum a posteriori (SP-MAP) estimator for the three OAC systems. Numerical results verify that 1) for the aligned and synchronous OAC, our LMMSE estimator significantly outperforms the ML estimator: in the low signal-to-noise ratio (SNR) regime, the LMMSE estimator reduces the mean squared error (MSE) by at least 6 dB, and in the high SNR regime, it lowers the error floor on the MSE by 86.4%; and 2) for the asynchronous OAC, our LMMSE and SP-MAP estimators perform on an equal footing in terms of MSE, and both are significantly better than the ML estimator.
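For the aligned case, the advantage of a Bayesian estimator over ML can be seen in the scalar model y = s + n, where s = sum_k x_k and the noise n has variance sigma^2. The sketch assumes that the two pieces of prior information each device sends are the mean and variance of its signal, that the x_k are independent, and that the noise variance is known. Under these assumptions, the ML estimate of s is simply y, with MSE sigma^2, while the LMMSE estimate shrinks y toward the prior mean. The function names are hypothetical.

```python
from typing import List


def ml_estimate(y: float) -> float:
    """Aligned OAC with y = s + n: the ML estimate of the sum s is y itself."""
    return y


def ml_mse(noise_var: float) -> float:
    return noise_var


def lmmse_estimate(y: float, prior_means: List[float], prior_vars: List[float], noise_var: float) -> float:
    """LMMSE estimate of s from y, using per-device prior means and variances
    (assumed to be the two pieces of prior information; independence assumed)."""
    mu_s = sum(prior_means)
    var_s = sum(prior_vars)
    return mu_s + var_s / (var_s + noise_var) * (y - mu_s)


def lmmse_mse(prior_vars: List[float], noise_var: float) -> float:
    """MSE of the LMMSE estimator: var_s * sigma^2 / (var_s + sigma^2)."""
    var_s = sum(prior_vars)
    return var_s * noise_var / (var_s + noise_var)
```

The LMMSE error var_s * sigma^2 / (var_s + sigma^2) is always below the ML error sigma^2. The relative gain is largest when the noise variance is large compared with var_s, that is, at low SNR.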