No Arabic abstract
We study collaborative machine learning systems where a massive dataset is distributed across independent workers which compute their local gradient estimates based on their own datasets. Workers send their estimates through a multipath fading multiple access channel with orthogonal frequency division multiplexing to mitigate the frequency selectivity of the channel. We assume that there is no channel state information (CSI) at the workers, and the parameter server (PS) employs multiple antennas to align the received signals. To reduce the power consumption and the hardware costs, we employ complex-valued low-resolution digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), at the transmitter and the receiver sides, respectively, and study the effects of practical low-cost DACs and ADCs on the learning performance. Our theoretical analysis shows that the impairments caused by low-resolution DACs and ADCs, including those of one-bit DACs and ADCs, do not prevent the convergence of the federated learning algorithm, and the multipath channel effects vanish when a sufficient number of antennas are used at the PS. We also validate our theoretical results via simulations, and demonstrate that using low-resolution, even one-bit, DACs and ADCs causes only a slight decrease in the learning accuracy.
We study wireless collaborative machine learning (ML), where mobile edge devices, each with its own dataset, carry out distributed stochastic gradient descent (DSGD) over-the-air with the help of a wireless access point acting as the parameter server (PS). At each iteration of the DSGD algorithm wireless devices compute gradient estimates with their local datasets, and send them to the PS over a wireless fading multiple access channel (MAC). Motivated by the additive nature of the wireless MAC, we propose an analog DSGD scheme, in which the devices transmit scal
We study federated edge learning (FEEL), where wireless edge devices, each with its own dataset, learn a global model collaboratively with the help of a wireless access point acting as the parameter server (PS). At each iteration, wireless devices perform local updates using their local data and the most recent global model received from the PS, and send their local updates to the PS over a wireless fading multiple access channel (MAC). The PS then updates the global model according to the signal received over the wireless MAC, and shares it with the devices. Motivated by the additive nature of the wireless MAC, we propose an analog `over-the-air aggregation scheme, in which the devices transmit their local updates in an uncoded fashion. Unlike recent literature on over-the-air edge learning, here we assume that the devices do not have channel state information (CSI), while the PS has imperfect CSI. Instead, the PS is equipped multiple antennas to alleviate the destructive effect of the channel, exacerbated due to the lack of perfect CSI. We design a receive beamforming scheme at the PS, and show that it can compensate for the lack of perfect CSI when the PS has a sufficient number of antennas. We also derive the convergence rate of the proposed algorithm highlighting the impact of the lack of perfect CSI, as well as the number of PS antennas. Both the experimental results and the convergence analysis illustrate the performance improvement of the proposed algorithm with the number of PS antennas, where the wireless fading MAC becomes deterministic despite the lack of perfect CSI when the PS has a sufficiently large number of antennas.
Over-the-air computation (OAC) is a promising technique to realize fast model aggregation in the uplink of federated edge learning. OAC, however, hinges on accurate channel-gain precoding and strict synchronization among the edge devices, which are challenging in practice. As such, how to design the maximum likelihood (ML) estimator in the presence of residual channel-gain mismatch and asynchronies is an open problem. To fill this gap, this paper formulates the problem of misaligned OAC for federated edge learning and puts forth a whitened matched filtering and sampling scheme to obtain oversampled, but independent, samples from the misaligned and overlapped signals. Given the whitened samples, a sum-product ML estimator and an aligned-sample estimator are devised to estimate the arithmetic sum of the transmitted symbols. In particular, the computational complexity of our sum-product ML estimator is linear in the packet length and hence is significantly lower than the conventional ML estimator. Extensive simulations on the test accuracy versus the average received energy per symbol to noise power spectral density ratio (EsN0) yield two main results: 1) In the low EsN0 regime, the aligned-sample estimator can achieve superior test accuracy provided that the phase misalignment is non-severe. In contrast, the ML estimator does not work well due to the error propagation and noise enhancement in the estimation process. 2) In the high EsN0 regime, the ML estimator attains the optimal learning performance regardless of the severity of phase misalignment. On the other hand, the aligned-sample estimator suffers from a test-accuracy loss caused by phase misalignment.
We consider a federated learning framework in which a parameter server (PS) trains a global model by using $n$ clients without actually storing the client data centrally at a cloud server. Focusing on a setting where the client datasets are fast changing and highly temporal in nature, we investigate the timeliness of model updates and propose a novel timely communication scheme. Under the proposed scheme, at each iteration, the PS waits for $m$ available clients and sends them the current model. Then, the PS uses the local updates of the earliest $k$ out of $m$ clients to update the global model at each iteration. We find the average age of information experienced by each client and numerically characterize the age-optimal $m$ and $k$ values for a given $n$. Our results indicate that, in addition to ensuring timeliness, the proposed communication scheme results in significantly smaller average iteration times compared to random client selection without hurting the convergence of the global learning task.
This paper considers the joint power control and resource allocation for a device-to-device (D2D) underlay cellular system with a multi-antenna BS employing ADCs with different resolutions. We propose a four-step algorithm that optimizes the ADC resolution profile at the base station (BS) to reduce the energy consumption and perform joint power control and resource allocation of D2D communication users (DUEs) and cellular users (CUEs) to improve the D2D reliability.