By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models that enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition so as to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of making a retransmission decision in each communication round to ensure both the reliability and the quantity of the training data, thereby accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike classic ARQ, which focuses merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty, which reflects how much the sample helps learning and can be measured using the model under training. Underpinning the proposed protocol is an elegant communication-learning relation derived between the two corresponding metrics, namely signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold-based policy for importance ARQ. The policy is first derived for the classic support vector machine (SVM) classifier, where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex model of convolutional neural networks (CNNs), where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and CNN models using real datasets with balanced and imbalanced distributions. The results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition, achieving faster model convergence than conventional channel-aware ARQ.
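The threshold policy described above can be sketched in a few lines for the SVM case. The exact SNR-uncertainty relation derived in the paper is not reproduced here; the logarithmic scaling, the function names, and the `max_retx` cap below are illustrative assumptions for a linear SVM.

```python
import numpy as np

def svm_uncertainty(w, b, x):
    """Uncertainty of sample x under a linear SVM: the smaller its
    distance to the decision boundary, the more uncertain (and thus
    more informative) the sample is for training."""
    margin = abs(np.dot(w, x) + b) / np.linalg.norm(w)
    return 1.0 / (margin + 1e-12)

def should_retransmit(w, b, x_rx, snr_db, base_snr_db, n_retx, max_retx=5):
    """Threshold-based importance ARQ (illustrative rule): keep
    retransmitting while the received SNR falls short of an
    uncertainty-scaled target, so that important samples are
    acquired more reliably than confidently classified ones."""
    if n_retx >= max_retx:
        return False
    u = svm_uncertainty(w, b, x_rx)
    snr_target_db = base_snr_db + 10.0 * np.log10(1.0 + u)  # assumed scaling
    return snr_db < snr_target_db
```

The design point the sketch captures is that the retransmission budget is spent where it matters: samples near the decision boundary face a higher effective SNR target, while confidently classified samples are accepted on the first pass.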
By implementing machine learning at the network edge, edge learning trains models by leveraging the rich data distributed at edge devices (e.g., smartphones and sensors) and, in return, endows them with capabilities of seeing, listening, and reasoning. In edge learning, the need for high-mobility wireless data acquisition arises in scenarios where edge devices (or even servers) are mounted on ground or aerial vehicles. In this paper, we present a novel solution, called fast analog transmission (FAT), for high-mobility data acquisition in edge-learning systems, which has several key features. First, FAT incurs low latency. Specifically, FAT requires neither source-and-channel coding nor channel training, thanks to the proposed technique of Grassmann analog encoding (GAE), which encodes data samples into subspace matrices. Second, FAT supports spatial multiplexing by directly transmitting analog data vectors over an antenna array. Third, FAT can be seamlessly integrated with edge learning (i.e., training of a classifier model in this work). In particular, by applying a Grassmannian-classification algorithm from computer vision, the received GAE-encoded data can be directly used to train the model without decoding or conversion. Simulations show that this design outperforms conventional schemes in learning accuracy owing to its robustness against the data distortion induced by fast fading.
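A minimal sketch of the GAE building blocks follows, assuming a real-valued toy setting (the paper's transmission is over a MIMO channel); the reshape-then-QR construction and the function names are illustrative assumptions rather than the paper's exact encoder.

```python
import numpy as np

def grassmann_encode(x, t, m):
    """Sketch of Grassmann analog encoding (GAE): reshape an analog
    data vector into a t x m matrix and keep its orthonormal column
    basis. An unknown full-rank fading matrix right-multiplying the
    transmitted matrix leaves its column space unchanged, which is
    why the receiver needs no channel training."""
    A = np.asarray(x, dtype=float)[: t * m].reshape(t, m)
    Q, _ = np.linalg.qr(A)  # a point on the Grassmann manifold G(t, m)
    return Q

def grassmann_distance(Q1, Q2):
    """Subspace distance via principal angles, the kind of metric a
    Grassmannian classifier at the receiver can train on directly."""
    s = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))
```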
Conventional frequentist learning, as assumed by existing federated learning protocols, is limited in its ability to quantify uncertainty, incorporate prior knowledge, guide active learning, and enable continual learning. Bayesian learning provides a principled approach to address all these limitations, at the cost of an increase in computational complexity. This paper studies distributed Bayesian learning in a wireless data center setting encompassing a central server and multiple distributed workers. Prior work on wireless distributed learning has focused exclusively on frequentist learning, and has introduced the idea of leveraging uncoded transmission to enable over-the-air computing. Unlike frequentist learning, Bayesian learning aims at evaluating approximations or samples from a global posterior distribution in the model parameter space. This work investigates for the first time the design of distributed one-shot, or embarrassingly parallel, Bayesian learning protocols in wireless data centers via consensus Monte Carlo (CMC). Uncoded transmission is introduced not only as a way to implement over-the-air computing, but also as a mechanism to deploy channel-driven MC sampling: Rather than treating channel noise as a nuisance to be mitigated, channel-driven sampling utilizes channel noise as an integral part of the MC sampling process. A simple wireless CMC scheme is first proposed that is asymptotically optimal under Gaussian local posteriors. Then, for arbitrary local posteriors, a variational optimization strategy is introduced. Simulation results demonstrate that, if properly accounted for, channel noise can indeed contribute to MC sampling and does not necessarily decrease the accuracy level.
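For intuition, a toy instance of channel-driven sampling for Gaussian local posteriors is sketched below. Transmit scaling and power control are abstracted away, and the function name and variance-matching step are assumptions; the point is only that additive channel noise can be budgeted into the Monte Carlo sampling noise rather than treated as pure distortion.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_driven_cmc_sample(means, variances, noise_var):
    """One-shot consensus Monte Carlo over a Gaussian MAC (toy).
    Each worker holds a Gaussian local posterior; the product of the
    local posteriors is the global posterior, whose mean is the
    precision-weighted mean and whose variance is the inverse of the
    summed precisions. Workers transmit scaled statistics uncoded, so
    the over-the-air superposition yields the weighted mean, and the
    channel noise supplies part of the required sampling variance."""
    prec = 1.0 / np.asarray(variances, dtype=float)
    global_var = 1.0 / prec.sum()
    rx = sum(p * m for p, m in zip(prec, means)) * global_var
    rx += rng.normal(scale=np.sqrt(noise_var))       # channel noise
    # Top up the remaining variance so the sample matches the global
    # posterior (assumes noise_var <= global_var).
    extra = max(global_var - noise_var, 0.0)
    return rx + rng.normal(scale=np.sqrt(extra))
```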
While machine-type communication (MTC) devices generate massive amounts of data, they often cannot process this data themselves due to limited energy and computation power. To this end, edge intelligence has been proposed, whereby distributed data are collected and machine learning is performed at the edge. This paradigm, however, needs to maximize the learning performance rather than the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient, since they allocate resources merely according to the quality of the wireless channels. This paper proposes a learning-centric power allocation (LCPA) method that allocates radio resources based on an empirical classification-error model. To gain insights into LCPA, an asymptotically optimal solution is derived. The solution shows that the transmit powers are inversely proportional to the channel gains and scale exponentially with the learning parameters. Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.
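The stated scaling behaviour can be illustrated with a toy allocation rule. The paper's exact closed-form solution is not reproduced here, so the exponent below is an assumed placeholder that only captures the two qualitative dependences the abstract states: inverse in channel gain and exponential in the learning parameters of an error model err_m ≈ a_m * v_m^(-b_m).

```python
import numpy as np

def lcpa_asymptotic(gains, a, b, total_power):
    """Illustrative LCPA-style rule: per-task powers decrease with
    channel gain and grow exponentially with the learning parameters
    (a_m, b_m) of each task's empirical error model, then are
    normalized to meet the total power budget."""
    gains, a, b = (np.asarray(v, dtype=float) for v in (gains, a, b))
    raw = np.exp(a / b) / gains      # assumed exponent; for intuition only
    return total_power * raw / raw.sum()
```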
Mobile edge learning is an emerging technique that enables distributed edge devices to collaboratively train shared machine learning models by exploiting their local data samples together with their communication and computation resources. To deal with the straggler issue faced by this technique, this paper proposes a new device-to-device (D2D) enabled data-sharing approach, in which edge devices share their data samples with one another over communication links in order to properly adjust their computation loads and thereby increase the training speed. Under this setup, we optimize the radio resource allocation for both data sharing and distributed training, with the objective of minimizing the total training delay under fixed numbers of local and global iterations. Numerical results show that the proposed data-sharing design significantly reduces the training delay, and also enhances the training accuracy when the data samples are not independent and identically distributed (non-IID) across edge devices.
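The load-adjustment intuition behind the data-sharing design can be sketched as follows. The paper jointly optimizes the radio resources for sharing and training, whereas this toy ignores communication costs and uses hypothetical names; it only shows why moving samples toward faster devices shortens the per-iteration bottleneck.

```python
import numpy as np

def balance_loads(samples, speeds):
    """Toy load balancing behind D2D data sharing: reassign sample
    counts in proportion to device computation speeds so that the
    per-iteration local computation times are equalized, removing the
    straggler bottleneck. The D2D transfer cost is ignored here."""
    speeds = np.asarray(speeds, dtype=float)
    target = sum(samples) * speeds / speeds.sum()  # samples per device
    transfers = np.asarray(samples, dtype=float) - target
    return target, transfers  # positive transfer = samples to send out
```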
We study federated edge learning (FEEL), where wireless edge devices, each with its own dataset, learn a global model collaboratively with the help of a wireless access point acting as the parameter server (PS). At each iteration, the devices perform local updates using their local data and the most recent global model received from the PS, and send these local updates to the PS over a wireless fading multiple access channel (MAC). The PS then updates the global model according to the signal received over the wireless MAC and shares it with the devices. Motivated by the additive nature of the wireless MAC, we propose an analog over-the-air aggregation scheme, in which the devices transmit their local updates in an uncoded fashion. Unlike the recent literature on over-the-air edge learning, here we assume that the devices have no channel state information (CSI), while the PS has only imperfect CSI. Instead, the PS is equipped with multiple antennas to alleviate the destructive effect of the channel, which is exacerbated by the lack of perfect CSI. We design a receive beamforming scheme at the PS and show that it can compensate for the lack of perfect CSI when the PS has a sufficient number of antennas. We also derive the convergence rate of the proposed algorithm, highlighting the impact of imperfect CSI as well as the number of PS antennas. Both the experimental results and the convergence analysis show that the performance of the proposed algorithm improves with the number of PS antennas; in particular, the wireless fading MAC becomes effectively deterministic, despite the lack of perfect CSI, when the PS has a sufficiently large number of antennas.
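A toy simulation of analog aggregation with receive beamforming illustrates the hardening effect described above. The channel and CSI models here are simplifying assumptions (the sketch builds the beamformer from the true channels, whereas the paper assumes only imperfect CSI at the PS), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def ota_aggregate(updates, n_antennas, noise_std=0.1):
    """Toy over-the-air aggregation: K devices transmit real-valued
    model updates uncoded over a fading MAC; the M-antenna PS combines
    the superposed signal with a sum-channel receive beamformer. Since
    E[f^H h_k] = M for each device k, the estimate approaches the true
    average update as M grows (channel hardening)."""
    X = np.asarray(updates, dtype=float)              # K x d local updates
    K, d = X.shape
    H = (rng.normal(size=(n_antennas, K))
         + 1j * rng.normal(size=(n_antennas, K))) / np.sqrt(2)
    N = noise_std * (rng.normal(size=(n_antennas, d))
                     + 1j * rng.normal(size=(n_antennas, d))) / np.sqrt(2)
    Y = H @ X + N                                     # superposition over the MAC
    f = H.sum(axis=1)                                 # sum-channel beamformer
    est = (f.conj() @ Y).real / n_antennas            # normalize the M-fold gain
    return est / K                                    # -> average update for large M
```

Increasing `n_antennas` in this sketch shrinks both the cross-device interference terms and the effective noise, mirroring the convergence analysis in which the fading MAC becomes effectively deterministic for a large antenna array.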