No Arabic abstract
This study develops a federated learning (FL) framework overcoming largely incremental communication costs due to model sizes in typical frameworks without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of model parameter exchange employed by the typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up according to the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additionally labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices dataset leads to ambiguous of each data sample and lowing of the training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.
Federated edge learning (FEEL) has emerged as an effective alternative to reduce the large communication latency in Cloud-based machine learning solutions, while preserving data privacy. Unfortunately, the learning performance of FEEL may be compromised due to limited training data in a single edge cluster. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL). By allowing model aggregation between different edge clusters, SD-FEEL enjoys the benefit of FEEL in reducing training latency and improves the learning performance by accessing richer training data from multiple edge clusters. A training algorithm for SD-FEEL with three main procedures in each round is presented, including local model updates, intra-cluster and inter-cluster model aggregations, and it is proved to converge on non-independent and identically distributed (non-IID) data. We also characterize the interplay between the network topology of the edge servers and the communication overhead of inter-cluster model aggregation on training performance. Experiment results corroborate our analysis and demonstrate the effectiveness of SD-FFEL in achieving fast convergence. Besides, guidelines on choosing critical hyper-parameters of the training algorithm are also provided.
Federated Learning allows training machine learning models by using the computation and private data resources of a large number of distributed clients such as smartphones and IoT devices. Most existing works on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data, e.g., due to lack of expertise. In this work, we consider a server that hosts a labeled dataset, and wishes to leverage clients with unlabeled data for supervised learning. We propose a new Federated Learning framework referred to as SemiFL in order to address the problem of Semi-Supervised Federated Learning (SSFL). In SemiFL, clients have completely unlabeled data, while the server has a small amount of labeled data. SemiFL is communication efficient since it separates the training of server-side supervised data and client-side unsupervised data. We demonstrate various efficient strategies of SemiFL that enhance learning performance. Extensive empirical evaluations demonstrate that our communication efficient method can significantly improve the performance of a labeled server with unlabeled clients. Moreover, we demonstrate that SemiFL can outperform many existing FL results trained with fully supervised data, and perform competitively with the state-of-the-art centralized Semi-Supervised Learning (SSL) methods. For instance, in standard communication efficient scenarios, our method can perform 93% accuracy on the CIFAR10 dataset with only 4000 labeled samples at the server. Such accuracy is only 2% away from the result trained from 50000 fully labeled data, and it improves about 30% upon existing SSFL methods in the communication efficient setting.
We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by the dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code available at: https://github.com/ZHKKKe/PixelSSL.
Federated learning enables multiple clients to collaboratively learn a global model by periodically aggregating the clients models without transferring the local data. However, due to the heterogeneity of the system and data, many approaches suffer from the client-drift issue that could significantly slow down the convergence of the global model training. As clients perform local updates on heterogeneous data through heterogeneous systems, their local models drift apart. To tackle this issue, one intuitive idea is to guide the local model training by the global teachers, i.e., past global models, where each client learns the global knowledge from past global models via adaptive knowledge distillation techniques. Coming from these insights, we propose a novel approach for heterogeneous federated learning, namely FedGKD, which fuses the knowledge from historical global models for local training to alleviate the client-drift issue. In this paper, we evaluate FedGKD with extensive experiments on various CV/NLP datasets (i.e., CIFAR-10/100, Tiny-ImageNet, AG News, SST5) and different heterogeneous settings. The proposed method is guaranteed to converge under common assumptions, and achieves superior empirical accuracy in fewer communication runs than five state-of-the-art methods.
Federated learning (FL) has attracted tremendous attentions in recent years due to its privacy preserving measures and great potentials in some distributed but privacy-sensitive applications like finance and health. However, high communication overloads for transmitting high-dimensional networks and extra security masks remains a bottleneck of FL. This paper proposes a communication-efficient FL framework with Adaptive Quantized Gradient (AQG) which adaptively adjusts the quantization level based on local gradients update to fully utilize the heterogeneousness of local data distribution for reducing unnecessary transmissions. Besides, the client dropout issues are taken into account and the Augmented AQG is developed, which could limit the dropout noise with an appropriate amplification mechanism for transmitted gradients. Theoretical analysis and experiment results show that the proposed AQG leads to 25%-50% of additional transmission reduction as compared to existing popular methods including Quantized Gradient Descent (QGD) and Lazily Aggregated Quantized (LAQ) gradient-based method without deteriorating convergence properties. Particularly, experiments with heterogenous data distributions corroborate a more significant transmission reduction compared with independent identical data distributions. Meanwhile, the proposed AQG is robust to a client dropping rate up to 90% empirically, and the Augmented AQG manages to further improve the FL systems communication efficiency with the presence of moderate-scale client dropouts commonly seen in practical FL scenarios.