No Arabic abstract
Federated learning is the distributed machine learning framework that enables collaborative training across multiple parties while ensuring data privacy. Practical adaptation of XGBoost, the state-of-the-art tree boosting framework, to federated learning remains limited due to high cost incurred by conventional privacy-preserving methods. To address the problem, we propose two variants of federated XGBoost with privacy guarantee: FedXGBoost-SMM and FedXGBoost-LDP. Our first protocol FedXGBoost-SMM deploys enhanced secure matrix multiplication method to preserve privacy with lossless accuracy and lower overhead than encryption-based techniques. Developed independently, the second protocol FedXGBoost-LDP is heuristically designed with noise perturbation for local differential privacy, and empirically evaluated on real-world and synthetic datasets.
Many application scenarios call for training a machine learning model among multiple participants. Federated learning (FL) was proposed to enable joint training of a deep learning model using the local data in each party without revealing the data to others. Among various types of FL methods, vertical FL is a category to handle data sources with the same ID space and different feature spaces. However, existing vertical FL methods suffer from limitations such as restrictive neural network structure, slow training speed, and often lack the ability to take advantage of data with unmatched IDs. In this work, we propose an FL method called self-taught federated learning to address the aforementioned issues, which uses unsupervised feature extraction techniques for distributed supervised deep learning tasks. In this method, only latent variables are transmitted to other parties for model training, while privacy is preserved by storing the data and parameters of activations, weights, and biases locally. Extensive experiments are performed to evaluate and demonstrate the validity and efficiency of the proposed method.
Artificial neural network has achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to take control over their sensitive information during both training and using processes. To address this problem, we propose a privacy-preserving method for the distributed system, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels with regard to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process is applied to the algorithm based on the validation set, which serves as a model accelerator. In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method which reveals all the parameters of local models to the server when updating. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process. And further improvement could be achieved by tuning the pruning rate. Our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR.
Federated learning (FL) is an emerging paradigm for machine learning, in which data owners can collaboratively train a model by sharing gradients instead of their raw data. Two fundamental research problems in FL are incentive mechanism and privacy protection. The former focuses on how to incentivize data owners to participate in FL. The latter studies how to protect data owners privacy while maintaining high utility of trained models. However, incentive mechanism and privacy protection in FL have been studied separately and no work solves both problems at the same time. In this work, we address the two problems simultaneously by an FL-Market that incentivizes data owners participation by providing appropriate payments and privacy protection. FL-Market enables data owners to obtain compensation according to their privacy loss quantified by local differential privacy (LDP). Our insight is that, by meeting data owners personalized privacy preferences and providing appropriate payments, we can (1) incentivize privacy risk-tolerant data owners to set larger privacy parameters (i.e., gradients with less noise) and (2) provide preferred privacy protection for privacy risk-averse data owners. To achieve this, we design a personalized LDP-based FL framework with a deep learning-empowered auction mechanism for incentivizing trading gradients with less noise and optimal aggregation mechanisms for model updates. Our experiments verify the effectiveness of the proposed framework and mechanisms.
Recently, Graph Neural Network (GNN) has achieved remarkable progresses in various real-world tasks on graph data, consisting of node features and the adjacent information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for privacy-preserving node classification task under data vertically partitioned setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.
With the rising use of Machine Learning (ML) and Deep Learning (DL) in various industries, the medical industry is also not far behind. A very simple yet extremely important use case of ML in this industry is for image classification. This is important for doctors to help them detect certain diseases timely, thereby acting as an aid to reduce chances of human judgement error. However, when using automated systems like these, there is a privacy concern as well. Attackers should not be able to get access to the medical records and images of the patients. It is also required that the model be secure, and that the data that is sent to the model and the predictions that are received both should not be revealed to the model in clear text. In this study, we aim to solve these problems in the context of a medical image classification problem of detection of pneumonia by examining chest x-ray images.