Federated learning (FL) is an emerging machine learning paradigm in which data owners collaboratively train a model by sharing gradients instead of their raw data. Two fundamental research problems in FL are incentive mechanisms and privacy protection. The former concerns how to incentivize data owners to participate in FL; the latter studies how to protect data owners' privacy while maintaining high utility of the trained models. However, these two problems have so far been studied separately, and no existing work solves both at the same time. In this work, we address them simultaneously with FL-Market, a market that incentivizes data owners' participation by providing both appropriate payments and privacy protection. FL-Market enables data owners to obtain compensation according to their privacy loss, quantified by local differential privacy (LDP). Our insight is that, by meeting data owners' personalized privacy preferences and providing appropriate payments, we can (1) incentivize privacy risk-tolerant data owners to set larger privacy parameters (i.e., to contribute gradients with less noise) and (2) provide the preferred level of privacy protection for privacy risk-averse data owners. To achieve this, we design a personalized LDP-based FL framework with a deep learning-empowered auction mechanism that incentivizes trading gradients with less noise, together with optimal aggregation mechanisms for model updates. Our experiments verify the effectiveness of the proposed framework and mechanisms.
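A minimal sketch of how personalized LDP on gradients could work, assuming each data owner clips its gradient to L1 norm `c` and applies the Laplace mechanism with its own privacy parameter ε, so a risk-tolerant owner (larger ε) contributes a less noisy gradient. The inverse-variance weighting in `aggregate` and the flat per-ε price in `linear_payment` are hypothetical stand-ins, not FL-Market's optimal aggregation or its deep learning-empowered auction; the weighting also assumes that all owners' clipped gradients estimate the same quantity.

```python
import numpy as np

def clip_l1(grad: np.ndarray, c: float) -> np.ndarray:
    """Rescale grad so its L1 norm is at most c (bounds the sensitivity)."""
    norm = np.abs(grad).sum()
    return grad if norm <= c else grad * (c / norm)

def perturb(grad: np.ndarray, epsilon: float, c: float,
            rng: np.random.Generator) -> np.ndarray:
    """Laplace mechanism run locally by one data owner.

    After clipping, any two possible gradients differ by at most 2c in L1
    norm, so Laplace noise of scale 2c / epsilon gives epsilon-LDP.
    A larger epsilon (more risk-tolerant owner) means less noise.
    """
    g = clip_l1(grad, c)
    return g + rng.laplace(0.0, 2.0 * c / epsilon, size=g.shape)

def aggregate(noisy_grads: list, epsilons: list) -> np.ndarray:
    """Hypothetical inverse-variance weighting of the noisy gradients.

    Laplace noise of scale b = 2c / epsilon has variance 2 * b**2, so each
    owner's weight is proportional to epsilon**2 (c cancels out).
    """
    eps = np.asarray(epsilons, dtype=float)
    weights = eps ** 2 / np.sum(eps ** 2)
    return np.tensordot(weights, np.stack(noisy_grads), axes=1)

def linear_payment(epsilon: float, unit_price: float) -> float:
    """Hypothetical pricing: compensation grows linearly with privacy loss."""
    return unit_price * epsilon

rng = np.random.default_rng(0)
grads = [np.array([0.3, -0.2]), np.array([0.1, 0.4])]
epsilons = [1.0, 4.0]                      # owner 2 tolerates more privacy risk
noisy = [perturb(g, e, 1.0, rng) for g, e in zip(grads, epsilons)]
update = aggregate(noisy, epsilons)
print(update, [linear_payment(e, 0.5) for e in epsilons])
```

In this sketch the owner with ε = 4 receives four times the payment of the owner with ε = 1 and sixteen times the aggregation weight, which is the trade-off the abstract describes: less noise in exchange for more compensation.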
Federated learning is a distributed machine learning framework that enables collaborative training across multiple parties while preserving data privacy. Practical adaptation of XGBoost, a state-of-the-art tree boosting framework, to federated learning remains limited because of the high cost incurred by conventional privacy-preserving methods. To address this problem, we propose two variants of federated XGBoost with privacy guarantees: FedXGBoost-SMM and FedXGBoost-LDP. The first protocol, FedXGBoost-SMM, uses an enhanced secure matrix multiplication method to preserve privacy with lossless accuracy and lower overhead than encryption-based techniques. The second protocol, FedXGBoost-LDP, developed independently, is heuristically designed with noise perturbation for local differential privacy and is empirically evaluated on real-world and synthetic datasets.
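The split search that any federated XGBoost variant has to reproduce can be sketched as below. `split_gain` is the standard XGBoost gain formula with L2 regularization λ (`lam`) and split penalty γ (`gamma`). `ldp_perturb` is a hypothetical illustration of local perturbation (clip each gradient to [-bound, bound], then add Laplace noise of scale 2·bound/ε); it is not the specific heuristic used by FedXGBoost-LDP, and nothing here implements the secure matrix multiplication of FedXGBoost-SMM.

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Standard XGBoost gain of splitting a node into left and right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exhaustively scan thresholds on one feature; left child is x <= thr."""
    G, H = g.sum(), h.sum()
    best_thr, best_gain = None, -np.inf
    for thr in np.unique(x)[:-1]:   # the largest value would leave the right child empty
        left = x <= thr
        G_L, H_L = g[left].sum(), h[left].sum()
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain

def ldp_perturb(g, epsilon, bound, rng):
    """Hypothetical local perturbation: clip each gradient to [-bound, bound]
    (sensitivity 2 * bound) and add Laplace noise of scale 2 * bound / epsilon."""
    clipped = np.clip(g, -bound, bound)
    return clipped + rng.laplace(0.0, 2.0 * bound / epsilon, size=clipped.shape)

# Squared loss with all-zero predictions: g = pred - y = -y, h = 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
g, h = -y, np.ones_like(y)
print(best_split(x, g, h))
noisy_g = ldp_perturb(g, epsilon=2.0, bound=1.0, rng=np.random.default_rng(0))
print(best_split(x, noisy_g, h))
```

On the noise-free toy data the best threshold is 2 with gain 4/15; the second call shows how perturbed gradients can shift the chosen split, which is the accuracy cost that an LDP-based design trades against the overhead of secure computation.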
In crowdsourcing markets, two types of jobs, homogeneous jobs and heterogeneous jobs, need to be allocated to workers. Incentive mechanisms are essential for attracting extensive user participation and achieving good service quality, especially under a given budget constraint. To this end, Singer et al. recently proposed a novel class of auction mechanisms for determining near-optimal task prices in budget-constrained crowdsourcing markets. Their mechanisms are very useful for motivating many users to participate truthfully in crowdsourcing markets. Despite their importance, these mechanisms still face many security and privacy challenges in real-life environments. In this paper, we present a general privacy-preserving verifiable incentive mechanism for budget-constrained crowdsourcing markets. It not only protects the privacy of bids and assignments, the privacy of the chosen winners in markets with homogeneous and heterogeneous jobs, and the identity privacy of users, but also makes payments between the platform and users verifiable. Results show that our general privacy-preserving verifiable incentive mechanisms achieve the same results as their generic counterparts without privacy preservation.
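To make "near-optimal prices under a budget" concrete, here is a simple threshold-pricing rule in the spirit of budget-feasible mechanisms, assuming each worker bids a private cost for one homogeneous job and the requester values every completed job equally. It is an illustrative sketch, not necessarily the exact mechanism of Singer et al., and it has none of the privacy or verifiability protections the paper adds; `threshold_mechanism` is a hypothetical name.

```python
def threshold_mechanism(bids, budget):
    """Select winners and a common per-job price under a budget.

    bids[i] is worker i's claimed cost for doing one job. Winners are the k
    cheapest workers, where k is the largest number such that the k-th
    cheapest bid is at most budget / k. Each winner is paid the same
    threshold price min(budget / k, next-cheapest losing bid), so the total
    payment never exceeds the budget and no winner is paid below its bid.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i])
    k = 0
    for j, i in enumerate(order, start=1):
        # The j-th cheapest bid grows with j while budget / j shrinks,
        # so the condition fails for every j after the first failure.
        if bids[i] <= budget / j:
            k = j
        else:
            break
    if k == 0:
        return [], 0.0
    price = budget / k
    if k < len(order):
        price = min(price, bids[order[k]])
    return order[:k], price

winners, price = threshold_mechanism([3.0, 1.0, 10.0, 2.0], budget=9.0)
print(winners, price)
```

For bids [3, 1, 10, 2] and a budget of 9, the three cheapest workers win and each is paid 3, which exhausts the budget exactly. In a privacy-preserving version, the bids, the sorted order, and the winner set are exactly the values that would need protecting.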
Recently, a novel class of incentive mechanisms has been proposed to attract extensive users to participate truthfully in crowd sensing applications under a given budget constraint. This class of mechanisms also provides good service quality for requesters in crowd sensing applications. Despite its importance, many verification and privacy challenges remain, including the privacy of users' bids, subtask information, and identities, the privacy of the platform's winner set, and the security of the payment outcomes. In this paper, we present a privacy-preserving verifiable incentive mechanism for budget-constrained crowd sensing applications that not only protects the privacy of both users and the platform but also makes the payments between the platform and users verifiably correct. Results indicate that our privacy-preserving verifiable incentive mechanism achieves the same results as the generic one without privacy preservation.
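One generic building block behind verifiable payments is a commitment: the platform publishes a salted SHA-256 digest of the payment outcome before revealing it, and each user later checks that the opened outcome matches. The sketch below is an assumption for illustration, not the paper's protocol. It only shows that the platform cannot change the payments after committing; verifying that the payments were computed correctly, as the mechanism above aims to do, requires more than this. `commit` and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets

def commit(payments: dict) -> tuple:
    """Platform side: bind itself to a payment outcome before revealing it."""
    nonce = secrets.token_bytes(32)          # random salt hides the payments
    message = json.dumps(payments, sort_keys=True).encode()
    return hashlib.sha256(nonce + message).hexdigest(), nonce

def verify(digest: str, nonce: bytes, payments: dict) -> bool:
    """User side: check that the opened payments match the earlier commitment."""
    message = json.dumps(payments, sort_keys=True).encode()
    expected = hashlib.sha256(nonce + message).hexdigest()
    return hmac.compare_digest(expected, digest)

digest, nonce = commit({"user1": 3.0, "user2": 2.5})          # digest is published first
print(verify(digest, nonce, {"user2": 2.5, "user1": 3.0}))    # True
print(verify(digest, nonce, {"user1": 3.0, "user2": 2.6}))    # False
```

Serializing with `sort_keys=True` makes the digest independent of dictionary order, and `hmac.compare_digest` avoids leaking information through comparison timing.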
Many application scenarios call for training a machine learning model among multiple participants. Federated learning (FL) was proposed to enable joint training of a deep learning model using each party's local data without revealing that data to others. Among the various types of FL methods, vertical FL handles data sources that share the same ID space but have different feature spaces. However, existing vertical FL methods suffer from limitations such as restrictive neural network structures and slow training, and they often cannot take advantage of data with unmatched IDs. In this work, we propose an FL method called self-taught federated learning to address these issues; it uses unsupervised feature extraction techniques for distributed supervised deep learning tasks. In this method, only latent variables are transmitted to other parties for model training, while privacy is preserved by keeping the data, activations, weights, and biases local. Extensive experiments evaluate and demonstrate the validity and efficiency of the proposed method.
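The data flow of "only latent variables are transmitted" can be sketched with NumPy, assuming two parties that each hold a vertical slice of the features and a label holder that trains a small supervised head. PCA (via SVD) stands in here for the paper's unsupervised deep feature extraction, and the logistic-regression head, `Party`, and `train_logreg` are hypothetical; only the separation of what stays local from what is sent reflects the method described above.

```python
import numpy as np

class Party:
    """Hypothetical party holding one vertical slice of the features."""

    def __init__(self, X_local: np.ndarray, n_latent: int):
        # Fit an unsupervised extractor (PCA via SVD) on local rows only;
        # these rows may include IDs that the other parties do not have.
        self.mean = X_local.mean(axis=0)
        _, _, vt = np.linalg.svd(X_local - self.mean, full_matrices=False)
        self.components = vt[:n_latent]      # never leaves the party

    def latent(self, X: np.ndarray) -> np.ndarray:
        """Latent variables: the only thing sent to the label holder."""
        return (X - self.mean) @ self.components.T

def train_logreg(Z, y, lr=0.1, steps=500):
    """Label holder trains a logistic-regression head on concatenated latents."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(300, 5)), rng.normal(size=(200, 4))
# Only the first 200 rows of party A have matched IDs (and labels).
y = (X_a[:200, 0] + X_b[:, 0] > 0).astype(float)
party_a, party_b = Party(X_a, 3), Party(X_b, 2)
Z = np.hstack([party_a.latent(X_a[:200]), party_b.latent(X_b)])
w, b = train_logreg(Z, y)
print(Z.shape, w.shape)
```

Party A fits its extractor on all 300 of its rows, including the 100 with unmatched IDs, which is one way an unsupervised stage lets unmatched data contribute.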
Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often hindered by privacy concerns, and people want to retain control over their sensitive information during both training and use. To address this problem, we propose a privacy-preserving method for distributed systems, Stochastic Channel-Based Federated Learning (SCBF), which enables participants to cooperatively train a high-performance model without sharing their inputs. Specifically, we design, implement, and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels corresponding to the most active features in each training loop and uploads them as the information learned from local datasets. A pruning process based on the validation set is applied to the algorithm and serves as a model accelerator. In our experiments, our model achieves better performance and saturates faster than the Federated Averaging method, which reveals all parameters of the local models to the server when updating. We also demonstrate that the pruning process can speed up the saturation of performance, and that further improvement can be achieved by tuning the pruning rate. Our experiments show that pruning saves 57% of the time at the cost of only a 0.0047 reduction in AUCROC and a 0.0068 reduction in AUCPR.
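A channel-based update of this kind can be sketched as follows, assuming each client uploads only a fraction of the output channels of a layer's update and the server averages each channel over the clients that sent it. The activity criterion (the L2 norm of each channel's update), the averaging rule, and the names `select_channels` and `server_update` are hypothetical choices for illustration, not SCBF's exact algorithm, and the validation-based pruning step is not shown.

```python
import numpy as np

def select_channels(delta: np.ndarray, ratio: float):
    """Client side: keep only the most active output channels of an update.

    delta has shape (out_channels, ...). "Activity" is approximated here by
    the L2 norm of each channel's update (a hypothetical criterion).
    """
    norms = np.linalg.norm(delta.reshape(delta.shape[0], -1), axis=1)
    k = max(1, int(round(ratio * delta.shape[0])))
    idx = np.sort(np.argsort(norms)[::-1][:k])
    return idx, delta[idx]

def server_update(weight: np.ndarray, uploads) -> np.ndarray:
    """Server side: average each channel over the clients that uploaded it;
    channels that no client uploaded keep their current value."""
    total = np.zeros_like(weight)
    count = np.zeros(weight.shape[0])
    for idx, values in uploads:
        total[idx] += values
        count[idx] += 1
    mask = count > 0
    new = weight.copy()
    shape = (-1,) + (1,) * (weight.ndim - 1)
    new[mask] += total[mask] / count[mask].reshape(shape)
    return new

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 3, 3, 3))           # a conv layer's weights
client_deltas = [rng.normal(size=weight.shape) for _ in range(3)]
uploads = [select_channels(d, ratio=0.25) for d in client_deltas]
weight = server_update(weight, uploads)
print([idx.tolist() for idx, _ in uploads])
```

Compared with Federated Averaging, which sends every parameter, each client here reveals only a quarter of this layer's channels per round.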