Federated Learning with Bayesian Differential Privacy

353 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Aleksei Triastcyn

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Aleksei Triastcyn - Boi Faltings

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below 1 at the client level, and below 0.1 at the instance level. Lower amounts of noise also benefit the model accuracy and reduce the number of communication rounds.

قيم البحث

131 - Yipeng Zhou , Xuezheng Liu , Yao Fu 2021

Federated learning (FL) empowers distributed clients to collaboratively train a shared machine learning model through exchanging parameter information. Despite the fact that FL can protect clients raw data, malicious users can still crack original da ta with disclosed parameters. To amend this flaw, differential privacy (DP) is incorporated into FL clients to disturb original parameters, which however can significantly impair the accuracy of the trained model. In this work, we study a crucial question which has been vastly overlooked by existing works: what are the optimal numbers of queries and replies in FL with DP so that the final model accuracy is maximized. In FL, the parameter server (PS) needs to query participating clients for multiple global iterations to complete training. Each client responds a query from the PS by conducting a local iteration. Our work investigates how many times the PS should query clients and how many times each client should reply the PS. We investigate two most extensively used DP mechanisms (i.e., the Laplace mechanism and Gaussian mechanisms). Through conducting convergence rate analysis, we can determine the optimal numbers of queries and replies in FL with DP so that the final model accuracy can be maximized. Finally, extensive experiments are conducted with publicly available datasets: MNIST and FEMNIST, to verify our analysis and the results demonstrate that properly setting the numbers of queries and replies can significantly improve the final model accuracy in FL with DP.

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

Bayesian Differential Privacy for Machine Learning

138 - Aleksei Triastcyn , Boi Faltings 2019

Traditional differential privacy is independent of the data distribution. However, this is not well-matched with the modern machine learning context, where models are trained on specific data. As a result, achieving meaningful privacy guarantees in M L often excessively reduces accuracy. We propose Bayesian differential privacy (BDP), which takes into account the data distribution to provide more practical privacy guarantees. We also derive a general privacy accounting method under BDP, building upon the well-known moments accountant. Our experiments demonstrate that in-distribution samples in classic machine learning datasets, such as MNIST and CIFAR-10, enjoy significantly stronger privacy guarantees than postulated by DP, while models maintain high classification accuracy.

التعلم الآلي التشفير والأمن التعلم الالي

Stochastic Channel-Based Federated Learning for Medical Data Privacy Preserving

185 - Rulin Shao , Hongyu He , Hui Liu 2019

Artificial neural network has achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to ta ke control over their sensitive information during both training and using processes. To address this problem, we propose a privacy-preserving method for the distributed system, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels with regard to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process is applied to the algorithm based on the validation set, which serves as a model accelerator. In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method which reveals all the parameters of local models to the server when updating. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process. And further improvement could be achieved by tuning the pruning rate. Our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR.

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning

101 - Jinhyun So , Ramy E. Ali , Basak Guler 2021

Secure aggregation is a critical component in federated learning, which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privac y of individual users in a single training round. We contend that such designs can lead to significant privacy leakages over multiple training rounds, due to partial user selection/participation at each round of federated learning. In fact, we empirically show that the conventional random user selection strategies for federated learning lead to leaking users individual models within number of rounds linear in the number of users. To address this challenge, we introduce a secure aggregation framework with multi-round privacy guarantees. In particular, we introduce a new metric to quantify the privacy guarantees of federated learning over multiple training rounds, and develop a structured user selection strategy that guarantees the long-term privacy of each user (over any number of training rounds). Our framework also carefully accounts for the fairness and the average number of participating users at each round. We perform several experiments on MNIST and CIFAR-10 datasets in the IID and the non-IID settings to demonstrate the performance improvement over the baseline algorithms, both in terms of privacy protection and test accuracy.

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

Deep Learning with Gaussian Differential Privacy

110 - Zhiqi Bu , Jinshuo Dong , Qi Long 2019

Deep learning models are often trained on datasets that contain sensitive information such as individuals shopping transactions, personal contacts, and medical records. An increasingly important line of work therefore has sought to train neural netwo rks subject to privacy constraints that are specified by differential privacy or its divergence-based relaxations. These privacy definitions, however, have weaknesses in handling certain important primitives (composition and subsampling), thereby giving loose or complicated privacy analyses of training neural networks. In this paper, we consider a recently proposed privacy definition termed textit{$f$-differential privacy} [18] for a refined privacy analysis of training neural networks. Leveraging the appealing properties of $f$-differential privacy in handling composition and subsampling, this paper derives analytically tractable expressions for the privacy guarantees of both stochastic gradient descent and Adam used in training deep neural networks, without the need of developing sophisticated techniques as [3] did. Our results demonstrate that the $f$-differential privacy framework allows for a new privacy analysis that improves on the prior analysis~[3], which in turn suggests tuning certain parameters of neural networks for a better prediction accuracy without violating the privacy budget. These theoretically derived improvements are confirmed by our experiments in a range of tasks in image classification, text classification, and recommender systems. Python code to calculate the privacy cost for these experiments is publicly available in the texttt{TensorFlow Privacy} library.

التعلم الآلي التشفير والأمن التعلم الالي