Dopamine: Differentially Private Federated Learning on Medical Data

319 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mohammad Malekzadeh

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Mohammad Malekzadeh - Burak Hasircioglu - Nitish Mital

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

While rich medical datasets are hosted in hospitals distributed across the world, concerns on patients privacy is a barrier against using such data to train deep neural networks (DNNs) for medical diagnostics. We propose Dopamine, a system to train DNNs on distributed datasets, which employs federated learning (FL) with differentially-private stochastic gradient descent (DPSGD), and, in combination with secure aggregation, can establish a better trade-off between differential privacy (DP) guarantee and DNNs accuracy than other approaches. Results on a diabetic retinopathy~(DR) task show that Dopamine provides a DP guarantee close to the centralized training counterpart, while achieving a better classification accuracy than FL with parallel DP where DPSGD is applied without coordination. Code is available at https://github.com/ipc-lab/private-ml-for-health.

قيم البحث

185 - Rulin Shao , Hongyu He , Hui Liu 2019

Artificial neural network has achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to ta ke control over their sensitive information during both training and using processes. To address this problem, we propose a privacy-preserving method for the distributed system, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels with regard to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process is applied to the algorithm based on the validation set, which serves as a model accelerator. In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method which reveals all the parameters of local models to the server when updating. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process. And further improvement could be achieved by tuning the pruning rate. Our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR.

التعلم الآلي التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

Hybrid Differentially Private Federated Learning on Vertically Partitioned Data

94 - Chang Wang , Jian Liang , Mingkai Huang 2020

We present HDP-VFL, the first hybrid differentially private (DP) framework for vertical federated learning (VFL) to demonstrate that it is possible to jointly learn a generalized linear model (GLM) from vertically partitioned data with only a negligi ble cost, w.r.t. training time, accuracy, etc., comparing to idealized non-private VFL. Our work builds on the recent advances in VFL-based collaborative training among different organizations which rely on protocols like Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC) to secure computation and training. In particular, we analyze how VFLs intermediate result (IR) can leak private information of the training data during communication and design a DP-based privacy-preserving algorithm to ensure the data confidentiality of VFL participants. We mathematically prove that our algorithm not only provides utility guarantees for VFL, but also offers multi-level privacy, i.e. DP w.r.t. IR and joint differential privacy (JDP) w.r.t. model weights. Experimental results demonstrate that our work, under adequate privacy budgets, is quantitatively and qualitatively similar to GLMs, learned in idealized non-private VFL setting, rather than the increased cost in memory and processing time in most prior works based on HE or MPC. Our codes will be released if this paper is accepted.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

FLAME: Differentially Private Federated Learning in the Shuffle Model

137 - Ruixuan Liu , Yang Cao , Hong Chen 2020

Federated Learning (FL) is a promising machine learning paradigm that enables the analyzer to train a model without collecting users raw data. To ensure users privacy, differentially private federated learning has been intensively studied. The existi ng works are mainly based on the textit{curator model} or textit{local model} of differential privacy. However, both of them have pros and cons. The curator model allows greater accuracy but requires a trusted analyzer. In the local model where users randomize local data before sending them to the analyzer, a trusted analyzer is not required but the accuracy is limited. In this work, by leveraging the textit{privacy amplification} effect in the recently proposed shuffle model of differential privacy, we achieve the best of two worlds, i.e., accuracy in the curator model and strong privacy without relying on any trusted party. We first propose an FL framework in the shuffle model and a simple protocol (SS-Simple) extended from existing work. We find that SS-Simple only provides an insufficient privacy amplification effect in FL since the dimension of the model parameter is quite large. To solve this challenge, we propose an enhanced protocol (SS-Double) to increase the privacy amplification effect by subsampling. Furthermore, for boosting the utility when the model size is greater than the user population, we propose an advanced protocol (SS-Topk) with gradient sparsification techniques. We also provide theoretical analysis and numerical evaluations of the privacy amplification of the proposed protocols. Experiments on real-world dataset validate that SS-Topk improves the testing accuracy by 60.7% than the local model based FL.

التعلم الآلي التشفير والأمن التعلم الالي

Differentially-private Federated Neural Architecture Search

148 - Ishika Singh , Haoyi Zhou , Kunlin Yang 2020

Neural architecture search, which aims to automatically search for architectures (e.g., convolution, max pooling) of neural networks that maximize validation performance, has achieved remarkable progress recently. In many application scenarios, sever al parties would like to collaboratively search for a shared neural architecture by leveraging data from all parties. However, due to privacy concerns, no party wants its data to be seen by other parties. To address this problem, we propose federated neural architecture search (FNAS), where different parties collectively search for a differentiable architecture by exchanging gradients of architecture variables without exposing their data to other parties. To further preserve privacy, we study differentially-private FNAS (DP-FNAS), which adds random noise to the gradients of architecture variables. We provide theoretical guarantees of DP-FNAS in achieving differential privacy. Experiments show that DP-FNAS can search highly-performant neural architectures while protecting the privacy of individual parties. The code is available at https://github.com/UCSD-AI4H/DP-FNAS

التعلم الآلي التشفير والأمن التعلم الالي

Locally Differentially Private Federated Learning: Efficient Algorithms with Tight Risk Bounds

391 - Andrew Lowy , Meisam Razaviyayn 2021

Federated learning (FL) is a distributed learning paradigm in which many clients with heterogeneous, unbalanced, and often sensitive local data, collaborate to learn a model. Local Differential Privacy (LDP) provides a strong guarantee that each clie nts data cannot be leaked during and after training, without relying on a trusted third party. While LDP is often believed to be too stringent to allow for satisfactory utility, our paper challenges this belief. We consider a general setup with unbalanced, heterogeneous data, disparate privacy needs across clients, and unreliable communication, where a random number/subset of clients is available each round. We propose three LDP algorithms for smooth (strongly) convex FL; each are noisy variations of distributed minibatch SGD. One is accelerated and one involves novel time-varying noise, which we use to obtain the first non-trivial LDP excess risk bound for the fully general non-i.i.d. FL problem. Specializing to i.i.d. clients, our risk bounds interpolate between the best known and/or optimal bounds in the centralized setting and the cross-device setting, where each client represents just one persons data. Furthermore, we show that in certain regimes, our convergence rate (nearly) matches the corresponding non-private lower bound or outperforms state of the art non-private algorithms (``privacy for free). Finally, we validate our theoretical results and illustrate the practical utility of our algorithm with numerical experiments.

التعلم الآلي التشفير والأمن التحسين والتحكم