ترغب بنشر مسار تعليمي؟ اضغط هنا

Scalable Privacy-Preserving Distributed Learning

42   0   0.0 ( 0 )
 نشر من قبل David Froelicher
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.



قيم البحث

اقرأ أيضاً

Recent attacks on federated learning demonstrate that keeping the training data on clients devices does not provide sufficient privacy, as the model parameters shared by clients can leak information about their training data. A secure aggregation pro tocol enables the server to aggregate clients models in a privacy-preserving manner. However, existing secure aggregation protocols incur high computation/communication costs, especially when the number of model parameters is larger than the number of clients participating in an iteration -- a typical scenario in federated learning. In this paper, we propose a secure aggregation protocol, FastSecAgg, that is efficient in terms of computation and communication, and robust to client dropouts. The main building block of FastSecAgg is a novel multi-secret sharing scheme, FastShare, based on the Fast Fourier Transform (FFT), which may be of independent interest. FastShare is information-theoretically secure, and achieves a trade-off between the number of secrets, privacy threshold, and dropout tolerance. Riding on the capabilities of FastShare, we prove that FastSecAgg is (i) secure against the server colluding with any subset of some constant fraction (e.g. $sim10%$) of the clients in the honest-but-curious setting; and (ii) tolerates dropouts of a random subset of some constant fraction (e.g. $sim10%$) of the clients. FastSecAgg achieves significantly smaller computation cost than existing schemes while achieving the same (orderwise) communication cost. In addition, it guarantees security against adaptive adversaries, which can perform client corruptions dynamically during the execution of the protocol.
In this paper, we study the problem of summation evaluation of secrets. The secrets are distributed over a network of nodes that form a ring graph. Privacy-preserving iterative protocols for computing the sum of the secrets are proposed, which are co mpatible with node join and leave situations. Theoretic bounds are derived regarding the utility and accuracy, and the proposed protocols are shown to comply with differential privacy requirements. Based on utility, accuracy and privacy, we also provide guidance on appropriate selections of random noise parameters. Additionally, a few numerical examples that demonstrate their effectiveness are provided.
As machine learning becomes a practice and commodity, numerous cloud-based services and frameworks are provided to help customers develop and deploy machine learning applications. While it is prevalent to outsource model training and serving tasks in the cloud, it is important to protect the privacy of sensitive samples in the training dataset and prevent information leakage to untrusted third parties. Past work have shown that a malicious machine learning service provider or end user can easily extract critical information about the training samples, from the model parameters or even just model outputs. In this paper, we propose a novel and generic methodology to preserve the privacy of training data in machine learning applications. Specifically we introduce an obfuscate function and apply it to the training data before feeding them to the model training task. This function adds random noise to existing samples, or augments the dataset with new samples. By doing so sensitive information about the properties of individual samples, or statistical properties of a group of samples, is hidden. Meanwhile the model trained from the obfuscated dataset can still achieve high accuracy. With this approach, the customers can safely disclose the data or models to third-party providers or end users without the need to worry about data privacy. Our experiments show that this approach can effective defeat four existing types of machine learning privacy attacks at negligible accuracy cost.
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
E-voting systems are a powerful technology for improving democracy. Unfortunately, prior voting systems have single points-of-failure, which may compromise availability, privacy, or integrity of the election results. We present the design, implemen tation, security analysis, and evaluation of the D-DEMOS suite of distributed, privacy-preserving, and end-to-end verifiable e-voting systems. We present two systems: one asynchronous and one with minimal timing assumptions but better performance. Our systems include a distributed vote collection subsystem that does not require cryptographic operations on behalf of the voter. We also include a distributed, replicated and fault-tolerant Bulletin Board component, that stores all necessary election-related information, and allows any party to read and verify the complete election process. Finally, we incorporate trustees, who control result production while guaranteeing privacy and end-to-end-verifiability as long as their strong majority is honest. Our suite of e-voting systems are the first whose voting operation is human verifiable, i.e., a voter can vote over the web, even when her web client stack is potentially unsafe, without sacrificing her privacy, and still be assured her vote was recorded as cast. Additionally, a voter can outsource election auditing to third parties, still without sacrificing privacy. We provide a model and security analysis of the systems, implement complete prototypes, measure their performance experimentally, and demonstrate their ability to handle large-scale elections. Finally, we demonstrate the performance trade-offs between the t
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا