ترغب بنشر مسار تعليمي؟ اضغط هنا

Privacy Preserving Vertical Federated Learning for Tree-based Models

126   0   0.0 ( 0 )
 نشر من قبل Yuncheng Wu
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies {it vertical} federated learning, which tackles the scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than those the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT) by treating single decision trees as building blocks. Theoretical and experimental analysis suggest that Pivot is efficient for the privacy achieved.

قيم البحث

اقرأ أيضاً

Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative priva cy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model. We use this to identify the strengths and limitations of these works and provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
Federated learning has emerged as a promising approach for collaborative and privacy-preserving learning. Participants in a federated learning process cooperatively train a model by exchanging model parameters instead of the actual training data, whi ch they might want to keep private. However, parameter interaction and the resulting model still might disclose information about the training data used. To address these privacy concerns, several approaches have been proposed based on differential privacy and secure multiparty computation (SMC), among others. They often result in large communication overhead and slow training time. In this paper, we propose HybridAlpha, an approach for privacy-preserving federated learning employing an SMC protocol based on functional encryption. This protocol is simple, efficient and resilient to participants dropping out. We evaluate our approach regarding the training time and data volume exchanged using a federated learning process to train a CNN on the MNIST data set. Evaluation against existing crypto-based SMC solutions shows that HybridAlpha can reduce the training time by 68% and data transfer volume by 92% on average while providing the same model performance and privacy guarantees as the existing solutions.
Mobile crowdsensing (MCS) is an emerging sensing data collection pattern with scalability, low deployment cost, and distributed characteristics. Traditional MCS systems suffer from privacy concerns and fair reward distribution. Moreover, existing pri vacy-preserving MCS solutions usually focus on the privacy protection of data collection rather than that of data processing. To tackle faced problems of MCS, in this paper, we integrate federated learning (FL) into MCS and propose a privacy-preserving MCS system, called textsc{CrowdFL}. Specifically, in order to protect privacy, participants locally process sensing data via federated learning and only upload encrypted training models. Particularly, a privacy-preserving federated averaging algorithm is proposed to average encrypted training models. To reduce computation and communication overhead of restraining dropped participants, discard and retransmission strategies are designed. Besides, a privacy-preserving posted pricing incentive mechanism is designed, which tries to break the dilemma of privacy protection and data evaluation. Theoretical analysis and experimental evaluation on a practical MCS application demonstrate the proposed textsc{CrowdFL} can effectively protect participants privacy and is feasible and efficient.
Federated learning enables mutually distrusting participants to collaboratively learn a distributed machine learning model without revealing anything but the models output. Generic federated learning has been studied extensively, and several learning protocols, as well as open-source frameworks, have been developed. Yet, their over pursuit of computing efficiency and fast implementation might diminish the security and privacy guarantees of participants training data, about which little is known thus far. In this paper, we consider an honest-but-curious adversary who participants in training a distributed ML model, does not deviate from the defined learning protocol, but attempts to infer private training data from the legitimately received information. In this setting, we design and implement two practical attacks, reverse sum attack and reverse multiplication attack, neither of which will affect the accuracy of the learned model. By empirically studying the privacy leakage of two learning protocols, we show that our attacks are (1) effective - the adversary successfully steal the private training data, even when the intermediate outputs are encrypted to protect data privacy; (2) evasive - the adversarys malicious behavior does not deviate from the protocol specification and deteriorate any accuracy of the target model; and (3) easy - the adversary needs little prior knowledge about the data distribution of the target participant. We also experimentally show that the leaked information is as effective as the raw training data through training an alternative classifier on the leaked information. We further discuss potential countermeasures and their challenges, which we hope may lead to several promising research directions.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا