Source Inference Attacks in Federated Learning

126 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Hongsheng Hu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Hongsheng Hu - Zoran Salcic - Lichao Sun

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Federated learning (FL) has emerged as a promising privacy-aware paradigm that allows multiple clients to jointly train a model without sharing their private data. Recently, many studies have shown that FL is vulnerable to membership inference attacks (MIAs) that can distinguish the training members of the given model from the non-members. However, existing MIAs ignore the source of a training member, i.e., the information of which client owns the training member, while it is essential to explore source privacy in FL beyond membership privacy of examples from all clients. The leakage of source information can lead to severe privacy issues. For example, identification of the hospital contributing to the training of an FL model for COVID-19 pandemic can render the owner of a data record from this hospital more prone to discrimination if the hospital is in a high risk region. In this paper, we propose a new inference attack called source inference attack (SIA), which can derive an optimal estimation of the source of a training member. Specifically, we innovatively adopt the Bayesian perspective to demonstrate that an honest-but-curious server can launch an SIA to steal non-trivial source information of the training members without violating the FL protocol. The server leverages the prediction loss of local models on the training members to achieve the attack effectively and non-intrusively. We conduct extensive experiments on one synthetic and five real datasets to evaluate the key factors in an SIA, and the results show the efficacy of the proposed source inference attack.

قيم البحث

183 - Yupeng Jiang , Yong Li , Yipeng Zhou 2020

In federated learning, machine learning and deep learning models are trained globally on distributed devices. The state-of-the-art privacy-preserving technique in the context of federated learning is user-level differential privacy. However, such a m echanism is vulnerable to some specific model poisoning attacks such as Sybil attacks. A malicious adversary could create multiple fake clients or collude compromised devices in Sybil attacks to mount direct model updates manipulation. Recent works on novel defense against model poisoning attacks are difficult to detect Sybil attacks when differential privacy is utilized, as it masks clients model updates with perturbation. In this work, we implement the first Sybil attacks on differential privacy based federated learning architectures and show their impacts on model convergence. We randomly compromise some clients by manipulating different noise levels reflected by the local privacy budget epsilon of differential privacy on the local model updates of these Sybil clients such that the global model convergence rates decrease or even leads to divergence. We apply our attacks to two recent aggregation defense mechanisms, called Krum and Trimmed Mean. Our evaluation results on the MNIST and CIFAR-10 datasets show that our attacks effectively slow down the convergence of the global models. We then propose a method to keep monitoring the average loss of all participants in each round for convergence anomaly detection and defend our Sybil attacks based on the prediction cost reported from each client. Our empirical study demonstrates that our defense approach effectively mitigates the impact of our Sybil attacks on model convergence.

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية

WAFFLE: Watermarking in Federated Learning

87 - Buse Gul Atli , Yuxi Xia , Samuel Marchal 2020

Federated learning is a distributed learning technique where machine learning models are trained on client devices in which the local training data resides. The training is coordinated via a central server which is, typically, controlled by the inten ded owner of the resulting model. By avoiding the need to transport the training data to the central server, federated learning improves privacy and efficiency. But it raises the risk of model theft by clients because the resulting model is available on every client device. Even if the application software used for local training may attempt to prevent direct access to the model, a malicious client may bypass any such restrictions by reverse engineering the application software. Watermarking is a well-known deterrence method against model theft by providing the means for model owners to demonstrate ownership of their models. Several recent deep neural network (DNN) watermarking techniques use backdooring: training the models with additional mislabeled data. Backdooring requires full access to the training data and control of the training process. This is feasible when a single party trains the model in a centralized manner, but not in a federated learning setting where the training process and training data are distributed among several client devices. In this paper, we present WAFFLE, the first approach to watermark DNN models trained using federated learning. It introduces a retraining step at the server after each aggregation of local models into the global model. We show that WAFFLE efficiently embeds a resilient watermark into models incurring only negligible degradation in test accuracy (-0.17%), and does not require access to training data. We also introduce a novel technique to generate the backdoor used as a watermark. It outperforms prior techniques, imposing no communication, and low computational (+3.2%) overhead.

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Byzantine-Resilient Secure Federated Learning

96 - Jinhyun So , Basak Guler , A. Salman Avestimehr 2020

Secure federated learning is a privacy-preserving framework to improve machine learning models by training over large volumes of data collected by mobile users. This is achieved through an iterative process where, at each iteration, users update a gl obal model using their local datasets. Each user then masks its local model via random keys, and the masked models are aggregated at a central server to compute the global model for the next iteration. As the local models are protected by random masks, the server cannot observe their true values. This presents a major challenge for the resilience of the model against adversarial (Byzantine) users, who can manipulate the global model by modifying their local models or datasets. Towards addressing this challenge, this paper presents the first single-server Byzantine-resilient secure aggregation framework (BREA) for secure federated learning. BREA is based on an integrated stochastic quantization, verifiable outlier detection, and secure model aggregation approach to guarantee Byzantine-resilience, privacy, and convergence simultaneously. We provide theoretical convergence and privacy guarantees and characterize the fundamental trade-offs in terms of the network size, user dropouts, and privacy protection. Our experiments demonstrate convergence in the presence of Byzantine users, and comparable accuracy to conventional federated learning benchmarks.

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection

382 - Shulai Zhang , Zirui Li , Quan Chen 2021

Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data. FL promises the privacy of clients and its security can be strengthened by cryptographic methods such a s additively homomorphic encryption (HE). However, the efficiency of FL could seriously suffer from the statistical heterogeneity in both the data distribution discrepancy among clients and the global distribution skewness. We mathematically demonstrate the cause of performance degradation in FL and examine the performance of FL over various datasets. To tackle the statistical heterogeneity problem, we propose a pluggable system-level client selection method named Dubhe, which allows clients to proactively participate in training, meanwhile preserving their privacy with the assistance of HE. Experimental results show that Dubhe is comparable with the optimal greedy method on the classification accuracy, with negligible encryption and communication overhead.

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

FLASHE: Additively Symmetric Homomorphic Encryption for Cross-Silo Federated Learning

168 - Zhifeng Jiang , Wei Wang , Yang Liu 2021

Homomorphic encryption (HE) is a promising privacy-preserving technique for cross-silo federated learning (FL), where organizations perform collaborative model training on decentralized data. Despite the strong privacy guarantee, general HE schemes r esult in significant computation and communication overhead. Prior works employ batch encryption to address this problem, but it is still suboptimal in mitigating communication overhead and is incompatible with sparsification techniques. In this paper, we propose FLASHE, an HE scheme tailored for cross-silo FL. To capture the minimum requirements of security and functionality, FLASHE drops the asymmetric-key design and only involves modular addition operations with random numbers. Depending on whether to accommodate sparsification techniques, FLASHE is optimized in computation efficiency with different approaches. We have implemented FLASHE as a pluggable module atop FATE, an industrial platform for cross-silo FL. Compared to plaintext training, FLASHE slightly increases the training time by $leq6%$, with no communication overhead.

التشفير والأمن النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي