No Arabic abstract
Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data. Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adversaries both within and outside of the system to compromise data privacy. However, most current works conduct attacks by leveraging gradients on a small batch of data, which is less practical in FL. In this work, we consider a more practical and interesting scenario in which participants share their epoch-averaged gradients (share gradients after at least 1 epoch of local training) rather than per-example or small batch-averaged gradients as in previous works. We perform the first systematic evaluation of attribute reconstruction attack (ARA) launched by the malicious server in the FL system, and empirically demonstrate that the shared epoch-averaged local model gradients can reveal sensitive attributes of local training data of any victim participant. To achieve this goal, we develop a more effective and efficient gradient matching based method called cos-matching to reconstruct the training data attributes. We evaluate our attacks on a variety of real-world datasets, scenarios, assumptions. Our experiments show that our proposed method achieves better attribute attack performance than most existing baselines.
Recently researchers have studied input leakage problems in Federated Learning (FL) where a malicious party can reconstruct sensitive training inputs provided by users from shared gradient. It raises concerns about FL since input leakage contradicts the privacy-preserving intention of using FL. Despite a relatively rich literature on attacks and defenses of input reconstruction in Horizontal FL, input leakage and protection in vertical FL starts to draw researchers attention recently. In this paper, we study how to defend against input leakage attacks in Vertical FL. We design an adversarial training-based framework that contains three modules: adversarial reconstruction, noise regularization, and distance correlation minimization. Those modules can not only be employed individually but also applied together since they are independent to each other. Through extensive experiments on a large-scale industrial online advertising dataset, we show our framework is effective in protecting input privacy while retaining the model utility.
Technology advances in areas such as sensors, IoT, and robotics, enable new collaborative applications (e.g., autonomous devices). A primary requirement for such collaborations is to have a secure system which enables information sharing and information flow protection. Policy-based management system is a key mechanism for secure selective sharing of protected resources. However, policies in each party of such a collaborative environment cannot be static as they have to adapt to different contexts and situations. One advantage of collaborative applications is that each party in the collaboration can take advantage of knowledge of the other parties for learning or enhancing its own policies. We refer to this learning mechanism as policy transfer. The design of a policy transfer framework has challenges, including policy conflicts and privacy issues. Policy conflicts typically arise because of differences in the obligations of the parties, whereas privacy issues result because of data sharing constraints for sensitive data. Hence, the policy transfer framework should be able to tackle such challenges by considering minimal sharing of data and support policy adaptation to address conflict. In the paper we propose a framework that aims at addressing such challenges. We introduce a formal definition of the policy transfer problem for attribute-based policies. We then introduce the transfer methodology that consists of three sequential steps. Finally we report experimental results.
A learning federation is composed of multiple participants who use the federated learning technique to collaboratively train a machine learning model without directly revealing the local data. Nevertheless, the existing federated learning frameworks have a serious defect that even a participant is revoked, its data are still remembered by the trained model. In a company-level cooperation, allowing the remaining companies to use a trained model that contains the memories from a revoked company is obviously unacceptable, because it can lead to a big conflict of interest. Therefore, we emphatically discuss the participant revocation problem of federated learning and design a revocable federated random forest (RF) framework, RevFRF, to further illustrate the concept of revocable federated learning. In RevFRF, we first define the security problems to be resolved by a revocable federated RF. Then, a suite of homomorphic encryption based secure protocols are designed for federated RF construction, prediction and revocation. Through theoretical analysis and experiments, we show that the protocols can securely and efficiently implement collaborative training of an RF and ensure that the memories of a revoked participant in the trained RF are securely removed.
Federated learning has become prevalent in medical diagnosis due to its effectiveness in training a federated model among multiple health institutions (i.e. Data Islands (DIs)). However, increasingly massive DI-level poisoning attacks have shed light on a vulnerability in federated learning, which inject poisoned data into certain DIs to corrupt the availability of the federated model. Previous works on federated learning have been inadequate in ensuring the privacy of DIs and the availability of the final federated model. In this paper, we design a secure federated learning mechanism with multiple keys to prevent DI-level poisoning attacks for medical diagnosis, called SFPA. Concretely, SFPA provides privacy-preserving random forest-based federated learning by using the multi-key secure computation, which guarantees the confidentiality of DI-related information. Meanwhile, a secure defense strategy over encrypted locally-submitted models is proposed to defense DI-level poisoning attacks. Finally, our formal security analysis and empirical tests on a public cloud platform demonstrate the security and efficiency of SFPA as well as its capability of resisting DI-level poisoning attacks.
In the federated learning system, parameter gradients are shared among participants and the central modulator, while the original data never leave their protected source domain. However, the gradient itself might carry enough information for precise inference of the original data. By reporting their parameter gradients to the central server, client datasets are exposed to inference attacks from adversaries. In this paper, we propose a quantitative metric based on mutual information for clients to evaluate the potential risk of information leakage in their gradients. Mutual information has received increasing attention in the machine learning and data mining community over the past few years. However, existing mutual information estimation methods cannot handle high-dimensional variables. In this paper, we propose a novel method to approximate the mutual information between the high-dimensional gradients and batched input data. Experimental results show that the proposed metric reliably reflect the extent of information leakage in federated learning. In addition, using the proposed metric, we investigate the influential factors of risk level. It is proven that, the risk of information leakage is related to the status of the task model, as well as the inherent data distribution.