Secure Bilevel Asynchronous Vertical Federated Learning with Backward Updating

129 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Qingsong Zhang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Qingsong Zhang - Bin Gu - Cheng Deng

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Vertical federated learning (VFL) attracts increasing attention due to the emerging demands of multi-party collaborative modeling and concerns of privacy leakage. In the real VFL applications, usually only one or partial parties hold labels, which makes it challenging for all parties to collaboratively learn the model without privacy leakage. Meanwhile, most existing VFL algorithms are trapped in the synchronous computations, which leads to inefficiency in their real-world applications. To address these challenging problems, we propose a novel {bf VF}L framework integrated with new {bf b}ackward updating mechanism and {bf b}ilevel asynchronous parallel architecture (VF{${textbf{B}}^2$}), under which three new algorithms, including VF{${textbf{B}}^2$}-SGD, -SVRG, and -SAGA, are proposed. We derive the theoretical results of the convergence rates of these three algorithms under both strongly convex and nonconvex conditions. We also prove the security of VF{${textbf{B}}^2$} under semi-honest threat models. Extensive experiments on benchmark datasets demonstrate that our algorithms are efficient, scalable and lossless.

قيم البحث

334 - Tianyi Chen , Xiao Jin , Yuejiao Sun 2020

Horizontal Federated learning (FL) handles multi-client data that share the same set of features, and vertical FL trains a better predictor that combine all the features from different clients. This paper targets solving vertical FL in an asynchronou s fashion, and develops a simple FL method. The new method allows each client to run stochastic gradient algorithms without coordination with other clients, so it is suitable for intermittent connectivity of clients. This method further uses a new technique of perturbed local embedding to ensure data privacy and improve communication efficiency. Theoretically, we present the convergence rate and privacy level of our method for strongly convex, nonconvex and even nonsmooth objectives separately. Empirically, we apply our method to FL on various image and healthcare datasets. The results compare favorably to centralized and synchronous FL methods.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التحسين والتحكم

Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

231 - Xin Yao , Tianchi Huang , Rui-Xiao Zhang 2019

Federated learning (FL) aims to train machine learning models in the decentralized system consisting of an enormous amount of smart edge devices. Federated averaging (FedAvg), the fundamental algorithm in FL settings, proposes on-device training and model aggregation to avoid the potential heavy communication costs and privacy concerns brought by transmitting raw data. However, through theoretical analysis we argue that 1) the multiple steps of local updating will result in gradient biases and 2) there is an inconsistency between the expected target distribution and the optimization objectives following the training paradigm in FedAvg. To tackle these problems, we first propose an unbiased gradient aggregation algorithm with the keep-trace gradient descent and the gradient evaluation strategy. Then we introduce an additional controllable meta updating procedure with a small set of data samples, indicating the expected target distribution, to provide a clear and consistent optimization objective. Both the two improvements are model- and task-agnostic and can be applied individually or together. Experimental results demonstrate that the proposed methods are faster in convergence and achieve higher accuracy with different network architectures in various FL settings.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التعلم الالي

Asynchronous Federated Learning for Sensor Data with Concept Drift

95 - Yujing Chen , Zheng Chai , Yue Cheng 2021

Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most of previous FL approaches assume that data on devices are fi xed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and different system configurations. In addition, the underlying distribution of the device data can change dynamically over time, which is known as concept drift. Concept drift makes the learning process complicated because of the inconsistency between existing and upcoming data. Traditional concept drift handling techniques such as chunk based and ensemble learning-based methods are not suitable in the federated learning frameworks due to the heterogeneity of local devices. We propose a novel approach, FedConD, to detect and deal with the concept drift on local devices and minimize the effect on the performance of models in asynchronous FL. The drift detection strategy is based on an adaptive mechanism which uses the historical performance of the local models. The drift adaptation is realized by adjusting the regularization parameter of objective function on each local device. Additionally, we design a communication strategy on the server side to select local updates in a prudent fashion and speed up model convergence. Experimental evaluations on three evolving data streams and two image datasets show that model~detects and handles concept drift, and also reduces the overall communication cost compared to other baseline methods.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

FedSmart: An Auto Updating Federated Learning Optimization Mechanism

88 - Anxun He , Jianzong Wang , Zhangcheng Huang 2020

Federated learning has made an important contribution to data privacy-preserving. Many previous works are based on the assumption that the data are independently identically distributed (IID). As a result, the model performance on non-identically ind ependently distributed (non-IID) data is beyond expectation, which is the concrete situation. Some existing methods of ensuring the model robustness on non-IID data, like the data-sharing strategy or pretraining, may lead to privacy leaking. In addition, there exist some participants who try to poison the model with low-quality data. In this paper, a performance-based parameter return method for optimization is introduced, we term it FederatedSmart (FedSmart). It optimizes different model for each client through sharing global gradients, and it extracts the data from each client as a local validation set, and the accuracy that model achieves in round t determines the weights of the next round. The experiment results show that FedSmart enables the participants to allocate a greater weight to the ones with similar data distribution.

التعلم الآلي التشفير والأمن التعلم الالي

Vertical Federated Learning without Revealing Intersection Membership

117 - Jiankai Sun , Xin Yang , Yuanshun Yao 2021

Vertical Federated Learning (vFL) allows multiple parties that own different attributes (e.g. features and labels) of the same data entity (e.g. a person) to jointly train a model. To prepare the training data, vFL needs to identify the common data e ntities shared by all parties. It is usually achieved by Private Set Intersection (PSI) which identifies the intersection of training samples from all parties by using personal identifiable information (e.g. email) as sample IDs to align data instances. As a result, PSI would make sample IDs of the intersection visible to all parties, and therefore each party can know that the data entities shown in the intersection also appear in the other parties, i.e. intersection membership. However, in many real-world privacy-sensitive organizations, e.g. banks and hospitals, revealing membership of their data entities is prohibited. In this paper, we propose a vFL framework based on Private Set Union (PSU) that allows each party to keep sensitive membership information to itself. Instead of identifying the intersection of all training samples, our PSU protocol generates the union of samples as training instances. In addition, we propose strategies to generate synthetic features and labels to handle samples that belong to the union but not the intersection. Through extensive experiments on two real-world datasets, we show our framework can protect the privacy of the intersection membership while maintaining the model utility.

التعلم الآلي الذكاء الاصطناعي