Federated Learning and Differential Privacy: Software tools analysis, the Sherpa.ai FL framework and methodological guidelines for preserving data privacy

160 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Eugenio Mart\\'inez-C\\'amara

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Nuria Rodriguez-Barroso - Goran Stipcich - Daniel Jimenez-Lopez

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The high demand of artificial intelligence services at the edges that also preserve data privacy has pushed the research on novel machine learning paradigms that fit those requirements. Federated learning has the ambition to protect data privacy through distributed learning methods that keep the data in their data silos. Likewise, differential privacy attains to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The prospective matching of federated learning and differential privacy to the challenges of data privacy protection has caused the release of several software tools that support their functionalities, but they lack of the needed unified vision for those techniques, and a methodological workflow that support their use. Hence, we present the Sherpa.ai Federated Learning framework that is built upon an holistic view of federated learning and differential privacy. It results from the study of how to adapt the machine learning paradigm to federated learning, and the definition of methodological guidelines for developing artificial intelligence services based on federated learning and differential privacy. We show how to follow the methodological guidelines with the Sherpa.ai Federated Learning framework by means of a classification and a regression use cases.

قيم البحث

143 - Runhua Xu , Nathalie Baracaldo , Yi Zhou 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Privacy Threats Analysis to Secure Federated Learning

186 - Yuchen Li , Yifan Bao , Liyao Xiang 2021

Federated learning is emerging as a machine learning technique that trains a model across multiple decentralized parties. It is renowned for preserving privacy as the data never leaves the computational devices, and recent approaches further enhance its privacy by hiding messages transferred in encryption. However, we found that despite the efforts, federated learning remains privacy-threatening, due to its interactive nature across different parties. In this paper, we analyze the privacy threats in industrial-level federated learning frameworks with secure computation, and reveal such threats widely exist in typical machine learning models such as linear regression, logistic regression and decision tree. For the linear and logistic regression, we show through theoretical analysis that it is possible for the attacker to invert the entire private input of the victim, given very few information. For the decision tree model, we launch an attack to infer the range of victims private inputs. All attacks are evaluated on popular federated learning frameworks and real-world datasets.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Federated $f$-Differential Privacy

97 - Qinqing Zheng , Shuxiao Chen , Qi Long 2021

Federated learning (FL) is a training paradigm where the clients collaboratively learn models by repeatedly sharing information without compromising much on the privacy of their local sensitive data. In this paper, we introduce federated $f$-differen tial privacy, a new notion specifically tailored to the federated setting, based on the framework of Gaussian differential privacy. Federated $f$-differential privacy operates on record level: it provides the privacy guarantee on each individual record of one clients data against adversaries. We then propose a generic private federated learning framework {PriFedSync} that accommodates a large family of state-of-the-art FL algorithms, which provably achieves federated $f$-differential privacy. Finally, we empirically demonstrate the trade-off between privacy guarantee and prediction performance for models trained by {PriFedSync} in computer vision tasks.

التعلم الالي الذكاء الاصطناعي التشفير والأمن

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

258 - Runhua Xu , Nathalie Baracaldo , James Joshi 2021

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model, especially, emerging deep neural network model, relies on a large volume of training data and high-powered computationa l resources. The need for a vast volume of available data raises serious privacy concerns because of the risk of leakage of highly privacy-sensitive information and the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data. Furthermore, a trained ML model may also be vulnerable to adversarial attacks such as membership/property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are crucial and have attracted increasing research interest from academia and industry. More and more efforts of PPML are proposed via integrating privacy-preserving techniques into ML algorithms, fusing privacy-preserving approaches into ML pipeline, or designing various privacy-preserving architectures for existing ML systems. In particular, existing PPML arts cross-cut ML, system, security, and privacy; hence, there is a critical need to understand state-of-art studies, related challenges, and a roadmap for future research. This paper systematically reviews and summarizes existing privacy-preserving approaches and proposes a PGU model to guide evaluation for various PPML solutions through elaborately decomposing their privacy-preserving functionalities. The PGU model is designed as the triad of Phase, Guarantee, and technical Utility. Furthermore, we also discuss the unique characteristics and challenges of PPML and outline possible directions of future work that benefit a wide range of research communities among ML, distributed systems, security, and privacy areas.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Privacy Adversarial Network: Representation Learning for Mobile Data Privacy

270 - Sicong Liu , Junzhao Du , Anshumali Shrivastava 2020

The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g. image, voice and video, to the provider, which presents a serious challen ge to user privacy. To address this, prior works either obfuscate the data, e.g. add noise and remove identity information, or send representations extracted from the data, e.g. anonymized features. They struggle to balance between the service utility and data privacy because obfuscated data reduces utility and extracted representation may still reveal sensitive information. This work departs from prior works in methodology: we leverage adversarial learning to a better balance between privacy and utility. We design a textit{representation encoder} that generates the feature representations to optimize against the privacy disclosure risk of sensitive information (a measure of privacy) by the textit{privacy adversaries}, and concurrently optimize with the task inference accuracy (a measure of utility) by the textit{utility discriminator}. The result is the privacy adversarial network (systemname), a novel deep model with the new training algorithm, that can automatically learn representations from the raw data. Intuitively, PAN adversarially forces the extracted representations to only convey the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time! We report extensive experiments on six popular datasets and demonstrate the superiority of systemname compared with alternative methods reported in prior work.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن