AirMixML: Over-the-Air Data Mixup for Inherently Privacy-Preserving Edge Machine Learning

136 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yusuke Koda

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yusuke Koda - Jihong Park - Mehdi Bennis

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Wireless channels can be inherently privacy-preserving by distorting the received signals due to channel noise, and superpositioning multiple signals over-the-air. By harnessing these natural distortions and superpositions by wireless channels, we propose a novel privacy-preserving machine learning (ML) framework at the network edge, coined over-the-air mixup ML (AirMixML). In AirMixML, multiple workers transmit analog-modulated signals of their private data samples to an edge server who trains an ML model using the received noisy-and superpositioned samples. AirMixML coincides with model training using mixup data augmentation achieving comparable accuracy to that with raw data samples. From a privacy perspective, AirMixML is a differentially private (DP) mechanism limiting the disclosure of each workers private sample information at the server, while the workers transmit power determines the privacy disclosure level. To this end, we develop a fractional channel-inversion power control (PC) method, {alpha}-Dirichlet mixup PC (DirMix({alpha})-PC), wherein for a given global power scaling factor after channel inversion, each workers local power contribution to the superpositioned signal is controlled by the Dirichlet dispersion ratio {alpha}. Mathematically, we derive a closed-form expression clarifying the relationship between the local and global PC factors to guarantee a target DP level. By simulations, we provide DirMix({alpha})-PC design guidelines to improve accuracy, privacy, and energy-efficiency. Finally, AirMixML with DirMix({alpha})-PC is shown to achieve reasonable accuracy compared to a privacy-violating baseline with neither superposition nor PC.

قيم البحث

143 - Runhua Xu , Nathalie Baracaldo , Yi Zhou 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

258 - Runhua Xu , Nathalie Baracaldo , James Joshi 2021

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model, especially, emerging deep neural network model, relies on a large volume of training data and high-powered computationa l resources. The need for a vast volume of available data raises serious privacy concerns because of the risk of leakage of highly privacy-sensitive information and the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data. Furthermore, a trained ML model may also be vulnerable to adversarial attacks such as membership/property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are crucial and have attracted increasing research interest from academia and industry. More and more efforts of PPML are proposed via integrating privacy-preserving techniques into ML algorithms, fusing privacy-preserving approaches into ML pipeline, or designing various privacy-preserving architectures for existing ML systems. In particular, existing PPML arts cross-cut ML, system, security, and privacy; hence, there is a critical need to understand state-of-art studies, related challenges, and a roadmap for future research. This paper systematically reviews and summarizes existing privacy-preserving approaches and proposes a PGU model to guide evaluation for various PPML solutions through elaborately decomposing their privacy-preserving functionalities. The PGU model is designed as the triad of Phase, Guarantee, and technical Utility. Furthermore, we also discuss the unique characteristics and challenges of PPML and outline possible directions of future work that benefit a wide range of research communities among ML, distributed systems, security, and privacy areas.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Federated Learning and Differential Privacy: Software tools analysis, the Sherpa.ai FL framework and methodological guidelines for preserving data privacy

159 - Nuria Rodriguez-Barroso , Goran Stipcich , Daniel Jimenez-Lopez 2020

The high demand of artificial intelligence services at the edges that also preserve data privacy has pushed the research on novel machine learning paradigms that fit those requirements. Federated learning has the ambition to protect data privacy thro ugh distributed learning methods that keep the data in their data silos. Likewise, differential privacy attains to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The prospective matching of federated learning and differential privacy to the challenges of data privacy protection has caused the release of several software tools that support their functionalities, but they lack of the needed unified vision for those techniques, and a methodological workflow that support their use. Hence, we present the Sherpa.ai Federated Learning framework that is built upon an holistic view of federated learning and differential privacy. It results from the study of how to adapt the machine learning paradigm to federated learning, and the definition of methodological guidelines for developing artificial intelligence services based on federated learning and differential privacy. We show how to follow the methodological guidelines with the Sherpa.ai Federated Learning framework by means of a classification and a regression use cases.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Privacy Adversarial Network: Representation Learning for Mobile Data Privacy

270 - Sicong Liu , Junzhao Du , Anshumali Shrivastava 2020

The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g. image, voice and video, to the provider, which presents a serious challen ge to user privacy. To address this, prior works either obfuscate the data, e.g. add noise and remove identity information, or send representations extracted from the data, e.g. anonymized features. They struggle to balance between the service utility and data privacy because obfuscated data reduces utility and extracted representation may still reveal sensitive information. This work departs from prior works in methodology: we leverage adversarial learning to a better balance between privacy and utility. We design a textit{representation encoder} that generates the feature representations to optimize against the privacy disclosure risk of sensitive information (a measure of privacy) by the textit{privacy adversaries}, and concurrently optimize with the task inference accuracy (a measure of utility) by the textit{utility discriminator}. The result is the privacy adversarial network (systemname), a novel deep model with the new training algorithm, that can automatically learn representations from the raw data. Intuitively, PAN adversarially forces the extracted representations to only convey the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time! We report extensive experiments on six popular datasets and demonstrate the superiority of systemname compared with alternative methods reported in prior work.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Privacy-preserving Machine Learning through Data Obfuscation

328 - Tianwei Zhang , Zecheng He , Ruby B. Lee 2018

As machine learning becomes a practice and commodity, numerous cloud-based services and frameworks are provided to help customers develop and deploy machine learning applications. While it is prevalent to outsource model training and serving tasks in the cloud, it is important to protect the privacy of sensitive samples in the training dataset and prevent information leakage to untrusted third parties. Past work have shown that a malicious machine learning service provider or end user can easily extract critical information about the training samples, from the model parameters or even just model outputs. In this paper, we propose a novel and generic methodology to preserve the privacy of training data in machine learning applications. Specifically we introduce an obfuscate function and apply it to the training data before feeding them to the model training task. This function adds random noise to existing samples, or augments the dataset with new samples. By doing so sensitive information about the properties of individual samples, or statistical properties of a group of samples, is hidden. Meanwhile the model trained from the obfuscated dataset can still achieve high accuracy. With this approach, the customers can safely disclose the data or models to third-party providers or end users without the need to worry about data privacy. Our experiments show that this approach can effective defeat four existing types of machine learning privacy attacks at negligible accuracy cost.

التشفير والأمن التعلم الآلي